[JENKINS] Lucene-Solr-trunk-Windows (32bit/jdk1.8.0_20-ea-b21) - Build # 4191 - Failure!

2014-07-15 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/4191/
Java: 32bit/jdk1.8.0_20-ea-b21 -client -XX:+UseParallelGC

All tests passed

Build Log:
[...truncated 51837 lines...]
BUILD FAILED
C:\Users\JenkinsSlave\workspace\Lucene-Solr-trunk-Windows\build.xml:467: The 
following error occurred while executing this line:
C:\Users\JenkinsSlave\workspace\Lucene-Solr-trunk-Windows\build.xml:406: The 
following error occurred while executing this line:
C:\Users\JenkinsSlave\workspace\Lucene-Solr-trunk-Windows\extra-targets.xml:87: 
The following error occurred while executing this line:
C:\Users\JenkinsSlave\workspace\Lucene-Solr-trunk-Windows\extra-targets.xml:181:
 Source checkout is dirty after running tests!!! Offending files:
* ./solr/licenses/log4j-1.2.16.jar.sha1

Total time: 174 minutes 31 seconds
Build step 'Invoke Ant' marked build as failure
[description-setter] Description set: Java: 32bit/jdk1.8.0_20-ea-b21 -client 
-XX:+UseParallelGC
Archiving artifacts
Recording test results
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any
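The dirty-checkout failure above means the test run modified a tracked file -- here the SHA-1 license checksum for the log4j JAR no longer matches what is checked in. A hedged sketch of the usual remedy, assuming the ant target from the lucene-solr build of that era:

{noformat}
# from the checkout root: recompute the per-JAR .sha1 license files,
# then commit the updated checksum alongside the dependency change
ant jar-checksums
{noformat}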




[JENKINS] Lucene-Solr-trunk-Linux (64bit/jdk1.7.0_60) - Build # 10827 - Still Failing!

2014-07-15 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/10827/
Java: 64bit/jdk1.7.0_60 -XX:-UseCompressedOops -XX:+UseParallelGC

All tests passed

Build Log:
[...truncated 59454 lines...]
BUILD FAILED
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/build.xml:467: The following 
error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/build.xml:406: The following 
error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/extra-targets.xml:87: The 
following error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/extra-targets.xml:181: 
Source checkout is dirty after running tests!!! Offending files:
* ./solr/licenses/log4j-1.2.16.jar.sha1

Total time: 98 minutes 46 seconds
Build step 'Invoke Ant' marked build as failure
[description-setter] Description set: Java: 64bit/jdk1.7.0_60 
-XX:-UseCompressedOops -XX:+UseParallelGC
Archiving artifacts
Recording test results
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any




Re: [JENKINS] Lucene-Solr-4.x-Linux (32bit/jdk1.7.0_60) - Build # 10705 - Failure!

2014-07-15 Thread Shalin Shekhar Mangar
Tim has already fixed these.


On Wed, Jul 16, 2014 at 6:49 AM, Policeman Jenkins Server <
jenk...@thetaphi.de> wrote:

> Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/10705/
> Java: 32bit/jdk1.7.0_60 -server -XX:+UseSerialGC
>
> All tests passed
>
> Build Log:
> [...truncated 32600 lines...]
> -check-forbidden-all:
> [forbidden-apis] Reading bundled API signatures: jdk-unsafe-1.7
> [forbidden-apis] Reading bundled API signatures: jdk-deprecated-1.7
> [forbidden-apis] Reading bundled API signatures: commons-io-unsafe-2.3
> [forbidden-apis] Reading API signatures:
> /mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/lucene/tools/forbiddenApis/base.txt
> [forbidden-apis] Reading API signatures:
> /mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/lucene/tools/forbiddenApis/servlet-api.txt
> [forbidden-apis] Loading classes to check...
> [forbidden-apis] Scanning for API signatures and dependencies...
> [forbidden-apis] Forbidden method invocation:
> java.util.concurrent.Executors#newFixedThreadPool(int) [Spawns threads with
> vague names; use a custom thread factory (Lucene's NamedThreadFactory,
> Solr's DefaultSolrThreadFactory) and name threads so that you can tell (by
> its name) which executor it is associated with]
> [forbidden-apis]   in
> org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServerTest
> (ConcurrentUpdateSolrServerTest.java:167)
> [forbidden-apis] Forbidden method invocation:
> javax.servlet.ServletRequest#getParameterMap() [Servlet API method is
> parsing request parameters without using the correct encoding if no extra
> configuration is given in the servlet container]
> [forbidden-apis]   in
> org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServerTest$TestServlet
> (ConcurrentUpdateSolrServerTest.java:85)
> [forbidden-apis] Scanned 368 (and 504 related) class file(s) for forbidden
> API invocations (in 0.15s), 2 error(s).
>
> BUILD FAILED
> /mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/build.xml:467: The
> following error occurred while executing this line:
> /mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/build.xml:70: The
> following error occurred while executing this line:
> /mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/solr/build.xml:271: The
> following error occurred while executing this line:
> /mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/solr/common-build.xml:479:
> Check for forbidden API calls failed, see log.
>
> Total time: 105 minutes 44 seconds
> Build step 'Invoke Ant' marked build as failure
> [description-setter] Description set: Java: 32bit/jdk1.7.0_60 -server
> -XX:+UseSerialGC
> Archiving artifacts
> Recording test results
> Email was triggered for: Failure - Any
> Sending email for trigger: Failure - Any



-- 
Regards,
Shalin Shekhar Mangar.
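For reference, the remedy the forbidden-apis check above asks for looks roughly like this -- a minimal sketch assuming Lucene's org.apache.lucene.util.NamedThreadFactory; the pool size, name prefix, and class name are illustrative:

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.apache.lucene.util.NamedThreadFactory;

public class NamedPoolSketch {
  public static void main(String[] args) throws InterruptedException {
    // Executors.newFixedThreadPool(int) names threads "pool-N-thread-M",
    // so thread dumps can't be tied back to an executor. The two-argument
    // overload takes a ThreadFactory that attaches a meaningful prefix.
    ExecutorService executor = Executors.newFixedThreadPool(
        2, new NamedThreadFactory("concurrentUpdateSolrServerTest"));
    executor.submit(new Runnable() {
      @Override
      public void run() {
        // prints something like "concurrentUpdateSolrServerTest-1-thread-1"
        System.out.println(Thread.currentThread().getName());
      }
    });
    executor.shutdown();
    executor.awaitTermination(10, TimeUnit.SECONDS);
  }
}
{code}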


[jira] [Created] (LUCENE-5826) Support proper hunspell case handling and related options

2014-07-15 Thread Robert Muir (JIRA)
Robert Muir created LUCENE-5826:
---

 Summary: Support proper hunspell case handling and related options
 Key: LUCENE-5826
 URL: https://issues.apache.org/jira/browse/LUCENE-5826
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Robert Muir
 Attachments: LUCENE-5826.patch

When ignoreCase=false, we should accept title-cased/upper-cased forms just like 
hunspell -m. Furthermore there are some options around this:
* LANG: can turn on alternate casing for turkish/azeri
* KEEPCASE: can prevent acceptance of title/upper cased forms for words

While we are here setting up the same logic anyway, add support for similar 
options:
* NEEDAFFIX/PSEUDOROOT: form is invalid without being affixed
* ONLYINCOMPOUND: form/affixes only make sense inside compounds.

This stuff is unrelated to the ignoreCase=true option. If you use that option, 
though, it now uses the correct alternate casing for tr_TR/az_AZ.

I didn't yet implement CHECKSHARPS because it seems more complicated; I have to 
figure out what the logic there should be first.
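For readers who don't live in hunspell dictionaries, a hedged sketch of the syntax these options refer to (the flag letters and entries are made up for illustration):

{noformat}
# affix (.aff) sketch: declaring the flags discussed above
LANG tr_TR          # alternate casing (dotted/dotless i) for Turkish/Azeri
KEEPCASE Kc         # entries flagged Kc must keep their exact case
NEEDAFFIX Na        # stems flagged Na are invalid unless affixed
ONLYINCOMPOUND Oc   # forms flagged Oc only occur inside compound words

# dictionary (.dic) sketch: entries carrying those flags
3
iPhone/Kc
pseudo/Na
ware/Oc
{noformat}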







[jira] [Updated] (LUCENE-5826) Support proper hunspell case handling and related options

2014-07-15 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-5826:


Attachment: LUCENE-5826.patch

Patch with tests for these options and casing behavior.







[jira] [Created] (LUCENE-5825) Allowing the benchmarking algorithm to choose PostingsFormat

2014-07-15 Thread Varun V Shenoy (JIRA)
Varun V Shenoy created LUCENE-5825:
---

 Summary: Allowing the benchmarking algorithm to choose 
PostingsFormat
 Key: LUCENE-5825
 URL: https://issues.apache.org/jira/browse/LUCENE-5825
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/benchmark
Affects Versions: 5.0
Reporter: Varun V Shenoy
Priority: Minor
 Fix For: 5.0


The algorithm file for benchmarking should allow PostingsFormat to be 
configurable.







[jira] [Commented] (LUCENE-4396) BooleanScorer should sometimes be used for MUST clauses

2014-07-15 Thread Da Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14063089#comment-14063089
 ] 

Da Huang commented on LUCENE-4396:
--

Thank you, Mike!
{quote}
It looks like this gave some nice gains with the many-not cases
{quote}
Yes, but many-not cases may not be a usual case. Therefore, this method might 
not be used in the final method.

{quote}
Curiously some of the tasks are really hurt by the larger sizes ... maybe 1<<9 
is a good compromise?
{quote}
Yeah. In the end, I will just focus on those \*Some\* cases.
"size9" is better for the HighAndSomeHighOr case, while "size5" is better for 
the LowAndSomeHighOr, LowAndSomeLowNot and LowAndSomeLowOr cases.
I think it would be better to detect the case type and adjust the SIZE of 
bucketTable in BNS's constructor.
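A hedged sketch of that last idea -- the class shape and cost test below are hypothetical, not from any patch; only the 1<<9 vs 1<<5 sizes come from the discussion above:

{code:java}
// Hypothetical sketch: choose the bucket-table SIZE per query rather than
// compiling in one constant, using clause cost as a proxy for "case type".
final class BucketTableSizing {
  static int chooseSize(long mustCost, long shouldCost) {
    // HighAndSome* mixes amortize a large table (1<<9); the Low* cases
    // above were hurt by the extra bucket clearing, so stay at 1<<5.
    return mustCost > shouldCost ? 1 << 9 : 1 << 5;
  }
}
{code}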

> BooleanScorer should sometimes be used for MUST clauses
> ---
>
> Key: LUCENE-4396
> URL: https://issues.apache.org/jira/browse/LUCENE-4396
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael McCandless
> Attachments: And.tasks, AndOr.tasks, AndOr.tasks, LUCENE-4396.patch, 
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, 
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, 
> SIZE.perf, luceneutil-score-equal.patch, luceneutil-score-equal.patch, 
> stat.cpp
>
>
> Today we only use BooleanScorer if the query consists of SHOULD and MUST_NOT.
> If there is one or more MUST clauses we always use BooleanScorer2.
> But I suspect that unless the MUST clauses have very low hit count compared 
> to the other clauses, that BooleanScorer would perform better than 
> BooleanScorer2.  BooleanScorer still has some vestiges from when it used to 
> handle MUST so it shouldn't be hard to bring back this capability ... I think 
> the challenging part might be the heuristics on when to use which (likely we 
> would have to use firstDocID as proxy for total hit count).
> Likely we should also have BooleanScorer sometimes use .advance() on the subs 
> in this case, eg if suddenly the MUST clause skips 100 docs then you want 
> to .advance() all the SHOULD clauses.
> I won't have near term time to work on this so feel free to take it if you 
> are inspired!






[jira] [Comment Edited] (LUCENE-4396) BooleanScorer should sometimes be used for MUST clauses

2014-07-15 Thread Da Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14063089#comment-14063089
 ] 

Da Huang edited comment on LUCENE-4396 at 7/16/14 3:53 AM:
---

Thank you, Mike!
{quote}
It looks like this gave some nice gains with the many-not cases
{quote}
Yes, but many-not cases might not be a usual case. Therefore, this method might 
not be used in the final method.

{quote}
Curiously some of the tasks are really hurt by the larger sizes ... maybe 1<<9 
is a good compromise?
{quote}
Yeah. In the end, I will just focus on those \*Some\* cases.
"size9" is better for the HighAndSomeHighOr case, while "size5" is better for 
the LowAndSomeHighOr, LowAndSomeLowNot and LowAndSomeLowOr cases.
I think it would be better to detect the case type and adjust the SIZE of 
bucketTable in BNS's constructor.


was (Author: dhuang):
Thank you, Mike!
{quote}
It looks like this gave some nice gains with the many-not cases
{quote}
Yes, but many-not cases may not be a usual case. Therefore, this method might 
be used in the final method.

{quote}
Curiously some of the tasks are really hurt by the larger sizes ... maybe 1<<9 
is a good compromise?
{quote}
Yeah. Finally, I will just focus on those \*Some\* cases. 
"size9" is better for HighAndSomeHighOr case, while "size5" is better for 
LowAndSomeHighOr, LowAndSomeLowNot  and LowAndSomeLowOr cases.
I think it would be better to detect the case type and adjust the SIZE of 
bucketTable in BNS's constructor.







[JENKINS] Lucene-Solr-4.x-Linux (64bit/ibm-j9-jdk7) - Build # 10706 - Still Failing!

2014-07-15 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/10706/
Java: 64bit/ibm-j9-jdk7 
-Xjit:exclude={org/apache/lucene/util/fst/FST.pack(IIF)Lorg/apache/lucene/util/fst/FST;}

All tests passed

Build Log:
[...truncated 32538 lines...]
-check-forbidden-all:
[forbidden-apis] Reading bundled API signatures: jdk-unsafe-1.7
[forbidden-apis] Reading bundled API signatures: jdk-deprecated-1.7
[forbidden-apis] Reading bundled API signatures: commons-io-unsafe-2.3
[forbidden-apis] Reading API signatures: 
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/lucene/tools/forbiddenApis/base.txt
[forbidden-apis] Reading API signatures: 
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/lucene/tools/forbiddenApis/servlet-api.txt
[forbidden-apis] Loading classes to check...
[forbidden-apis] Scanning for API signatures and dependencies...
[forbidden-apis] Forbidden method invocation: 
java.util.concurrent.Executors#newFixedThreadPool(int) [Spawns threads with 
vague names; use a custom thread factory (Lucene's NamedThreadFactory, Solr's 
DefaultSolrThreadFactory) and name threads so that you can tell (by its name) 
which executor it is associated with]
[forbidden-apis]   in 
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServerTest 
(ConcurrentUpdateSolrServerTest.java:167)
[forbidden-apis] Forbidden method invocation: 
javax.servlet.ServletRequest#getParameterMap() [Servlet API method is parsing 
request parameters without using the correct encoding if no extra configuration 
is given in the servlet container]
[forbidden-apis]   in 
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServerTest$TestServlet 
(ConcurrentUpdateSolrServerTest.java:85)
[forbidden-apis] Scanned 368 (and 503 related) class file(s) for forbidden API 
invocations (in 0.26s), 2 error(s).

BUILD FAILED
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/build.xml:467: The following 
error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/build.xml:70: The following 
error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/solr/build.xml:271: The 
following error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/solr/common-build.xml:479: 
Check for forbidden API calls failed, see log.

Total time: 66 minutes 37 seconds
Build step 'Invoke Ant' marked build as failure
[description-setter] Description set: Java: 64bit/ibm-j9-jdk7 
-Xjit:exclude={org/apache/lucene/util/fst/FST.pack(IIF)Lorg/apache/lucene/util/fst/FST;}
Archiving artifacts
Recording test results
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any
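Regarding the getParameterMap() violation above: the underlying pitfall is that parameter decoding happens with the container's default encoding unless one is set before the first parameter access. A minimal sketch of the safe pattern (class name and encoding choice are illustrative):

{code:java}
import java.io.IOException;
import java.util.Map;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class EncodingAwareServlet extends HttpServlet {
  @Override
  protected void doPost(HttpServletRequest req, HttpServletResponse resp)
      throws ServletException, IOException {
    // Must happen before getParameterMap()/getParameter() is first called;
    // afterwards the container has already decoded with its default
    // (often ISO-8859-1) and this setting is ignored.
    req.setCharacterEncoding("UTF-8");
    Map<String, String[]> params = req.getParameterMap();
    resp.getWriter().println("saw " + params.size() + " parameter(s)");
  }
}
{code}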




[jira] [Updated] (SOLR-5746) solr.xml parsing of "str" vs "int" vs "bool" is brittle; fails silently; expects odd type for "shareSchema"

2014-07-15 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-5746:
---

Attachment: SOLR-5746.patch


bq. Since I stored "raw" config values, I used SolrParam to do the type 
conversion, but I didn’t find any API for a parameter removal. That’s why I’m 
keeping the original NamedList, so that I can remove correctly read values and 
keep track of the unknown ones.

my previous suggestion about using SolrParams was vague and misguided.  i 
_think_ i had in mind the idea of using SolrParams instead of the 
{{Map propMap}} -- your latest patch that eliminates the 
SolrParams and just uses the NamedList is definitely a better call.


bq. ... However, I didn't realise that solr.xml files are versioned the same 
way as schema.xml files are. Should I bump the schema version to 1.6?

It's not explicitly versioned -- just heuristically versioned based on whether 
ConfigSolr detects by introspection that it's the "old style" or the "new 
style" ... i think Jack may have just been confused about what this issue was 
when he posted his comment.



I'm attaching some updates to your patch...

* skimming your changes made me realize there's a lot of cruft in this code 
related to deferred sys prop substitution that's no longer needed at all - so I 
ripped that out.
* I'm not really a fan of the way you added "excludedElements" to the DOMUtils 
method -- particularly since it still required the 
{{namedList.removeAll(null);}} call, which seemed sloppy.  I'd much rather have 
a tighter XPath that is very explicit about what we want out of the dom and 
handle the exclusions that way ... so i changed that.
* i added some explicit testing of {{}} since i wasn't 
completely convinced your new code would work correctly.
* I think i misled you a bit when i said we should validate configs being 
declared multiple times -- it's not a good idea to check up front that nothing 
is declared more than once, because a week from now someone may in fact want to 
add something to solr.xml that can be specified multiple times.  The better 
place for this type of validation is in storeConfigProperty, because at that 
point we know we expect there to be a single value.
** this does unfortunately mean it aborts early the first time it finds a 
duplicated key, so some of your tests had to be changed.
* I switched the check for unknown options to be per section so the error msgs 
could include the section details as well.
* String.format must be used with Locale.ROOT to prevent locale-sensitive 
behavior (see the sketch after this list).
** {{ant check-forbidden-apis}} will point out stuff like this for you in the 
lucene/solr code base
* relaxed the int parsing so that small {{}} values are fine, but large 
longs still throw an error
** added tests for both cases
* added some checks that the sections themselves weren't being duplicated (ie: 
if a user adds a  section to their solr.xml, we want to give them an 
error if another  section already existed higher up in the file)
* some general test refactoring...
** no need to construct new Random instances -- just use random()
** eliminated a lot of unnecessary file creation in the tests by using 
{{ConfigSolr.fromString(loader, solrXml);}} instead of 
{{FileUtils.writeFile(...)}} and {{ConfigSolr.fromSolrHome(...)}}
** switched to the lucene convention of test method naming to eliminate ~20 
lines of {{@Test}} annotations (the verbosity is why our test runner explicitly 
lets us continue to use the JUnit3 convention)
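A minimal sketch of the Locale.ROOT point from the list above (the message text is illustrative):

{code:java}
import java.util.Locale;

class RootLocaleFormat {
  static String unknownParam(String name) {
    // String.format(String, Object...) formats with the JVM default
    // locale, so e.g. a Turkish default can change case/digit handling.
    // Pinning Locale.ROOT keeps output stable on every machine, which is
    // what 'ant check-forbidden-apis' enforces.
    return String.format(Locale.ROOT, "unknown config parameter: %s", name);
  }
}
{code}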



I think this is probably ready to go -- but it would be nice to get some review 
from [~romseygeek] and/or [~erickerickson] since they know this code the best 
... and of course, [~maciej.zasada]: you've clearly been looking at this code a 
lot the last few days, do you have any additional thoughts on my revised 
patch?



> solr.xml parsing of "str" vs "int" vs "bool" is brittle; fails silently; 
> expects odd type for "shareSchema"   
> --
>
> Key: SOLR-5746
> URL: https://issues.apache.org/jira/browse/SOLR-5746
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 4.3, 4.4, 4.5, 4.6
>Reporter: Hoss Man
> Attachments: SOLR-5746.patch, SOLR-5746.patch, SOLR-5746.patch, 
> SOLR-5746.patch, SOLR-5746.patch
>
>
> A comment in the ref guide got me looking at ConfigSolrXml.java and noticing 
> that the parsing of solr.xml options here is very brittle and confusing.  In 
> particular:
> * if a boolean option "foo" is expected along the lines of {{<bool 
> name="foo">true</bool>}} it will silently ignore {{<str name="foo">true</str>}}
> * likewise for an int option {{<int name="bar">32</int>}} vs {{<str 
> name="bar">32</str>}}
> ... this is inconsistent with the way solrconfig.xml is parsed.  In 
> solrconfig.xml, the xml nodes are parsed into a NamedList, and the abov

[JENKINS] Lucene-Solr-4.x-Linux (32bit/jdk1.7.0_60) - Build # 10705 - Failure!

2014-07-15 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/10705/
Java: 32bit/jdk1.7.0_60 -server -XX:+UseSerialGC

All tests passed

Build Log:
[...truncated 32600 lines...]
-check-forbidden-all:
[forbidden-apis] Reading bundled API signatures: jdk-unsafe-1.7
[forbidden-apis] Reading bundled API signatures: jdk-deprecated-1.7
[forbidden-apis] Reading bundled API signatures: commons-io-unsafe-2.3
[forbidden-apis] Reading API signatures: 
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/lucene/tools/forbiddenApis/base.txt
[forbidden-apis] Reading API signatures: 
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/lucene/tools/forbiddenApis/servlet-api.txt
[forbidden-apis] Loading classes to check...
[forbidden-apis] Scanning for API signatures and dependencies...
[forbidden-apis] Forbidden method invocation: 
java.util.concurrent.Executors#newFixedThreadPool(int) [Spawns threads with 
vague names; use a custom thread factory (Lucene's NamedThreadFactory, Solr's 
DefaultSolrThreadFactory) and name threads so that you can tell (by its name) 
which executor it is associated with]
[forbidden-apis]   in 
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServerTest 
(ConcurrentUpdateSolrServerTest.java:167)
[forbidden-apis] Forbidden method invocation: 
javax.servlet.ServletRequest#getParameterMap() [Servlet API method is parsing 
request parameters without using the correct encoding if no extra configuration 
is given in the servlet container]
[forbidden-apis]   in 
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServerTest$TestServlet 
(ConcurrentUpdateSolrServerTest.java:85)
[forbidden-apis] Scanned 368 (and 504 related) class file(s) for forbidden API 
invocations (in 0.15s), 2 error(s).

BUILD FAILED
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/build.xml:467: The following 
error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/build.xml:70: The following 
error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/solr/build.xml:271: The 
following error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/solr/common-build.xml:479: 
Check for forbidden API calls failed, see log.

Total time: 105 minutes 44 seconds
Build step 'Invoke Ant' marked build as failure
[description-setter] Description set: Java: 32bit/jdk1.7.0_60 -server 
-XX:+UseSerialGC
Archiving artifacts
Recording test results
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any




[jira] [Commented] (SOLR-6251) incorrect 'missing required field' during update - document definitely has it

2014-07-15 Thread Nathan Neulinger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062909#comment-14062909
 ] 

Nathan Neulinger commented on SOLR-6251:


Additionally - since this works 99.9% of the time - I would surely think that a 
blatant problem like that would have been more visible. The incremental updates 
normally work without issue, and just randomly fail. 

> incorrect 'missing required field' during update - document definitely has it
> -
>
> Key: SOLR-6251
> URL: https://issues.apache.org/jira/browse/SOLR-6251
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 4.8
> Environment: 4.8.0. Two nodes, SolrCloud, external ZK ensemble. All 
> on EC2. The two hosts are round-robin'd behind an ELB.
>Reporter: Nathan Neulinger
>  Labels: replication
> Attachments: schema.xml
>
>
> Document added on solr1. We can see the distribute take place from solr1 to 
> solr2 and returning a success. Subsequent searches returning document, 
> clearly showing the field as being there. Later on, an update is done to add 
> to an element of the document - and the update fails. The update was sent to 
> solr2 instance. 
> Schema marks the 'timestamp' field as required, so the initial insert should 
> not work if the field isn't present.
> Symptom is intermittent - we're seeing this randomly, with no warning or 
> triggering that we can see, but in all cases, it's getting the error in 
> response to an update when the instance tries to distribute the change to the 
> other node. 
> Searches that were run AFTER the update also show the field as being present 
> in the document. 
> Will add full trace of operations in the comments shortly. pcap captures of 
> ALL traffic for the two nodes on 8983 is available if requested. 






[jira] [Commented] (SOLR-6251) incorrect 'missing required field' during update - document definitely has it

2014-07-15 Thread Nathan Neulinger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062907#comment-14062907
 ] 

Nathan Neulinger commented on SOLR-6251:


Leaving this closed, but adding more information in case Hoss Man wants to 
comment further.

'timestamp' is: stored=true indexed=false

That seems to meet all of the requirements stated for partial updates, unless 
'indexed=true' is also required and not documented. 








[jira] [Updated] (SOLR-6251) incorrect 'missing required field' during update - document definitely has it

2014-07-15 Thread Nathan Neulinger (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nathan Neulinger updated SOLR-6251:
---

Attachment: schema.xml

schema attached







[jira] [Commented] (SOLR-2894) Implement distributed pivot faceting

2014-07-15 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062874#comment-14062874
 ] 

Hoss Man commented on SOLR-2894:


Hey Andrew, I probably won't have a chance to review this issue/patches again 
until monday - but some quick replies...

bq. With the OVERREQUEST options uncommented we do not get the proper bbc value 
and so the distributed version diverges from the non-distrib. Your second 
comment on this issue is exactly on point.

Just to clarify: you are saying that bbc isn't included in the "top" set in the 
distrib call because overrequest is so low, which is inconsistent with the 
control where bbc is in the top -- but all of the values returned by the 
distrib call do in fact have accurate refined counts ... correct?

The point of that check is to definitively ensure that refinement works properly 
on facet.missing -- that's why i added it, because it wasn't working before and 
the test didn't catch it because of the default overrequest -- so we can't 
eliminate those OVERREQUEST params.

what we can do is explicitly call {{queryServer(...)}} instead of 
{{query(...)}} to hit a random distributed server but bypass the comparison with 
the control server -- in that case though we want a lot of tight assertions to 
ensure that we aren't missing anything.

(of course: we can also include another check of the same facet.missing request 
with the overrequest disabled if you want -- no one ever complained about too 
many assertions in a test)

bq. Should facet.missing respect the mincount (in this case it's 1)?

I think so? .. if that's what the non-distrib code is doing, that's what the 
distrib code should do as well.
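For context, a hedged sketch of the kind of request exercised here (the field names are invented; facet.missing and facet.pivot.mincount are the standard parameter names; the test-internal OVERREQUEST knobs discussed above are not shown):

{noformat}
q=*:*&rows=0
  &facet=true
  &facet.pivot=place_s,company_t
  &facet.missing=true
  &facet.pivot.mincount=1
{noformat}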


> Implement distributed pivot faceting
> 
>
> Key: SOLR-2894
> URL: https://issues.apache.org/jira/browse/SOLR-2894
> Project: Solr
>  Issue Type: Improvement
>Reporter: Erik Hatcher
>Assignee: Hoss Man
> Fix For: 4.9, 5.0
>
> Attachments: SOLR-2894-mincount-minification.patch, 
> SOLR-2894-reworked.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894_cloud_test.patch, 
> dateToObject.patch, pivot_mincount_problem.sh
>
>
> Following up on SOLR-792, pivot faceting currently only supports 
> undistributed mode.  Distributed pivot faceting needs to be implemented.






[jira] [Commented] (SOLR-6251) incorrect 'missing required field' during update - document definitely has it

2014-07-15 Thread Nathan Neulinger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062806#comment-14062806
 ] 

Nathan Neulinger commented on SOLR-6251:


We are open to diagnostic suggestions on this, but are at a loss since this 
appears to be very intermittent and non-reproducible other than by waiting.

Looking at our solrconfig.xml compared to what is currently in the 4.8.0 
example - there are a variety of differences, most of which appear to be due to 
this config originally being based on the 4.4 example solrconfig.xml. 







[jira] [Resolved] (SOLR-6251) incorrect 'missing required field' during update - document definitely has it

2014-07-15 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man resolved SOLR-6251.


Resolution: Not a Problem

resolving as not a problem - referred the user to the solr-user@lucene mailing list.

can reopen if more details indicate an actual bug.







[jira] [Commented] (SOLR-6251) incorrect 'missing required field' during update - document definitely has it

2014-07-15 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062802#comment-14062802
 ] 

Hoss Man commented on SOLR-6251:


Please ask questions about things like this on the solr-user@lucene list prior 
to filing a bug.

You have not provided details about your schema, but based on the details you 
have provided, it appears that your timestamp field is not "stored"; therefore 
you are probably hitting a documented limitation of using partial updates...

https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents
{panel}
All original source fields must be stored for field modifiers to work 
correctly, which is the Solr default.
{panel}
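A hedged illustration of that limitation (the type name is a guess; the field name is from this issue):

{code:xml}
<!-- Atomic updates rebuild the whole document from its *stored* fields,
     so every original source field needs stored="true"; indexed does not
     matter for this. A required field without stored="true" can't be
     reconstructed, which surfaces as 'missing required field'. -->
<field name="timestamp" type="tlong" indexed="false" stored="true"
       required="true"/>
{code}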







[jira] [Comment Edited] (SOLR-6251) incorrect 'missing required field' during update - document definitely has it

2014-07-15 Thread Nathan Neulinger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062800#comment-14062800
 ] 

Nathan Neulinger edited comment on SOLR-6251 at 7/15/14 10:51 PM:
--

and here's an update in that same debug log from shortly before the error (the 
distribute from the insert of the document on solr1):

{noformat}
2014-07-10 21:29:49,313 INFO  qtp1599863753-30844 
[solr.update.processor.LogUpdateProcessor]  - [d-_v22_shard1_replica2] 
webapp=/solr path=/update 
params={distrib.from=http://10.220.16.204:8983/solr/d-_v22_shard1_replica1/&update.distrib=TOLEADER&wt=javabin&version=2}
 {add=[4b2c4d09-31e2-4fe2-b767-3868efbdcda1 (1473278419196182528)]} 0 11
2014-07-10 21:29:49,416 INFO  qtp1599863753-30844 
[org.apache.solr.update.UpdateHandler]  - start 
commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
{noformat}


was (Author: nneul):
and here's an update in that same debug log from shortly before the error (the 
distribute from the insert of the document on solr1):

2014-07-10 21:29:49,313 INFO  qtp1599863753-30844 
[solr.update.processor.LogUpdateProcessor]  - [d-_v22_shard1_replica2] 
webapp=/solr path=/update 
params={distrib.from=http://10.220.16.204:8983/solr/d-_v22_shard1_replica1/&update.distrib=TOLEADER&wt=javabin&version=2}
 {add=[4b2c4d09-31e2-4fe2-b767-3868efbdcda1 (1473278419196182528)]} 0 11
2014-07-10 21:29:49,416 INFO  qtp1599863753-30844 
[org.apache.solr.update.UpdateHandler]  - start 
commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}








[jira] [Comment Edited] (SOLR-6251) incorrect 'missing required field' during update - document definitely has it

2014-07-15 Thread Nathan Neulinger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062793#comment-14062793
 ] 

Nathan Neulinger edited comment on SOLR-6251 at 7/15/14 10:52 PM:
--

{noformat}
16.24 = POD SRV
16.204 = SOLR 1
16.207 = SOLR 2

16.24 ⇒ 16.204 
CAP 1
11344 14:29:49.299883

POST /solr/d-_v22/update/json?commit=true HTTP/1.1
host: d01-solr.srv.hivepoint.com
Accept-Encoding: gzip,deflate
Content-Type: application/json; charset=UTF-8
request_id: null 8677c2fb-8b92-4220-bb73-1e4c610d95be 2057
User-Agent: HivePoint (Factory JSON client:null:2056)
X-Forwarded-For: 10.220.16.229
X-Forwarded-Port: 80
X-Forwarded-Proto: http
Content-Length: 1555
Connection: keep-alive

{ "add": { "commitWithin" : 5000, "doc" : 
{"hive":"vdates","at":"2014-07-10T21:28:41Z","timestamp":1405027721000,"type":"MESSAGE","channel":["dev"],"from":"pr...@sevogle.com","to":["a...@sevogle.com","vi...@sevogle.com","d...@sevogle.com","s...@hive.sevogle.com"],"subject":"Re:
 Deployments - B and then C","body":"eve.SNIP...stem. 
","id":"4b2c4d09-31e2-4fe2-b767-3868efbdcda1","message_id":"2014-07-10-77a6614c-66e4-4ddb-8566-dff4bfb743d1"}
 } }


16.204 ⇒ 16.207
CAP 1


POST 
/solr/d-_v22_shard1_replica2/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F10.220.16.204%3A8983%2Fsolr%2Fd-_v22_shard1_replica1%2F&wt=javabin&version=2
 HTTP/1.1
User-Agent: Solr[org.apache.solr.client.solrj.impl.HttpSolrServer] 1.0
Content-Type: application/javabin
Transfer-Encoding: chunked
Host: 10.220.16.207:8983
Connection: Keep-Alive

64c
...¶ms...update.distrib(TOLEADER.,distrib.from?.http://10.220.16.204:8983/solr/d-_v22_shard1_replica1/.&delByQ..'docsMap.?$hive&vdates."at42014-07-10T21:28:41Z.)timestampx...$type'MESSAGE.'channel.#dev.$from1pr...@sevogle.com."to.0adam@sevogle.com1vi...@sevogle.com/dev@sevogle.com4...@hive.sevogle.com.'subject>Re:
 Deployments - B and then C.$body?#eve.SNIP...tem. 
."id?.4b2c4d09-31e2-4fe2-b767-3868efbdcda1.*message_id?.2014-07-10-77a6614c-66e4-4ddb-8566-dff4bfb743d1
.."ow.."cwX...
0



16.207 ⇒ 16.204 
CAP 1
11368 14:29:49.495301
HTTP/1.1 200 OK
Content-Type: application/octet-stream
Content-Length: 40

responseHeader..&status..%QTimeK



16.24 ⇒ 16.204 
CAP 1
11371 14:29:49.496308

INDEX COMPLETE

HTTP/1.1 200 OK
Content-Type: text/plain;charset=UTF-8
Transfer-Encoding: chunked

2C
{"responseHeader":{"status":0,"QTime":195}}

0


16.24 ⇒ 16.207 
CAP 2
9218 14:29:57.065156
9232 14:29:57.099274

Search (two different search results to two servers?) that show the timestamp 
is set.

POST /solr/d-_v22/select?indent=on&wt=json HTTP/1.1
host: d01-solr.srv.hivepoint.com
Accept-Encoding: gzip,deflate
Content-Type: application/x-www-form-urlencoded; charset=UTF-8
request_id: null 957d1ca5-7200-4058-9c70-16a17fc64c19 2069
User-Agent: HivePoint (Factory JSON client:null:2068)
X-Forwarded-For: 10.220.16.229
X-Forwarded-Port: 80
X-Forwarded-Proto: http
Content-Length: 244
Connection: keep-alive

q=%2B%28*%29&fq=%2Bhive%3Avdates+AND+%2Bchannel%3A%28adam+bethany+dev+notifications+preet+share%29+AND+at%3A%5B2014-07-10T21%3A27%3A56Z+TO+*%5D&start=0&rows=300&sort=at+desc%2C+id+desc&fl=id,hive,timestamp,type,message_id,file_instance_id,scoreHTTP/1.1
 200 OK
Content-Type: text/plain;charset=UTF-8
Transfer-Encoding: chunked

2BB
{
  "responseHeader":{
"status":0,
"QTime":3,
"params":{
  "fl":"id,hive,timestamp,type,message_id,file_instance_id,score",
  "sort":"at desc, id desc",
  "indent":"on",
  "start":"0",
  "q":"+(*)",
  "wt":"json",
  "fq":"+hive:vdates AND +channel:(adam bethany dev notifications preet 
share) AND at:[2014-07-10T21:27:56Z TO *]",
  "rows":"300"}},
  "response":{"numFound":1,"start":0,"maxScore":1.0,"docs":[
  {
"hive":"vdates",
"timestamp":1405027721000,
"type":"MESSAGE",
"id":"4b2c4d09-31e2-4fe2-b767-3868efbdcda1",
"message_id":"2014-07-10-77a6614c-66e4-4ddb-8566-dff4bfb743d1",
"score":1.0}]
  }}

0




16.24 ⇒ 16.207 
CAP 2
9415 14:30:00.310995

Update Channel

POST /solr/d-_v22/update?commit=true HTTP/1.1
host: d01-solr.srv.hivepoint.com
Accept-Encoding: gzip,deflate
Content-Type: application/json; charset=UTF-8
request_id: null 92fa6c11-78d8-44cc-a143-9ff3e4c132f4 2115
User-Agent: HivePoint (Factory JSON client:null:2114)
X-Forwarded-For: 10.220.16.229
X-Forwarded-Port: 80
X-Forwarded-Proto: http
Content-Length: 102
Connection: keep-alive

[{"id":"4b2c4d09-31e2-4fe2-b767-3868efbdcda1","channel": {"add": 
"preet"},"channel": {"add": "adam"}}]HTTP/1.1 400 Bad Request
Content-Type: text/plain;charset=UTF-8
Transfer-Encoding: chunked

96
{"responseHeader":{"status":400,"QTime":1},"error":{"msg":"[doc=4b2c4d09-31e2-4fe2-b767-3868efbdcda1]
 missing required field: timestamp","code":400}}

0


CAP 2
9602 14:30:08.082758

Subsequent search, after update

POST /

[jira] [Commented] (SOLR-6251) incorrect 'missing required field' during update - document definitely has it

2014-07-15 Thread Nathan Neulinger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062796#comment-14062796
 ] 

Nathan Neulinger commented on SOLR-6251:



this is the occurrence of the error on the server the update ran on

2014-07-10 21:30:00,313 ERROR qtp1599863753-30801 [org.apache.solr.core.SolrCore] - org.apache.solr.common.SolrException: [doc=4b2c4d09-31e2-4fe2-b767-3868efbdcda1] missing required field: timestamp
    at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:189)
    at org.apache.solr.update.AddUpdateCommand.getLuceneDocument(AddUpdateCommand.java:77)
    at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:234)
    at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:160)
    at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:704)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:858)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:557)
    at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
    at org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.handleAdds(JsonLoader.java:393)
    at org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.processUpdate(JsonLoader.java:118)
    at org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.load(JsonLoader.java:102)
    at org.apache.solr.handler.loader.JsonLoader.load(JsonLoader.java:66)
    at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1952)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:774)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
    at org.eclipse.jetty.server.Server.handle(Server.java:368)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
    at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
    at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953)
    at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:861)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
    at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
    at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
    at java.lang.Thread.run(Thread.java:745)




[jira] [Commented] (SOLR-6251) incorrect 'missing required field' during update - document definitely has it

2014-07-15 Thread Nathan Neulinger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062793#comment-14062793
 ] 

Nathan Neulinger commented on SOLR-6251:


16.24 = POD SRV
16.204 = SOLR 1
16.207 = SOLR 2

16.24 ⇒ 16.204 
CAP 1
11344 14:29:49.299883

POST /solr/d-_v22/update/json?commit=true HTTP/1.1
host: d01-solr.srv.hivepoint.com
Accept-Encoding: gzip,deflate
Content-Type: application/json; charset=UTF-8
request_id: null 8677c2fb-8b92-4220-bb73-1e4c610d95be 2057
User-Agent: HivePoint (Factory JSON client:null:2056)
X-Forwarded-For: 10.220.16.229
X-Forwarded-Port: 80
X-Forwarded-Proto: http
Content-Length: 1555
Connection: keep-alive

{ "add": { "commitWithin" : 5000, "doc" : 
{"hive":"vdates","at":"2014-07-10T21:28:41Z","timestamp":1405027721000,"type":"MESSAGE","channel":["dev"],"from":"pr...@sevogle.com","to":["a...@sevogle.com","vi...@sevogle.com","d...@sevogle.com","s...@hive.sevogle.com"],"subject":"Re:
 Deployments - B and then C","body":"eve.SNIP...stem. 
","id":"4b2c4d09-31e2-4fe2-b767-3868efbdcda1","message_id":"2014-07-10-77a6614c-66e4-4ddb-8566-dff4bfb743d1"}
 } }


16.204 ⇒ 16.207
CAP 1


POST 
/solr/d-_v22_shard1_replica2/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F10.220.16.204%3A8983%2Fsolr%2Fd-_v22_shard1_replica1%2F&wt=javabin&version=2
 HTTP/1.1
User-Agent: Solr[org.apache.solr.client.solrj.impl.HttpSolrServer] 1.0
Content-Type: application/javabin
Transfer-Encoding: chunked
Host: 10.220.16.207:8983
Connection: Keep-Alive

64c
...¶ms...update.distrib(TOLEADER.,distrib.from?.http://10.220.16.204:8983/solr/d-_v22_shard1_replica1/.&delByQ..'docsMap.?$hive&vdates."at42014-07-10T21:28:41Z.)timestampx...$type'MESSAGE.'channel.#dev.$from1pr...@sevogle.com."to.0adam@sevogle.com1vi...@sevogle.com/dev@sevogle.com4...@hive.sevogle.com.'subject>Re:
 Deployments - B and then C.$body?#eve.SNIP...tem. 
."id?.4b2c4d09-31e2-4fe2-b767-3868efbdcda1.*message_id?.2014-07-10-77a6614c-66e4-4ddb-8566-dff4bfb743d1
.."ow.."cwX...
0



16.207 ⇒ 16.204 
CAP 1
11368 14:29:49.495301
HTTP/1.1 200 OK
Content-Type: application/octet-stream
Content-Length: 40

responseHeader..&status..%QTimeK



16.24 ⇒ 16.204 
CAP 1
11371 14:29:49.496308

INDEX COMPLETE

HTTP/1.1 200 OK
Content-Type: text/plain;charset=UTF-8
Transfer-Encoding: chunked

2C
{"responseHeader":{"status":0,"QTime":195}}

0


16.24 ⇒ 16.207 
CAP 2
9218 14:29:57.065156
9232 14:29:57.099274

Searches (the same search sent to the two servers?) showing that the timestamp 
is set.

POST /solr/d-_v22/select?indent=on&wt=json HTTP/1.1
host: d01-solr.srv.hivepoint.com
Accept-Encoding: gzip,deflate
Content-Type: application/x-www-form-urlencoded; charset=UTF-8
request_id: null 957d1ca5-7200-4058-9c70-16a17fc64c19 2069
User-Agent: HivePoint (Factory JSON client:null:2068)
X-Forwarded-For: 10.220.16.229
X-Forwarded-Port: 80
X-Forwarded-Proto: http
Content-Length: 244
Connection: keep-alive

q=%2B%28*%29&fq=%2Bhive%3Avdates+AND+%2Bchannel%3A%28adam+bethany+dev+notifications+preet+share%29+AND+at%3A%5B2014-07-10T21%3A27%3A56Z+TO+*%5D&start=0&rows=300&sort=at+desc%2C+id+desc&fl=id,hive,timestamp,type,message_id,file_instance_id,score

HTTP/1.1 200 OK
Content-Type: text/plain;charset=UTF-8
Transfer-Encoding: chunked

2BB
{
  "responseHeader":{
"status":0,
"QTime":3,
"params":{
  "fl":"id,hive,timestamp,type,message_id,file_instance_id,score",
  "sort":"at desc, id desc",
  "indent":"on",
  "start":"0",
  "q":"+(*)",
  "wt":"json",
  "fq":"+hive:vdates AND +channel:(adam bethany dev notifications preet 
share) AND at:[2014-07-10T21:27:56Z TO *]",
  "rows":"300"}},
  "response":{"numFound":1,"start":0,"maxScore":1.0,"docs":[
  {
"hive":"vdates",
"timestamp":1405027721000,
"type":"MESSAGE",
"id":"4b2c4d09-31e2-4fe2-b767-3868efbdcda1",
"message_id":"2014-07-10-77a6614c-66e4-4ddb-8566-dff4bfb743d1",
"score":1.0}]
  }}

0




16.24 ⇒ 16.207 
CAP 2
9415 14:30:00.310995

Update Channel

POST /solr/d-_v22/update?commit=true HTTP/1.1
host: d01-solr.srv.hivepoint.com
Accept-Encoding: gzip,deflate
Content-Type: application/json; charset=UTF-8
request_id: null 92fa6c11-78d8-44cc-a143-9ff3e4c132f4 2115
User-Agent: HivePoint (Factory JSON client:null:2114)
X-Forwarded-For: 10.220.16.229
X-Forwarded-Port: 80
X-Forwarded-Proto: http
Content-Length: 102
Connection: keep-alive

[{"id":"4b2c4d09-31e2-4fe2-b767-3868efbdcda1","channel": {"add": "preet"},"channel": {"add": "adam"}}]

HTTP/1.1 400 Bad Request
Content-Type: text/plain;charset=UTF-8
Transfer-Encoding: chunked

96
{"responseHeader":{"status":400,"QTime":1},"error":{"msg":"[doc=4b2c4d09-31e2-4fe2-b767-3868efbdcda1]
 missing required field: timestamp","code":400}}

0


CAP 2
9602 14:30:08.082758

Subsequent search, after update

POST /solr/d-_v22/select?indent=on&wt=json HTTP/1.1
host: d01-solr.s

[jira] [Commented] (LUCENE-5681) Fix RAMDirectory's IndexInput to not double-buffer on slice()

2014-07-15 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062795#comment-14062795
 ] 

Robert Muir commented on LUCENE-5681:
-

Looks good, thanks Uwe!

> Fix RAMDirectory's IndexInput to not double-buffer on slice()
> -
>
> Key: LUCENE-5681
> URL: https://issues.apache.org/jira/browse/LUCENE-5681
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/store
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
> Fix For: 5.0, 4.10
>
> Attachments: LUCENE-5681.patch, LUCENE-5681.patch, LUCENE-5681.patch
>
>
> After LUCENE-4371, we still have a non-optimal implementation of 
> IndexInput#slice() in RAMDirectory. We should fix that to use the cloning 
> approach like other directories do



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-6251) incorrect 'missing required field' during update - document definitely has it

2014-07-15 Thread Nathan Neulinger (JIRA)
Nathan Neulinger created SOLR-6251:
--

 Summary: incorrect 'missing required field' during update - 
document definitely has it
 Key: SOLR-6251
 URL: https://issues.apache.org/jira/browse/SOLR-6251
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.8
 Environment: 4.8.0. Two nodes, SolrCloud, external ZK ensemble. All on 
EC2. The two hosts are round-robin'd behind an ELB.
Reporter: Nathan Neulinger



Document added on solr1. We can see the distribution take place from solr1 to 
solr2 and return a success. Subsequent searches return the document, clearly 
showing the field as being there. Later on, an update is done to add to an 
element of the document - and the update fails. The update was sent to the 
solr2 instance. 

Schema marks the 'timestamp' field as required, so the initial insert should 
not work if the field isn't present.

The symptom is intermittent - we're seeing this randomly, with no warning or 
trigger that we can see, but in all cases the error comes in response to an 
update when the instance tries to distribute the change to the other node. 

Searches that were run AFTER the update also show the field as being present in 
the document. 

Will add a full trace of the operations in the comments shortly. pcap captures of ALL 
traffic for the two nodes on 8983 are available if requested. 
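
For reference, here is a minimal SolrJ sketch of the kind of atomic update that 
hits the error (the base URL is illustrative, and this assumes the document 
above is already indexed; it is not the actual application code):

{code:java}
import java.util.Collections;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class AtomicUpdateRepro {
  public static void main(String[] args) throws Exception {
    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/d-_v22");
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "4b2c4d09-31e2-4fe2-b767-3868efbdcda1");
    // SolrJ atomic-update syntax: a map keyed by the modifier ("add")
    doc.addField("channel", Collections.singletonMap("add", "preet"));
    // commitWithin 5s; this is where we intermittently see
    // "missing required field: timestamp"
    server.add(doc, 5000);
    server.shutdown();
  }
}
{code}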



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5681) Fix RAMDirectory's IndexInput to not double-buffer on slice()

2014-07-15 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-5681:
--

Fix Version/s: 4.10

> Fix RAMDirectory's IndexInput to not double-buffer on slice()
> -
>
> Key: LUCENE-5681
> URL: https://issues.apache.org/jira/browse/LUCENE-5681
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/store
>Reporter: Uwe Schindler
> Fix For: 5.0, 4.10
>
> Attachments: LUCENE-5681.patch, LUCENE-5681.patch, LUCENE-5681.patch
>
>
> After LUCENE-4371, we still have a non-optimal implementation of 
> IndexInput#slice() in RAMDirectory. We should fix that to use the cloning 
> approach like other directories do



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (LUCENE-5681) Fix RAMDirectory's IndexInput to not double-buffer on slice()

2014-07-15 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler reassigned LUCENE-5681:
-

Assignee: Uwe Schindler

> Fix RAMDirectory's IndexInput to not double-buffer on slice()
> -
>
> Key: LUCENE-5681
> URL: https://issues.apache.org/jira/browse/LUCENE-5681
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/store
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
> Fix For: 5.0, 4.10
>
> Attachments: LUCENE-5681.patch, LUCENE-5681.patch, LUCENE-5681.patch
>
>
> After LUCENE-4371, we still have a non-optimal implementation of 
> IndexInput#slice() in RAMDirectory. We should fix that to use the cloning 
> approach like other directories do



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5681) Fix RAMDirectory's IndexInput to not double-buffer on slice()

2014-07-15 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-5681:
--

Attachment: LUCENE-5681.patch

Improve IllegalArgumentExceptions, be more strict on out-of-bounds slice.

I will commit this tomorrow and backport to 4.10.
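
For illustration, this is the general shape of the stricter bounds check (not 
the exact code in the patch):

{code:java}
public class SliceBounds {
  // Illustrative bounds check of the kind described above:
  static void checkSliceBounds(String desc, long offset, long length, long fileLength) {
    if (offset < 0 || length < 0 || offset + length > fileLength) {
      throw new IllegalArgumentException("slice(" + desc + ") out of bounds: offset="
          + offset + ", length=" + length + ", fileLength=" + fileLength);
    }
  }

  public static void main(String[] args) {
    checkSliceBounds("test", 0, 10, 100);  // ok
    checkSliceBounds("test", 95, 10, 100); // throws IllegalArgumentException
  }
}
{code}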

> Fix RAMDirectory's IndexInput to not double-buffer on slice()
> -
>
> Key: LUCENE-5681
> URL: https://issues.apache.org/jira/browse/LUCENE-5681
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/store
>Reporter: Uwe Schindler
> Fix For: 5.0
>
> Attachments: LUCENE-5681.patch, LUCENE-5681.patch, LUCENE-5681.patch
>
>
> After LUCENE-4371, we still have a non-optimal implementation of 
> IndexInput#slice() in RAMDirectory. We should fix that to use the cloning 
> approach like other directories do



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-6179) ManagedResource repeatedly logs warnings when not used

2014-07-15 Thread Timothy Potter (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Potter resolved SOLR-6179.
--

   Resolution: Fixed
Fix Version/s: 4.10
   5.0

> ManagedResource repeatedly logs warnings when not used
> --
>
> Key: SOLR-6179
> URL: https://issues.apache.org/jira/browse/SOLR-6179
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 4.8, 4.8.1
> Environment: 
>Reporter: Hoss Man
>Assignee: Timothy Potter
> Fix For: 5.0, 4.10
>
>
> These messages are currently logged as WARNings, and should either be 
> switched to INFO level (or made more sophisticated so that it can tell when 
> solr is setup for managed resources but the data isn't available)...
> {noformat}
> 2788 [coreLoadExecutor-5-thread-1] WARN  org.apache.solr.rest.ManagedResource 
>  – No stored data found for /rest/managed
> 2788 [coreLoadExecutor-5-thread-1] WARN  org.apache.solr.rest.ManagedResource 
>  – No registered observers for /rest/managed
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-6136) ConcurrentUpdateSolrServer includes a Spin Lock

2014-07-15 Thread Timothy Potter (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Potter resolved SOLR-6136.
--

   Resolution: Fixed
Fix Version/s: 4.10
   5.0

> ConcurrentUpdateSolrServer includes a Spin Lock
> ---
>
> Key: SOLR-6136
> URL: https://issues.apache.org/jira/browse/SOLR-6136
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 4.6, 4.6.1, 4.7, 4.7.1, 4.7.2, 4.8, 4.8.1
>Reporter: Brandon Chapman
>Assignee: Timothy Potter
>Priority: Critical
> Fix For: 5.0, 4.10
>
> Attachments: SOLR-6136.patch, wait___notify_all.patch
>
>
> ConcurrentUpdateSolrServer.blockUntilFinished() includes a Spin Lock. This 
> causes an extremely high amount of CPU to be used on the Cloud Leader during 
> indexing.
> Here is a summary of our system testing. 
> Importing data on Solr4.5.0: 
> Throughput gets as high as 240 documents per second.
> [tomcat@solr-stg01 logs]$ uptime 
> 09:53:50 up 310 days, 23:52, 1 user, load average: 3.33, 3.72, 5.43
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 
> 9547 tomcat 21 0 6850m 1.2g 16m S 86.2 5.0 1:48.81 java
> Importing data on Solr4.7.0 with no replicas: 
> Throughput peaks at 350 documents per second.
> [tomcat@solr-stg01 logs]$ uptime 
> 10:03:44 up 311 days, 2 min, 1 user, load average: 4.57, 2.55, 4.18
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 
> 9728 tomcat 23 0 6859m 2.2g 28m S 62.3 9.0 2:20.20 java
> Importing data on Solr4.7.0 with replicas: 
> Throughput peaks at 30 documents per second because the Solr machine is out 
> of CPU.
> [tomcat@solr-stg01 logs]$ uptime 
> 09:40:04 up 310 days, 23:38, 1 user, load average: 30.54, 12.39, 4.79
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 
> 9190 tomcat 17 0 7005m 397m 15m S 198.5 1.6 7:14.87 java
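
For reference, a minimal sketch of the wait/notify pattern that replaces a 
spin loop in a blockUntilFinished()-style method (illustrative only, not the 
committed patch; see wait___notify_all.patch above):

{code:java}
public class QueueDrainExample {
  private final Object lock = new Object();
  private int pendingRunners = 2; // e.g. background threads still draining a queue

  // Spin-lock version (what the bug describes) would busy-wait:
  //   while (pendingRunners > 0) { /* burn CPU */ }

  public void blockUntilFinished() throws InterruptedException {
    synchronized (lock) {
      while (pendingRunners > 0) {
        lock.wait(); // sleep until a runner signals completion
      }
    }
  }

  public void runnerFinished() {
    synchronized (lock) {
      pendingRunners--;
      lock.notifyAll(); // wake any thread blocked in blockUntilFinished()
    }
  }
}
{code}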



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2894) Implement distributed pivot faceting

2014-07-15 Thread Andrew Muldowney (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Muldowney updated SOLR-2894:
---

Attachment: SOLR-2894.patch

I've uploaded a new file with my {{facet.missing}} changes. It's got the Small 
and LongTail tests working.

{code:title=DistributedFacetPivotLargeTest.java}rsp = query( "q", "*:*",
 "fq", "-place_s:0placeholder",
 "rows", "0",
 "facet","true",
 "facet.limit","1",
 "facet.missing","true",
 //FacetParams.FACET_OVERREQUEST_RATIO, "0", // force refine
 //FacetParams.FACET_OVERREQUEST_COUNT, "0", // force refine
 "facet.pivot","special_s,company_t");{code}

This test gets whacky when the {{OVERREQUEST}} options are uncommented. With 
the {{OVERREQUEST}} options uncommented we do not get the proper {{bbc}} value 
and so the distributed version diverges from the non-distrib. Your second 
comment on this issue is exactly on point.

Another variance in that test is that on the distrib side we get 
{code}
{field=special_s,value=,count=3,pivot=[
{field=company_t,value=microsoft,count=2}, 
{field=company_t,value=null,count=0}]}
{code}

whereas for the non-distrib we just get
{code}
{field=special_s,value=,count=3,pivot=[
{field=company_t,value=microsoft,count=2}]}
{code}

Should {{facet.missing}} respect the {{mincount}} (in this case it's 1)?

> Implement distributed pivot faceting
> 
>
> Key: SOLR-2894
> URL: https://issues.apache.org/jira/browse/SOLR-2894
> Project: Solr
>  Issue Type: Improvement
>Reporter: Erik Hatcher
>Assignee: Hoss Man
> Fix For: 4.9, 5.0
>
> Attachments: SOLR-2894-mincount-minification.patch, 
> SOLR-2894-reworked.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894_cloud_test.patch, 
> dateToObject.patch, pivot_mincount_problem.sh
>
>
> Following up on SOLR-792, pivot faceting currently only supports 
> undistributed mode.  Distributed pivot faceting needs to be implemented.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6163) special chars and ManagedSynonymFilterFactory

2014-07-15 Thread Timothy Potter (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062671#comment-14062671
 ] 

Timothy Potter commented on SOLR-6163:
--

I'll take a look, but I think the fix should be upstream from the managed 
resource implementations; it seems like Restlet should have already done the 
decoding?
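
For illustration, this is the kind of decoding step being suggested, using the 
JDK's URLDecoder (where exactly it belongs in the Restlet / managed resource 
stack is the open question):

{code:java}
import java.net.URLDecoder;

public class DecodeExample {
  public static void main(String[] args) throws Exception {
    String raw = "%C3%A9%C3%A9%C3%A9";            // what arrives in the resource path
    String key = URLDecoder.decode(raw, "UTF-8"); // "ééé", the stored synonym key
    System.out.println(key);
  }
}
{code}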

> special chars and ManagedSynonymFilterFactory
> -
>
> Key: SOLR-6163
> URL: https://issues.apache.org/jira/browse/SOLR-6163
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 4.8
>Reporter: Wim Kumpen
>
> Hey,
> I was playing with the ManagedSynonymFilterFactory to create a synonym list 
> with the API. But I have difficulties deleting keys that contain special 
> characters (or spaces)...
> I added a key ééé that matches with some other words. It's saved in the 
> synonym file as ééé.
> When I try to delete it, I do:
> curl -X DELETE 
> "http://localhost/solr/mycore/schema/analysis/synonyms/english/ééé";
> error message: %C3%A9%C3%A9%C3%A9%C2%B5 not found in 
> /schema/analysis/synonyms/english
> A wild guess from me is that %C3%A9 isn't decoded back to ééé, and that's why 
> it can't find the keyword?



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-3617) Consider adding start scripts.

2014-07-15 Thread Timothy Potter (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Potter updated SOLR-3617:
-

Attachment: SOLR-3617.patch

Previous patch was hosed ... here's a good one.

> Consider adding start scripts.
> --
>
> Key: SOLR-3617
> URL: https://issues.apache.org/jira/browse/SOLR-3617
> Project: Solr
>  Issue Type: New Feature
>Reporter: Mark Miller
> Attachments: SOLR-3617.patch, SOLR-3617.patch
>
>
> I've always found that starting Solr with java -jar start.jar is a little odd 
> if you are not a java guy, but I think there are bigger pros than looking 
> less odd in shipping some start scripts.
> Not only do you get a cleaner start command:
> sh solr.sh or solr.bat or something
> But you also can do a couple other little nice things:
> * it becomes fairly obvious for a new casual user to see how to start the 
> system without reading doc.
> * you can make the working dir the location of the script - this lets you 
> call the start script from another dir and still have all the relative dir 
> setup work.
> * have an out of the box place to save startup params like -Xmx.
> * we could have multiple start scripts - say solr-dev.sh that logged to the 
> console and default to sys default for RAM - and also solr-prod which was 
> fully configured for logging, pegged Xms and Xmx at some larger value (1GB?) 
> etc.
> You would still of course be able to make the java cmd directly - and that is 
> probably what you would do when it's time to run as a service - but these 
> could be good starter scripts to get people on the right track and improve 
> the initial user experience.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-3617) Consider adding start scripts.

2014-07-15 Thread Timothy Potter (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Potter updated SOLR-3617:
-

Attachment: (was: SOLR-3617.patch)

> Consider adding start scripts.
> --
>
> Key: SOLR-3617
> URL: https://issues.apache.org/jira/browse/SOLR-3617
> Project: Solr
>  Issue Type: New Feature
>Reporter: Mark Miller
> Attachments: SOLR-3617.patch, SOLR-3617.patch
>
>
> I've always found that starting Solr with java -jar start.jar is a little odd 
> if you are not a java guy, but I think there are bigger pros than looking 
> less odd in shipping some start scripts.
> Not only do you get a cleaner start command:
> sh solr.sh or solr.bat or something
> But you also can do a couple other little nice things:
> * it becomes fairly obvious for a new casual user to see how to start the 
> system without reading doc.
> * you can make the working dir the location of the script - this lets you 
> call the start script from another dir and still have all the relative dir 
> setup work.
> * have an out of the box place to save startup params like -Xmx.
> * we could have multiple start scripts - say solr-dev.sh that logged to the 
> console and default to sys default for RAM - and also solr-prod which was 
> fully configured for logging, pegged Xms and Xmx at some larger value (1GB?) 
> etc.
> You would still of course be able to make the java cmd directly - and that is 
> probably what you would do when it's time to run as a service - but these 
> could be good starter scripts to get people on the right track and improve 
> the initial user experience.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-3617) Consider adding start scripts.

2014-07-15 Thread Timothy Potter (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Potter updated SOLR-3617:
-

Attachment: SOLR-3617.patch

Here's an updated patch with the following:

1) bin/solr.cmd

For our Windows users ;-) I did my best to emulate the behavior of the Linux 
script (bin/solr). The main difference between the Windows version and the 
Linux version is that I don't know how to implement "stop all" in the .cmd 
version, so currently the user needs to do: solr stop -p PORT. In the Linux 
version, if you don't pass the port, it stops all running Solrs, which may be 
too heavy-handed anyway, so the easiest fix might be to make the Linux version 
require the port too.

In general, I'm not a Windows user so there may be some things in this 
implementation that can be done cleaner / easier. Happy to have suggestions on 
how to improve it.

2) Added a restart mode, which stops the Solr server and starts it again. Also, 
if you re-issue start while a server is already running, it now complains 
instead of doing a stop / start as it did before. 

Lastly, I'm still sorting out how to implement the cloud example. Specifically, 
I'm wondering if we should walk the user through it with prompts, reading from 
stdin, so that it's a little more interactive, something like:

$ cd solr-5.0.0/bin
$ ./solr -e cloud

Welcome to the SolrCloud example!

Please enter the number of local nodes you would like to run? [2] <-- default
2

Ok, let's start up 2 Solr nodes for your example SolrCloud cluster.

Enter the port for node1: [8983]
8983

Enter the port for node2: [7574]
7574

Oops! It looks like there is already something running on port 7574, please 
choose another port: [7575]
7575

Ok, starting node1 on port 8983 with embedded ZooKeeper listening on 
localhost:9983

Now starting node2 on port 7575

Success! Found 2 active nodes in your cluster.

Do you want to create a new collection? Y/n
Y

Collection name? [collection1]
collection1

How many shards? [2]
2

How many replicas per shard? [1]
1

Ok, collection1 created successfully, do you want to index some documents? Y/n
Y

Path to file to index: [example/exampledocs/mem.xml]


...


We could also support a -q flag to indicate "quiet" mode and just accept all 
defaults from the interactive session. And of course -V would activate a 
verbose mode that shows more of the commands being run during the interactive 
session.

> Consider adding start scripts.
> --
>
> Key: SOLR-3617
> URL: https://issues.apache.org/jira/browse/SOLR-3617
> Project: Solr
>  Issue Type: New Feature
>Reporter: Mark Miller
> Attachments: SOLR-3617.patch, SOLR-3617.patch
>
>
> I've always found that starting Solr with java -jar start.jar is a little odd 
> if you are not a java guy, but I think there are bigger pros than looking 
> less odd in shipping some start scripts.
> Not only do you get a cleaner start command:
> sh solr.sh or solr.bat or something
> But you also can do a couple other little nice things:
> * it becomes fairly obvious for a new casual user to see how to start the 
> system without reading doc.
> * you can make the working dir the location of the script - this lets you 
> call the start script from another dir and still have all the relative dir 
> setup work.
> * have an out of the box place to save startup params like -Xmx.
> * we could have multiple start scripts - say solr-dev.sh that logged to the 
> console and default to sys default for RAM - and also solr-prod which was 
> fully configured for logging, pegged Xms and Xmx at some larger value (1GB?) 
> etc.
> You would still of course be able to make the java cmd directly - and that is 
> probably what you would do when it's time to run as a service - but these 
> could be good starter scripts to get people on the right track and improve 
> the initial user experience.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5473) Split clusterstate.json per collection and watch states selectively

2014-07-15 Thread Jessica Cheng (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062608#comment-14062608
 ] 

Jessica Cheng commented on SOLR-5473:
-

I think there can be a race condition in CloudSolrServer's state-caching if the 
state is fetched just when a collection is created but none of the replicas 
have been added. If this state is cached, then until it expires, all the 
requests will fail-fast with an empty theUrlList. We may need to optionally 
skip caching if any of the shards has no replicas at all.
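
A rough sketch of that guard (simplified; the method and surrounding cache 
code are illustrative, not the actual CloudSolrServer internals):

{code:java}
import org.apache.solr.common.cloud.DocCollection;
import org.apache.solr.common.cloud.Slice;

public class StateCacheGuard {
  // Only cache a freshly fetched collection state if every shard already has
  // at least one replica; otherwise skip caching so the next request re-fetches
  // instead of fail-fasting on an empty URL list until the cache entry expires.
  static boolean isCacheable(DocCollection coll) {
    for (Slice slice : coll.getSlices()) {
      if (slice.getReplicas().isEmpty()) {
        return false;
      }
    }
    return true;
  }
}
{code}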

> Split clusterstate.json per collection and watch states selectively 
> 
>
> Key: SOLR-5473
> URL: https://issues.apache.org/jira/browse/SOLR-5473
> Project: Solr
>  Issue Type: Sub-task
>  Components: SolrCloud
>Reporter: Noble Paul
>Assignee: Noble Paul
> Fix For: 5.0
>
> Attachments: SOLR-5473-74 .patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74_POC.patch, SOLR-5473-configname-fix.patch, SOLR-5473.patch, 
> SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
> SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
> SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
> SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
> SOLR-5473.patch, SOLR-5473_undo.patch, ec2-23-20-119-52_solr.log, 
> ec2-50-16-38-73_solr.log
>
>
> As defined in the parent issue, store the states of each collection under 
> /collections/collectionname/state.json node



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5823) recognize hunspell FULLSTRIP option in affix file

2014-07-15 Thread Ryan Ernst (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062527#comment-14062527
 ] 

Ryan Ernst commented on LUCENE-5823:


LGTM.

> recognize hunspell FULLSTRIP option in affix file
> -
>
> Key: LUCENE-5823
> URL: https://issues.apache.org/jira/browse/LUCENE-5823
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Robert Muir
> Fix For: 5.0, 4.10
>
> Attachments: LUCENE-5823.patch
>
>
> With LUCENE-5818 we fixed stripping to be correct (ensuring it doesn't strip 
> the entire word before applying an affix). This is usually true, but there is 
> an option in the affix file to allow this.
> It's used by several languages (French, Latvian, Swedish, etc.)
> {noformat}
> FULLSTRIP
>   With FULLSTRIP, affix rules can strip full words, not  only  one
>   less characters, before adding the affixes
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-6137) Managed Schema / Schemaless and SolrCloud concurrency issues

2014-07-15 Thread Steve Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Rowe resolved SOLR-6137.
--

   Resolution: Fixed
Fix Version/s: 4.10
   5.0
 Assignee: Steve Rowe

Resolving - remaining issues will be dealt with in the new issues Gregory raised.

Thanks Gregory!

> Managed Schema / Schemaless and SolrCloud concurrency issues
> 
>
> Key: SOLR-6137
> URL: https://issues.apache.org/jira/browse/SOLR-6137
> Project: Solr
>  Issue Type: Bug
>  Components: Schema and Analysis, SolrCloud
>Reporter: Gregory Chanan
>Assignee: Steve Rowe
> Fix For: 5.0, 4.10
>
> Attachments: SOLR-6137.patch, SOLR-6137.patch, SOLR-6137v2.patch, 
> SOLR-6137v3.patch, SOLR-6137v4.patch
>
>
> This is a follow up to a message on the mailing list, linked here: 
> http://mail-archives.apache.org/mod_mbox/lucene-dev/201406.mbox/%3CCAKfebOOcMeVEb010SsdcH8nta%3DyonMK5R7dSFOsbJ_tnre0O7w%40mail.gmail.com%3E
> The Managed Schema integration with SolrCloud seems pretty limited.
> The issue I'm running into is variants of the issue that schema changes are 
> not pushed to all shards/replicas synchronously.  So, for example, I can make 
> the following two requests:
> 1) add a field to the collection on server1 using the Schema API
> 2) add a document with the new field, the document is routed to a core on 
> server2
> Then, there appears to be a race between when the document is processed by 
> the core on server2 and when the core on server2, via the 
> ZkIndexSchemaReader, gets the new schema.  If the document is processed 
> first, I get a 400 error because the field doesn't exist.  This is easily 
> reproducible by adding a sleep to the ZkIndexSchemaReader's processing.
> I hit a similar issue with Schemaless: the distributed request handler sends 
> out the document updates, but there is no guarantee that the other 
> shards/replicas see the schema changes made by the update.chain.
> Another issue I noticed today: making multiple schema API calls concurrently 
> can block; that is, one may get through and the other may infinite loop.
> So, for reference, the issues include:
> 1) Schema API changes return success before all cores are updated; subsequent 
> calls attempting to use new schema may fail
> 2) Schemaless changes may fail on replicas/other shards for the same reason
> 3) Concurrent Schema API changes may block
> From Steve Rowe on the mailing list:
> {quote}
> For Schema API users, delaying a couple of seconds after adding fields before 
> using them should work around this problem.  While not ideal, I think schema 
> field additions are rare enough in the Solr collection lifecycle that this is 
> not a huge problem.
> For schemaless users, the picture is worse, as you noted.  Immediate 
> distribution of documents triggering schema field addition could easily prove 
> problematic.  Maybe we need a schema update blocking mode, where after the ZK 
> schema node watch is triggered, all new request processing is halted until 
> the schema is finished downloading/parsing/swapping out? (Such a mode should 
> help Schema API users too.)
> {quote}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6137) Managed Schema / Schemaless and SolrCloud concurrency issues

2014-07-15 Thread Gregory Chanan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062508#comment-14062508
 ] 

Gregory Chanan commented on SOLR-6137:
--

Thanks [~sar...@syr.edu]!  Your changes make sense.

bq. Schema API changes return success before all cores are updated; subsequent 
calls attempting to use new schema may fail

I filed SOLR-6249 for this.

bq. One small issue I noticed is that there is a race between parsing and 
schema addition.

I filed SOLR-6250 for this

bq. Anything else?

Nope.

> Managed Schema / Schemaless and SolrCloud concurrency issues
> 
>
> Key: SOLR-6137
> URL: https://issues.apache.org/jira/browse/SOLR-6137
> Project: Solr
>  Issue Type: Bug
>  Components: Schema and Analysis, SolrCloud
>Reporter: Gregory Chanan
> Attachments: SOLR-6137.patch, SOLR-6137.patch, SOLR-6137v2.patch, 
> SOLR-6137v3.patch, SOLR-6137v4.patch
>
>
> This is a follow up to a message on the mailing list, linked here: 
> http://mail-archives.apache.org/mod_mbox/lucene-dev/201406.mbox/%3CCAKfebOOcMeVEb010SsdcH8nta%3DyonMK5R7dSFOsbJ_tnre0O7w%40mail.gmail.com%3E
> The Managed Schema integration with SolrCloud seems pretty limited.
> The issue I'm running into is variants of the issue that schema changes are 
> not pushed to all shards/replicas synchronously.  So, for example, I can make 
> the following two requests:
> 1) add a field to the collection on server1 using the Schema API
> 2) add a document with the new field, the document is routed to a core on 
> server2
> Then, there appears to be a race between when the document is processed by 
> the core on server2 and when the core on server2, via the 
> ZkIndexSchemaReader, gets the new schema.  If the document is processed 
> first, I get a 400 error because the field doesn't exist.  This is easily 
> reproducible by adding a sleep to the ZkIndexSchemaReader's processing.
> I hit a similar issue with Schemaless: the distributed request handler sends 
> out the document updates, but there is no guarantee that the other 
> shards/replicas see the schema changes made by the update.chain.
> Another issue I noticed today: making multiple schema API calls concurrently 
> can block; that is, one may get through and the other may infinite loop.
> So, for reference, the issues include:
> 1) Schema API changes return success before all cores are updated; subsequent 
> calls attempting to use new schema may fail
> 2) Schemaless changes may fail on replicas/other shards for the same reason
> 3) Concurrent Schema API changes may block
> From Steve Rowe on the mailing list:
> {quote}
> For Schema API users, delaying a couple of seconds after adding fields before 
> using them should work around this problem.  While not ideal, I think schema 
> field additions are rare enough in the Solr collection lifecycle that this is 
> not a huge problem.
> For schemaless users, the picture is worse, as you noted.  Immediate 
> distribution of documents triggering schema field addition could easily prove 
> problematic.  Maybe we need a schema update blocking mode, where after the ZK 
> schema node watch is triggered, all new request processing is halted until 
> the schema is finished downloading/parsing/swapping out? (Such a mode should 
> help Schema API users too.)
> {quote}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-6250) Schemaless parsing does not work on a consistent schema

2014-07-15 Thread Gregory Chanan (JIRA)
Gregory Chanan created SOLR-6250:


 Summary: Schemaless parsing does not work on a consistent schema
 Key: SOLR-6250
 URL: https://issues.apache.org/jira/browse/SOLR-6250
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis
Reporter: Gregory Chanan


See this comment 
(https://issues.apache.org/jira/browse/SOLR-6137?focusedCommentId=14044366&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14044366),
 reproduced here:

bq. One small issue I noticed is that there is a race between parsing and 
schema addition. The AddSchemaFieldsUpdateProcessor handles this by only 
working on a fixed schema, so the schema doesn't change underneath it. If it 
decides on a schema addition and that fails (because another addition beat it), 
it will grab the latest schema and retry. But the parsers don't do that so the 
core's schema can change in the middle of parsing. It may make sense to defend 
against that by moving the retry code from the AddSchemaFieldsUpdateProcessor 
to some processor that runs before all the parsers. The downside is if the 
schema addition fails, you have to rerun all the parsers, but that may be a 
minor concern.
bq. This may not actually matter. Consider the case tested at the end of the 
test: two documents are simultaneously inserted with the same field having a 
Long and Date value. Assume the Date wins the schema "race" and is updated 
first. While parsing the Long, each parser may see the schema as having a date 
field or no field. If a valid parser (that is, one that can modify the field 
value) sees a date field, it won't do any modifications because shouldMutate 
will fail, leaving the object in whatever state the serializer left it (either 
Long or String). If it sees no field, it will mutate the object to create a 
Long object. In either case, we should get an error at the point we actually 
create the lucene document, because neither a Long nor 
String-representation-of-a-long can be stored in a Date field. This is pretty 
difficult to reason about though.
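
An abstract model of the retry loop described in that comment (plain Java, not 
the actual Solr code; the live schema is modeled as an AtomicReference to show 
the decide-then-compare-and-set cycle):

{code:java}
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

public class SchemaRetryModel {
  static final AtomicReference<Set<String>> schema =
      new AtomicReference<Set<String>>(new HashSet<String>());

  static void addFields(Collection<String> wanted) {
    while (true) {
      Set<String> snapshot = schema.get();          // work against a fixed schema
      Set<String> next = new HashSet<String>(snapshot);
      next.addAll(wanted);                          // decide additions on the snapshot
      if (schema.compareAndSet(snapshot, next)) {
        return;                                     // our addition won
      }
      // another addition beat us: loop, grab the latest schema, retry
    }
  }

  public static void main(String[] args) {
    addFields(Arrays.asList("price_l", "ts_dt"));
    System.out.println(schema.get());
  }
}
{code}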




--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5681) Fix RAMDirectory's IndexInput to not double-buffer on slice()

2014-07-15 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-5681:
--

Attachment: LUCENE-5681.patch

Added ineffectivity warning to {{BufferedIndexInput#wrap()}} and cleaned up 
sliceDescription to be consistent.

> Fix RAMDirectory's IndexInput to not double-buffer on slice()
> -
>
> Key: LUCENE-5681
> URL: https://issues.apache.org/jira/browse/LUCENE-5681
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/store
>Reporter: Uwe Schindler
> Fix For: 5.0
>
> Attachments: LUCENE-5681.patch, LUCENE-5681.patch
>
>
> After LUCENE-4371, we still have a non-optimal implementation of 
> IndexInput#slice() in RAMDirectory. We should fix that to use the cloning 
> approach like other directories do



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-6249) Schema API changes return success before all cores are updated

2014-07-15 Thread Gregory Chanan (JIRA)
Gregory Chanan created SOLR-6249:


 Summary: Schema API changes return success before all cores are 
updated
 Key: SOLR-6249
 URL: https://issues.apache.org/jira/browse/SOLR-6249
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis, SolrCloud
Reporter: Gregory Chanan


See SOLR-6137 for more details.

The basic issue is that Schema API changes return success when the first core 
is updated, but other cores asynchronously read the updated schema from 
ZooKeeper.

So a client application could make a Schema API change and then index some 
documents based on the new schema that may fail on other nodes.

Possible fixes:
1) Make the Schema API calls synchronous
2) Give the client some ability to track the state of the schema.  They can 
already do this to a certain extent by checking the Schema API on all the 
replicas and verifying that the field has been added, though this is pretty 
cumbersome.  Maybe it makes more sense to do this sort of thing on the 
collection level, i.e. Schema API changes return the zk version to the client.  
We add an API to return the current zk version.  On a replica, if the zk 
version is >= the version the client has, the client knows that replica has at 
least seen the schema change.  We could also provide an API to do the 
distribution and checking across the different replicas of the collection so 
that clients don't need to do that themselves.
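
For comparison, a sketch of the cumbersome client-side check mentioned above 
(host list and field name are illustrative, and this assumes GET 
/schema/fields/<name> returns HTTP 404 until a core has picked up the change):

{code:java}
import java.net.HttpURLConnection;
import java.net.URL;

public class WaitForSchemaField {
  public static void main(String[] args) throws Exception {
    String[] replicas = {
        "http://host1:8983/solr/collection1",
        "http://host2:8983/solr/collection1"
    };
    String field = "new_field_s";
    for (String base : replicas) {
      int rc;
      do {
        HttpURLConnection conn = (HttpURLConnection)
            new URL(base + "/schema/fields/" + field + "?wt=json").openConnection();
        rc = conn.getResponseCode();
        conn.disconnect();
        if (rc != 200) {
          Thread.sleep(250); // this replica hasn't loaded the new schema yet
        }
      } while (rc != 200);
    }
    System.out.println("all replicas report field: " + field);
  }
}
{code}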



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5681) Fix RAMDirectory's IndexInput to not double-buffer on slice()

2014-07-15 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062475#comment-14062475
 ] 

Uwe Schindler commented on LUCENE-5681:
---

This improvement is especially important for slices of NRTCachingDirectory, 
because it uses RAMDirectory internally, too!

> Fix RAMDirectory's IndexInput to not double-buffer on slice()
> -
>
> Key: LUCENE-5681
> URL: https://issues.apache.org/jira/browse/LUCENE-5681
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/store
>Reporter: Uwe Schindler
> Fix For: 5.0
>
> Attachments: LUCENE-5681.patch
>
>
> After LUCENE-4371, we still have a non-optimal implementation of 
> IndexInput#slice() in RAMDirectory. We should fix that to use the cloning 
> approach like other directories do



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5681) Fix RAMDirectory's IndexInput to not double-buffer on slice()

2014-07-15 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-5681:
--

Attachment: LUCENE-5681.patch

Patch, including new test.

The default impl is now only used by Solr. We should fix this too, and 
remove the BufferedIndexInput.wrap() one completely.

> Fix RAMDirectory's IndexInput to not double-buffer on slice()
> -
>
> Key: LUCENE-5681
> URL: https://issues.apache.org/jira/browse/LUCENE-5681
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/store
>Reporter: Uwe Schindler
> Fix For: 5.0
>
> Attachments: LUCENE-5681.patch
>
>
> After LUCENE-4371, we still have a non-optimal implementation of 
> IndexInput#slice() in RAMDirectory. We should fix that to use the cloning 
> approach like other directories do



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-2894) Implement distributed pivot faceting

2014-07-15 Thread Andrew Muldowney (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062438#comment-14062438
 ] 

Andrew Muldowney edited comment on SOLR-2894 at 7/15/14 6:28 PM:
-

I've been making generally good headway on the .missing problem. We've got a 
new {{PivotFacetFieldValueCollection}} that should deal with the {{null}} 
values properly. Right now the Small and LongTail tests pass but the Long fails 
on the new {{facet.limit=1}} and {{facet.missing=true}} case with {{SPECIAL}}. 
The control response doesn't include the {{null}} and the distributed response 
doesn't get the count of {{bbc}} right, it only gets 150 and I'm sure the 298 
it gets for {{microsoft}} is wrong too. There is something on the shard side 
code that is not happy with our "" and {{null}} values. I'm working on that 
right now.

My assumption is that the {{facet.missing}} request makes it out to all the 
shards so we never need to refine on it since all shards responded with the 
full information, but I guess that isn't always the case since other fields 
under that null value might have limits that would need to be refined on?


was (Author: andrew.muldowney):
I've been making generally good headway on the .missing problem. We've got a 
new {{PivotFacetFieldValueCollection}} that should deal with the {{null}} 
values properly. Right now the Small and LongTail tests pass but the Long fails 
on the new {{facet.limit=1}} and {{facet.missing=true}} case with {{SPECIAL}}. 
The control response doesn't include the {{null}} and the distributed response 
doesn't get the count of {{bbc}} right, it only gets 150 and I'm sure the 298 
it gets for {{microsoft}} is wrong too. There is something on the shard side 
code that is not happy with our "" and {{null}} values. I'm working on that 
right now.

> Implement distributed pivot faceting
> 
>
> Key: SOLR-2894
> URL: https://issues.apache.org/jira/browse/SOLR-2894
> Project: Solr
>  Issue Type: Improvement
>Reporter: Erik Hatcher
>Assignee: Hoss Man
> Fix For: 4.9, 5.0
>
> Attachments: SOLR-2894-mincount-minification.patch, 
> SOLR-2894-reworked.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
> SOLR-2894.patch, SOLR-2894_cloud_test.patch, dateToObject.patch, 
> pivot_mincount_problem.sh
>
>
> Following up on SOLR-792, pivot faceting currently only supports 
> undistributed mode.  Distributed pivot faceting needs to be implemented.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2894) Implement distributed pivot faceting

2014-07-15 Thread Andrew Muldowney (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062438#comment-14062438
 ] 

Andrew Muldowney commented on SOLR-2894:


I've been making generally good headway on the .missing problem. We've got a 
new {{PivotFacetFieldValueCollection}} that should deal with the {{null}} 
values properly. Right now the Small and LongTail tests pass but the Long fails 
on the new {{facet.limit=1}} and {{facet.missing=true}} case with {{SPECIAL}}. 
The control response doesn't include the {{null}} and the distributed response 
doesn't get the count of {{bbc}} right, it only gets 150 and I'm sure the 298 
it gets for {{microsoft}} is wrong too. There is something on the shard side 
code that is not happy with our "" and {{null}} values. I'm working on that 
right now.

> Implement distributed pivot faceting
> 
>
> Key: SOLR-2894
> URL: https://issues.apache.org/jira/browse/SOLR-2894
> Project: Solr
>  Issue Type: Improvement
>Reporter: Erik Hatcher
>Assignee: Hoss Man
> Fix For: 4.9, 5.0
>
> Attachments: SOLR-2894-mincount-minification.patch, 
> SOLR-2894-reworked.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
> SOLR-2894.patch, SOLR-2894_cloud_test.patch, dateToObject.patch, 
> pivot_mincount_problem.sh
>
>
> Following up on SOLR-792, pivot faceting currently only supports 
> undistributed mode.  Distributed pivot faceting needs to be implemented.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-6248) MoreLikeThis Query Parser

2014-07-15 Thread Steve Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Rowe updated SOLR-6248:
-

Description: 
MLT Component doesn't let people highlight/paginate and the handler comes with 
a cost of maintaining another piece in the config. Also, any changes to the 
default (number of results to be fetched etc.) /select handler need to be 
copied/synced with this handler too.

Having an MLT QParser would let users get back docs based on a query for them 
to paginate, highlight etc. It would also give them the flexibility to use this 
anywhere i.e. q,fq,bq etc.

A bit of history about MLT (thanks to Hoss)

MLT Handler pre-dates the existence of QParsers and was meant to take an 
arbitrary query as input, find docs that match that 
query, club them together to find interesting terms, and then use those 
terms as if they were my main query to generate a main result set.

This result would then be used as the set to facet, highlight etc.

The flow: Query -> DocList(m) -> Bag (terms) -> Query -> DocList\(y)

The MLT component on the other hand solved a very different purpose of 
augmenting the main result set. It is used to get similar docs for each of the 
doc in the main result set.

DocSet\(n) -> n * Bag (terms) -> n * (Query) -> n * DocList(m)

The new approach:

All of this can be done better and cleaner (and makes more sense too) using an 
MLT QParser.

An important thing to handle here is the case where the user doesn't have 
TermVectors, in which case, it does what happens right now i.e. parsing stored 
fields.

Also, in case the user doesn't have a field (to be used for MLT) indexed, the 
field would need to be a TextField with an index analyzer defined. This 
analyzer will then be used to extract terms for MLT.

In case of SolrCloud mode, '/get-termvectors' can be used after looking at the 
schema (if TermVectors are enabled for the field). If not, a /get call can be 
used to fetch the field and parse it.

  was:
MLT Component doesn't let people highlight/paginate and the handler comes with 
a cost of maintaining another piece in the config. Also, any changes to the 
default (number of results to be fetched etc.) /select handler need to be 
copied/synced with this handler too.

Having an MLT QParser would let users get back docs based on a query for them 
to paginate, highlight etc. It would also give them the flexibility to use this 
anywhere i.e. q,fq,bq etc.

A bit of history about MLT (thanks to Hoss)

MLT Handler pre-dates the existence of QParsers and was meant to take an 
arbitrary query as input, find docs that match that 
query, club them together to find interesting terms, and then use those 
terms as if they were my main query to generate a main result set.

This result would then be used as the set to facet, highlight etc.

The flow: Query -> DocList(m) -> Bag (terms) -> Query -> DocList(y)

The MLT component on the other hand solved a very different purpose of 
augmenting the main result set. It is used to get similar docs for each of the 
doc in the main result set.

DocSet(n) -> n * Bag (terms) -> n * (Query) -> n * DocList(m)

The new approach:

All of this can be done better and cleaner (and makes more sense too) using an 
MLT QParser.

An important thing to handle here is the case where the user doesn't have 
TermVectors, in which case, it does what happens right now i.e. parsing stored 
fields.

Also, in case the user doesn't have a field (to be used for MLT) indexed, the 
field would need to be a TextField with an index analyzer defined. This 
analyzer will then be used to extract terms for MLT.

In case of SolrCloud mode, '/get-termvectors' can be used after looking at the 
schema (if TermVectors are enabled for the field). If not, a /get call can be 
used to fetch the field and parse it.


> MoreLikeThis Query Parser
> -
>
> Key: SOLR-6248
> URL: https://issues.apache.org/jira/browse/SOLR-6248
> Project: Solr
>  Issue Type: New Feature
>Reporter: Anshum Gupta
>
> MLT Component doesn't let people highlight/paginate and the handler comes 
> with a cost of maintaining another piece in the config. Also, any changes to 
> the default (number of results to be fetched etc.) /select handler need to be 
> copied/synced with this handler too.
> Having an MLT QParser would let users get back docs based on a query for them 
> to paginate, highlight etc. It would also give them the flexibility to use 
> this anywhere i.e. q,fq,bq etc.
> A bit of history about MLT (thanks to Hoss)
> MLT Handler pre-dates the existence of QParsers and was meant to take an 
> arbitrary query as input, find docs that match that 
> query, club them together to find interesting terms, and then use those 
> terms as if they were my main query to generate a main result set.
> This result would then be used as the set to facet, highli

[jira] [Commented] (SOLR-5480) Make MoreLikeThisHandler distributable

2014-07-15 Thread Anshum Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062358#comment-14062358
 ] 

Anshum Gupta commented on SOLR-5480:


[~vzhovtiuk] This is very different from what this JIRA talks about and not in 
line with the existing patches/intent.
I have created a new JIRA (SOLR-6248) that is a better fit for this approach. It 
should be able to (functionally) solve the issue that this JIRA talks about.


> Make MoreLikeThisHandler distributable
> --
>
> Key: SOLR-5480
> URL: https://issues.apache.org/jira/browse/SOLR-5480
> Project: Solr
>  Issue Type: Improvement
>Reporter: Steve Molloy
>Assignee: Noble Paul
> Attachments: SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, 
> SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, 
> SOLR-5480.patch
>
>
> The MoreLikeThis component, when used in the standard search handler, supports 
> distributed searches. But the MoreLikeThisHandler itself doesn't, which 
> prevents, say, passing in text to perform the query. I'll start looking 
> into adapting the SearchHandler logic to the MoreLikeThisHandler. If anyone 
> has some work done already and wants to share, or wants to contribute, any 
> help will be welcome.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6137) Managed Schema / Schemaless and SolrCloud concurrency issues

2014-07-15 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062352#comment-14062352
 ] 

Steve Rowe commented on SOLR-6137:
--

[~gchanan], I've committed your patch to trunk and branch_4x with minor 
modifications (see previous comment).

I think what's left are:

bq. Schema API changes return success before all cores are updated; subsequent 
calls attempting to use new schema may fail

and 

bq. One small issue I noticed is that there is a race between parsing and 
schema addition. 

A new issue for this one seems like a good idea.

Anything else?
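
For the first item, a client-side stopgap is possible today: poll each 
replica's schema REST API until the new field shows up, instead of sleeping a 
fixed amount. A hedged sketch (the helper itself is illustrative; it relies on 
GET /schema/fields/<name> returning 404 for an unknown field):

{code:java}
import java.net.HttpURLConnection;
import java.net.URL;

public class WaitForSchemaField {
  /** Polls one core's schema REST API until the named field is visible. */
  public static boolean waitForField(String coreUrl, String field, long timeoutMs)
      throws Exception {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (System.currentTimeMillis() < deadline) {
      URL u = new URL(coreUrl + "/schema/fields/" + field + "?wt=json");
      HttpURLConnection conn = (HttpURLConnection) u.openConnection();
      if (conn.getResponseCode() == 200) {
        return true;      // the schema change is visible on this core
      }
      Thread.sleep(250);  // 404: not there yet -- back off and retry
    }
    return false;         // change never showed up before the deadline
  }
}
{code}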

> Managed Schema / Schemaless and SolrCloud concurrency issues
> 
>
> Key: SOLR-6137
> URL: https://issues.apache.org/jira/browse/SOLR-6137
> Project: Solr
>  Issue Type: Bug
>  Components: Schema and Analysis, SolrCloud
>Reporter: Gregory Chanan
> Attachments: SOLR-6137.patch, SOLR-6137.patch, SOLR-6137v2.patch, 
> SOLR-6137v3.patch, SOLR-6137v4.patch
>
>
> This is a follow up to a message on the mailing list, linked here: 
> http://mail-archives.apache.org/mod_mbox/lucene-dev/201406.mbox/%3CCAKfebOOcMeVEb010SsdcH8nta%3DyonMK5R7dSFOsbJ_tnre0O7w%40mail.gmail.com%3E
> The Managed Schema integration with SolrCloud seems pretty limited.
> The issue I'm running into is variants of the issue that schema changes are 
> not pushed to all shards/replicas synchronously.  So, for example, I can make 
> the following two requests:
> 1) add a field to the collection on server1 using the Schema API
> 2) add a document with the new field, the document is routed to a core on 
> server2
> Then, there appears to be a race between when the document is processed by 
> the core on server2 and when the core on server2, via the 
> ZkIndexSchemaReader, gets the new schema.  If the document is processed 
> first, I get a 400 error because the field doesn't exist.  This is easily 
> reproducible by adding a sleep to the ZkIndexSchemaReader's processing.
> I hit a similar issue with Schemaless: the distributed request handler sends 
> out the document updates, but there is no guarantee that the other 
> shards/replicas see the schema changes made by the update.chain.
> Another issue I noticed today: making multiple schema API calls concurrently 
> can block; that is, one may get through and the other may infinite loop.
> So, for reference, the issues include:
> 1) Schema API changes return success before all cores are updated; subsequent 
> calls attempting to use new schema may fail
> 2) Schemaless changes may fail on replicas/other shards for the same reason
> 3) Concurrent Schema API changes may block
> From Steve Rowe on the mailing list:
> {quote}
> For Schema API users, delaying a couple of seconds after adding fields before 
> using them should workaround this problem.  While not ideal, I think schema 
> field additions are rare enough in the Solr collection lifecycle that this is 
> not a huge problem.
> For schemaless users, the picture is worse, as you noted.  Immediate 
> distribution of documents triggering schema field addition could easily prove 
> problematic.  Maybe we need a schema update blocking mode, where after the ZK 
> schema node watch is triggered, all new request processing is halted until 
> the schema is finished downloading/parsing/swapping out? (Such a mode should 
> help Schema API users too.)
> {quote}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-6248) MoreLikeThis Query Parser

2014-07-15 Thread Anshum Gupta (JIRA)
Anshum Gupta created SOLR-6248:
--

 Summary: MoreLikeThis Query Parser
 Key: SOLR-6248
 URL: https://issues.apache.org/jira/browse/SOLR-6248
 Project: Solr
  Issue Type: New Feature
Reporter: Anshum Gupta


The MLT Component doesn't let people highlight/paginate, and the handler comes 
with the cost of maintaining another piece in the config. Also, any changes to 
the default /select handler (number of results to be fetched, etc.) need to be 
copied/synced with this handler too.

Having an MLT QParser would let users get back docs based on a query, which 
they could then paginate, highlight, etc. It would also give them the 
flexibility to use this anywhere, i.e. q, fq, bq, etc.

A bit of history about MLT (thanks to Hoss)

The MLT Handler pre-dates the existence of QParsers and was meant to take an 
arbitrary query as input, find docs that match that 
query, club them together to find interesting terms, and then use those 
terms as if they were the main query to generate a main result set.

This result would then be used as the set to facet, highlight etc.

The flow: Query -> DocList(m) -> Bag (terms) -> Query -> DocList(y)

The MLT component, on the other hand, served a very different purpose: 
augmenting the main result set. It is used to get similar docs for each of the 
docs in the main result set.

DocSet(n) -> n * Bag (terms) -> n * (Query) -> n * DocList(m)

The new approach:

All of this can be done better and cleaner (and makes more sense too) using an 
MLT QParser.

An important thing to handle here is the case where the user doesn't have 
TermVectors; in that case, it would do what happens right now, i.e. parse 
stored fields.

Also, in case the user doesn't have a field (to be used for MLT) indexed, the 
field would need to be a TextField with an index analyzer defined. This 
analyzer will then be used to extract terms for MLT.

In case of SolrCloud mode, '/get-termvectors' can be used after looking at the 
schema (if TermVectors are enabled for the field). If not, a /get call can be 
used to fetch the field and parse it.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5819) Add block tree postings format that supports term ords

2014-07-15 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062343#comment-14062343
 ] 

Michael McCandless commented on LUCENE-5819:


The gist of the change here is that the terms index FST, via a new
custom Outputs impl FSTOrdsOutputs, now also stores the start and end
ord range for each block.  The end ord is also necessary because the
terms don't neatly fall into just the leaf blocks: "straggler" terms
can easily fall inside inner blocks, and in this case we need the end
ord of the lower blocks to realize the term is a "straggler".

The on-disk blocks themselves are nearly the same; the only difference
is that when a block writes a pointer to a sub-block, it now also writes
(vlong) how many terms are in that sub-block.  This way when we are
seeking by ord and skip that sub-block we know how many ords were just
skipped.

I made a custom getByOutput to handle the ranges, falling back to the
last range that included the target ord while recursing.

Otherwise the terms dict is basically the same as the normal block
tree, including optimized intersect (w/o ord() implemented: not sure
we need it), except all seek/next operations also compute the term
ord.  Floor blocks also store the term ord each one starts on.
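
For reference, the optional TermsEnum API this implements -- a minimal sketch, 
assuming an index whose postings format supports ords (field and term are made 
up):

{code:java}
import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiFields;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.util.BytesRef;

static long ordOf(IndexReader reader, String field, String text) throws IOException {
  TermsEnum te = MultiFields.getTerms(reader, field).iterator(null);
  te.seekCeil(new BytesRef(text));
  long ord = te.ord();  // ordinal of the currently seek'd term
  te.seekExact(ord);    // ...and we can seek straight back by that ordinal
  return ord;
}
{code}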


> Add block tree postings format that supports term ords
> --
>
> Key: LUCENE-5819
> URL: https://issues.apache.org/jira/browse/LUCENE-5819
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/other
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 5.0, 4.10
>
> Attachments: LUCENE-5819.patch, LUCENE-5819.patch
>
>
> BlockTree is our default terms dictionary today, but it doesn't
> support term ords, which is an optional API in the postings format to
> retrieve the ordinal for the currently seek'd term, and also later
> seek by that ordinal e.g. to lookup the term.
> This can possibly be useful for e.g. faceting, and maybe at some point
> we can share the postings terms dict with the one used by sorted/set
> DV for cases when app wants to invert and facet on a given field.
> The older (3.x) block terms dict can easily support ords, and we have
> a Lucene41OrdsPF in test-framework, but it's not as fast / compact as
> block-tree, and doesn't (can't easily) implement an optimized
> intersect; but it could be that for fields we'd want to facet on, these
> tradeoffs don't matter.  It's nice to have options...



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Solr checkIfIAmLeader usage from ZK event thread

2014-07-15 Thread Ramkumar R. Aiyengar
Currently when a replica is watching the current leader's ephemeral node
and the leader disappears, it runs the leadership check along with its two
way peer sync, ZK update etc. on the ZK event thread where the watch was
fired.

What this means is that for instances with lots of cores, you would be
serializing leadership elections and the last in the list could take a long
time to have a replacement elected (during which you will have no leader).

I did a quick change to make the checkIfIAmLeader call async, but Solr
cloud tests being what they are (thanks Shalin for cleaning them up btw :)
), I wanted to check if I am doing something stupid. If not, I will raise a
JIRA.

One concern could be that you might end up with two elections for the same
shard, but I can't see how that might happen...
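
(For reference, the shape of the change -- a sketch only, not the actual
patch; the executor and the no-arg checkIfIamLeader() are illustrative:)

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;

    final ExecutorService electionExecutor = Executors.newCachedThreadPool();

    Watcher leaderNodeWatcher = new Watcher() {
      @Override
      public void process(WatchedEvent event) {
        if (event.getType() != Event.EventType.NodeDeleted) return;
        // get off the ZK event thread, so the peer sync / ZK updates for
        // one core don't serialize the elections of every other core
        electionExecutor.submit(new Runnable() {
          @Override
          public void run() {
            checkIfIamLeader();
          }
        });
      }
    };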


[jira] [Resolved] (SOLR-6247) Can't delete utf-8 word in ManagedStopFilterFactory

2014-07-15 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man resolved SOLR-6247.


Resolution: Fixed

duplicate bug report: SOLR-6163

> Can't delete utf-8 word in ManagedStopFilterFactory
> ---
>
> Key: SOLR-6247
> URL: https://issues.apache.org/jira/browse/SOLR-6247
> Project: Solr
>  Issue Type: Bug
>  Components: Schema and Analysis
>Affects Versions: 4.9
> Environment: MacOS, Solr started locally
>Reporter: Patryk Maryniok
>
> Request:
> bq. curl -X DELETE 
> "http://localhost:8983/solr/collection1/schema/analysis/stopwords/polish/się";
> or
> bq. curl -X DELETE 
> "http://localhost:8983/solr/collection1/schema/analysis/stopwords/polish/si%C4%99";
> Response:
> bq. {"responseHeader":{"status":404, "QTime":3}, "error":{ "msg":"si%C4%99 
> not found in /schema/analysis/stopwords/polish", "code":404}}
> I can't delete this word; the encoding makes no difference. Am I doing 
> something wrong, or is it a bug? It also happens in 
> ManagedSynonymFilterFactory.
> Response for GET:
> {code:xml}
> {
>   "responseHeader":{
> "status":0,
> "QTime":195},
>   "wordSet":{
> "initArgs":{"ignoreCase":true},
> "initializedOn":"2014-07-15T14:52:53.859Z",
> "managedList":["a",
>   "i",
>   "się",
>   "w",
>   "z"]}
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5986) Don't allow runaway queries from harming Solr cluster health or search performance

2014-07-15 Thread Steve Davids (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062250#comment-14062250
 ] 

Steve Davids commented on SOLR-5986:


bq.  I wonder why you would have to restart the replica? I presume this is 
because that is your only recourse to stop a query that might take days to 
complete?
Yes, that is correct -- that is the easiest way to kill a runaway thread.


bq. If a query takes that long and is ignoring a specified timeout, that seems 
like its own issue that needs resolution.
The Solr instance that is distributing the requests to other shards honors the 
timeout value and stops the collection process once the threshold is met (and 
returns to the client with partial results if any are available), though the 
queries remain running on all of the shards that were initially searched in the 
overall distributed request. If the timeout value were honored on each shard 
that was used in the distributed request, that would probably take care of the 
problem.


bq. IMHO, the primary goal should be to make SolrCloud clusters more resilient 
to performance degradations caused by such nasty queries described above.
+1 resiliency to performance degradations is always a good thing :)


bq. The circuit-breaker approach in the linked ES tickets is clever, but it 
does not seem to be as generally applicable as the ability to view all running 
queries with an option to stop them.
+1 I actually prefer the BLUR route, though being able to see the current 
queries plus the ability to kill them off across the cluster would be great. 
Although it is crucial to be able to automatically have queries be killed off 
after a certain threshold (ideally the timeout value). This is necessary 
because I don't want to be monitoring the Solr admin page at all hours during 
the day (though I could create scripts to do the work if an API call is 
available, but not preferred).


bq. My preference would be to have a response mechanism that 1) applies broadly 
and 2) can be executed by a dev-ops guy in a UI like Solr Admin, or even by API.
+1 if "applied broadly" means ability to specify a threshold to start killing 
off queries.
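
(The timeout under discussion is the standard timeAllowed parameter; in SolrJ 
terms, roughly -- the query string is made up:)

{code:java}
import org.apache.solr.client.solrj.SolrQuery;

SolrQuery q = new SolrQuery("\"some phrase\"~10 AND nasty*");
// honored by the aggregating node today, but as described above the
// per-shard requests can keep running long past it
q.setTimeAllowed(60000);  // milliseconds
{code}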

> Don't allow runaway queries from harming Solr cluster health or search 
> performance
> --
>
> Key: SOLR-5986
> URL: https://issues.apache.org/jira/browse/SOLR-5986
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Reporter: Steve Davids
>Priority: Critical
> Fix For: 4.9
>
>
> The intent of this ticket is to have all distributed search requests stop 
> wasting CPU cycles on requests that have already timed out or are so 
> complicated that they won't be able to execute. We have come across a case 
> where a nasty wildcard query within a proximity clause was causing the 
> cluster to enumerate terms for hours even though the query timeout was set to 
> minutes. This caused a noticeable slowdown within the system, which made us 
> restart the replicas that happened to service that one request; the worst 
> case scenario is that users with a relatively low zk timeout value will have 
> nodes start dropping from the cluster due to long GC pauses.
> [~amccurry] Built a mechanism into Apache Blur to help with the issue in 
> BLUR-142 (see commit comment for code, though look at the latest code on the 
> trunk for newer bug fixes).
> Solr should be able to either prevent these problematic queries from running 
> by some heuristic (possibly estimated size of heap usage) or be able to 
> execute a thread interrupt on all query threads once the time threshold is 
> met. This issue mirrors what others have discussed on the mailing list: 
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200903.mbox/%3c856ac15f0903272054q2dbdbd19kea3c5ba9e105b...@mail.gmail.com%3E



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [JENKINS] Lucene-Solr-Tests-4.x-Java7 - Build # 2031 - Still Failing

2014-07-15 Thread Chris Hostetter

: 2) chain that broke involves WDF which has historically been known to 
: be problematic -- but in theory LUCENE-3843 fixed WDF already?

cut/paste error -- I meant LUCENE-5111


-Hoss
http://www.lucidworks.com/

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [JENKINS] Lucene-Solr-Tests-4.x-Java7 - Build # 2031 - Still Failing

2014-07-15 Thread Chris Hostetter

NOTE:

1) seed reproduces for me on branch_4x, linux 64bit, java7

2) chain that broke involves WDF which has historically been known to 
be problematic -- but in theory LUCENE-3843 fixed WDF already?

ant test  -Dtestcase=TestRandomChains -Dtests.method=testRandomChains 
-Dtests.seed=2871E466A5044906 -Dtests.multiplier=3 -Dtests.slow=true 
-Dtests.locale=no_NO -Dtests.timezone=Atlantic/South_Georgia 
-Dtests.file.encoding=US-ASCII

Perhaps the problem is specific to the combination of filters?

   [junit4]   2> TEST FAIL: useCharFilter=false text='v|p({  Exception from random analyzer: 
   [junit4]   2> charfilters=
   [junit4]   2>   
org.apache.lucene.analysis.fa.PersianCharFilter(java.io.StringReader@7cc398b8)
   [junit4]   2>   
org.apache.lucene.analysis.fa.PersianCharFilter(org.apache.lucene.analysis.fa.PersianCharFilter@7ef5b8c5)
   [junit4]   2> tokenizer=
   [junit4]   2>   
org.apache.lucene.analysis.path.PathHierarchyTokenizer(org.apache.lucene.analysis.core.TestRandomChains$CheckThatYouDidntReadAnythingReaderWrapper@690c7d5)
   [junit4]   2> filters=
   [junit4]   2>   
org.apache.lucene.analysis.miscellaneous.WordDelimiterFilter(LUCENE_4_10, 
ValidatingTokenFilter@587d7793 
term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,
 
7, [vxaqklts, arraatqvg])
   [junit4]   2>   
org.apache.lucene.analysis.shingle.ShingleFilter(ValidatingTokenFilter@6bbaa8d8 
term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word)
   [junit4]   2> offsetsAreCorrect=false




: Date: Tue, 15 Jul 2014 12:42:31 + (UTC)
: From: Apache Jenkins Server 
: Reply-To: dev@lucene.apache.org
: To: dev@lucene.apache.org
: Subject: [JENKINS] Lucene-Solr-Tests-4.x-Java7 - Build # 2031 - Still Failing
: 
: Build: https://builds.apache.org/job/Lucene-Solr-Tests-4.x-Java7/2031/
: 
: 1 tests failed.
: REGRESSION:  org.apache.lucene.analysis.core.TestRandomChains.testRandomChains
: 
: Error Message:
: startOffset must be non-negative, and endOffset must be >= startOffset, 
startOffset=2,endOffset=1
: 
: Stack Trace:
: java.lang.IllegalArgumentException: startOffset must be non-negative, and 
endOffset must be >= startOffset, startOffset=2,endOffset=1
:   at 
__randomizedtesting.SeedInfo.seed([2871E466A5044906:1590CD07E21654C6]:0)
:   at 
org.apache.lucene.analysis.tokenattributes.PackedTokenAttributeImpl.setOffset(PackedTokenAttributeImpl.java:107)
:   at 
org.apache.lucene.analysis.shingle.ShingleFilter.incrementToken(ShingleFilter.java:345)
:   at 
org.apache.lucene.analysis.ValidatingTokenFilter.incrementToken(ValidatingTokenFilter.java:68)
:   at 
org.apache.lucene.analysis.BaseTokenStreamTestCase.checkAnalysisConsistency(BaseTokenStreamTestCase.java:703)
:   at 
org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:614)
:   at 
org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:513)
:   at 
org.apache.lucene.analysis.core.TestRandomChains.testRandomChains(TestRandomChains.java:927)
:   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
:   at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
:   at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
:   at java.lang.reflect.Method.invoke(Method.java:606)
:   at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
:   at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827)
:   at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
:   at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877)
:   at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
:   at 
org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
:   at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
:   at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
:   at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
:   at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
:   at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
:   at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
:   at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365)
:   at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798)
:   at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3

[jira] [Updated] (SOLR-6247) Can't delete utf-8 word in ManagedStopFilterFactory

2014-07-15 Thread Patryk Maryniok (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patryk Maryniok updated SOLR-6247:
--

Description: 
Request:

bq. curl -X DELETE 
"http://localhost:8983/solr/collection1/schema/analysis/stopwords/polish/się";
or
bq. curl -X DELETE 
"http://localhost:8983/solr/collection1/schema/analysis/stopwords/polish/si%C4%99";

Response:

bq. {"responseHeader":{"status":404, "QTime":3}, "error":{ "msg":"si%C4%99 not 
found in /schema/analysis/stopwords/polish", "code":404}}

I can't delete this word; the encoding makes no difference. Am I doing 
something wrong, or is it a bug? It also happens in ManagedSynonymFilterFactory.

Response for GET:
{code:xml}
{
  "responseHeader":{
"status":0,
"QTime":195},
  "wordSet":{
"initArgs":{"ignoreCase":true},
"initializedOn":"2014-07-15T14:52:53.859Z",
"managedList":["a",
  "i",
  "się",
  "w",
  "z"]}
}
{code}

  was:
Request:

bq. curl -X DELETE 
"http://localhost:8983/solr/collection1/schema/analysis/stopwords/polish/się";
or
bq. curl -X DELETE 
"http://localhost:8983/solr/collection1/schema/analysis/stopwords/polish/si%C4%99";

Response:

bq. {"responseHeader":{"status":404, "QTime":3}, "error":{ "msg":"si%C4%99 not 
found in /schema/analysis/stopwords/polish", "code":404}}

I can't delete this word; the encoding makes no difference. Am I doing 
something wrong, or is it a bug? It also happens in ManagedSynonymFilterFactory.

Response for GET:
{
  "responseHeader":{
"status":0,
"QTime":195},
  "wordSet":{
"initArgs":{"ignoreCase":true},
"initializedOn":"2014-07-15T14:52:53.859Z",
"managedList":["a",
  "i",
  "się",
  "w",
  "z"]}}


> Can't delete utf-8 word in ManagedStopFilterFactory
> ---
>
> Key: SOLR-6247
> URL: https://issues.apache.org/jira/browse/SOLR-6247
> Project: Solr
>  Issue Type: Bug
>  Components: Schema and Analysis
>Affects Versions: 4.9
> Environment: MacOS, Solr started locally
>Reporter: Patryk Maryniok
>
> Request:
> bq. curl -X DELETE 
> "http://localhost:8983/solr/collection1/schema/analysis/stopwords/polish/się";
> or
> bq. curl -X DELETE 
> "http://localhost:8983/solr/collection1/schema/analysis/stopwords/polish/si%C4%99";
> Response:
> bq. {"responseHeader":{"status":404, "QTime":3}, "error":{ "msg":"si%C4%99 
> not found in /schema/analysis/stopwords/polish", "code":404}}
> I can't delete this word; the encoding makes no difference. Am I doing 
> something wrong, or is it a bug? It also happens in 
> ManagedSynonymFilterFactory.
> Response for GET:
> {code:xml}
> {
>   "responseHeader":{
> "status":0,
> "QTime":195},
>   "wordSet":{
> "initArgs":{"ignoreCase":true},
> "initializedOn":"2014-07-15T14:52:53.859Z",
> "managedList":["a",
>   "i",
>   "się",
>   "w",
>   "z"]}
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-6247) Can't delete utf-8 word in ManagedStopFilterFactory

2014-07-15 Thread Patryk Maryniok (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patryk Maryniok updated SOLR-6247:
--

Description: 
Request:

bq. curl -X DELETE 
"http://localhost:8983/solr/collection1/schema/analysis/stopwords/polish/się";
or
bq. curl -X DELETE 
"http://localhost:8983/solr/collection1/schema/analysis/stopwords/polish/si%C4%99";

Response:

bq. {"responseHeader":{"status":404, "QTime":3}, "error":{ "msg":"si%C4%99 not 
found in /schema/analysis/stopwords/polish", "code":404}}

I can't delete this word; the encoding makes no difference. Am I doing 
something wrong, or is it a bug? It also happens in ManagedSynonymFilterFactory.

Response for GET:
{
  "responseHeader":{
"status":0,
"QTime":195},
  "wordSet":{
"initArgs":{"ignoreCase":true},
"initializedOn":"2014-07-15T14:52:53.859Z",
"managedList":["a",
  "i",
  "się",
  "w",
  "z"]}}

  was:
Request:

bq. curl -X DELETE 
"http://localhost:8983/solr/collection1/schema/analysis/stopwords/polish/się";
or
bq. curl -X DELETE 
"http://localhost:8983/solr/collection1/schema/analysis/stopwords/polish/si%C4%99";

Response:

bq. {"responseHeader":{"status":404, "QTime":3}, "error":{ "msg":"si%C4%99 not 
found in /schema/analysis/stopwords/polish", "code":404}}

I can't delete this word; the encoding makes no difference. Am I doing 
something wrong, or is it a bug? It also happens in ManagedSynonymFilterFactory.

{quote}
{
  "responseHeader":{
"status":0,
"QTime":195},
  "wordSet":{
"initArgs":{"ignoreCase":true},
"initializedOn":"2014-07-15T14:52:53.859Z",
"managedList":["a",
  "i",
  "się",
  "w",
  "z"]}}
{quote}


> Can't delete utf-8 word in ManagedStopFilterFactory
> ---
>
> Key: SOLR-6247
> URL: https://issues.apache.org/jira/browse/SOLR-6247
> Project: Solr
>  Issue Type: Bug
>  Components: Schema and Analysis
>Affects Versions: 4.9
> Environment: MacOS, Solr started locally
>Reporter: Patryk Maryniok
>
> Request:
> bq. curl -X DELETE 
> "http://localhost:8983/solr/collection1/schema/analysis/stopwords/polish/się";
> or
> bq. curl -X DELETE 
> "http://localhost:8983/solr/collection1/schema/analysis/stopwords/polish/si%C4%99";
> Response:
> bq. {"responseHeader":{"status":404, "QTime":3}, "error":{ "msg":"si%C4%99 
> not found in /schema/analysis/stopwords/polish", "code":404}}
> I can't delete this word; the encoding makes no difference. Am I doing 
> something wrong, or is it a bug? It also happens in 
> ManagedSynonymFilterFactory.
> Response for GET:
> {
>   "responseHeader":{
> "status":0,
> "QTime":195},
>   "wordSet":{
> "initArgs":{"ignoreCase":true},
> "initializedOn":"2014-07-15T14:52:53.859Z",
> "managedList":["a",
>   "i",
>   "się",
>   "w",
>   "z"]}}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-6247) Can't delete utf-8 word in ManagedStopFilterFactory

2014-07-15 Thread Patryk Maryniok (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patryk Maryniok updated SOLR-6247:
--

Description: 
Request:

bq. curl -X DELETE 
"http://localhost:8983/solr/collection1/schema/analysis/stopwords/polish/się";
or
bq. curl -X DELETE 
"http://localhost:8983/solr/collection1/schema/analysis/stopwords/polish/si%C4%99";

Response:

bq. {"responseHeader":{"status":404, "QTime":3}, "error":{ "msg":"si%C4%99 not 
found in /schema/analysis/stopwords/polish", "code":404}}

I can't delete this word; the encoding makes no difference. Am I doing 
something wrong, or is it a bug? It also happens in ManagedSynonymFilterFactory.

{quote}
{
  "responseHeader":{
"status":0,
"QTime":195},
  "wordSet":{
"initArgs":{"ignoreCase":true},
"initializedOn":"2014-07-15T14:52:53.859Z",
"managedList":["a",
  "i",
  "się",
  "w",
  "z"]}}
{quote}

  was:
Request:

bq. curl -X DELETE 
"http://localhost:8983/solr/collection1/schema/analysis/stopwords/polish/się";
or
bq. curl -X DELETE 
"http://localhost:8983/solr/collection1/schema/analysis/stopwords/polish/si%C4%99";

Response:

bq. {"responseHeader":{"status":404, "QTime":3}, "error":{ "msg":"si%C4%99 not 
found in /schema/analysis/stopwords/polish", "code":404}}

I can't delete this word; the encoding makes no difference. Am I doing 
something wrong, or is it a bug? It also happens in ManagedSynonymFilterFactory.


> Can't delete utf-8 word in ManagedStopFilterFactory
> ---
>
> Key: SOLR-6247
> URL: https://issues.apache.org/jira/browse/SOLR-6247
> Project: Solr
>  Issue Type: Bug
>  Components: Schema and Analysis
>Affects Versions: 4.9
> Environment: MacOS, Solr started locally
>Reporter: Patryk Maryniok
>
> Request:
> bq. curl -X DELETE 
> "http://localhost:8983/solr/collection1/schema/analysis/stopwords/polish/się";
> or
> bq. curl -X DELETE 
> "http://localhost:8983/solr/collection1/schema/analysis/stopwords/polish/si%C4%99";
> Response:
> bq. {"responseHeader":{"status":404, "QTime":3}, "error":{ "msg":"si%C4%99 
> not found in /schema/analysis/stopwords/polish", "code":404}}
> I can't delete this word; the encoding makes no difference. Am I doing 
> something wrong, or is it a bug? It also happens in 
> ManagedSynonymFilterFactory.
> {quote}
> {
>   "responseHeader":{
> "status":0,
> "QTime":195},
>   "wordSet":{
> "initArgs":{"ignoreCase":true},
> "initializedOn":"2014-07-15T14:52:53.859Z",
> "managedList":["a",
>   "i",
>   "się",
>   "w",
>   "z"]}}
> {quote}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-6247) Can't delete utf-8 word in ManagedStopFilterFactory

2014-07-15 Thread Patryk Maryniok (JIRA)
Patryk Maryniok created SOLR-6247:
-

 Summary: Can't delete utf-8 word in ManagedStopFilterFactory
 Key: SOLR-6247
 URL: https://issues.apache.org/jira/browse/SOLR-6247
 Project: Solr
  Issue Type: Bug
  Components: Schema and Analysis
Affects Versions: 4.9
 Environment: MacOS, Solr started locally
Reporter: Patryk Maryniok


Request:

bq. curl -X DELETE 
"http://localhost:8983/solr/collection1/schema/analysis/stopwords/polish/się";
or
bq. curl -X DELETE 
"http://localhost:8983/solr/collection1/schema/analysis/stopwords/polish/si%C4%99";

Response:

bq. {"responseHeader":{"status":404, "QTime":3}, "error":{ "msg":"si%C4%99 not 
found in /schema/analysis/stopwords/polish", "code":404}}

I can't delete this word; the encoding makes no difference. Am I doing 
something wrong, or is it a bug?



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-6247) Can't delete utf-8 word in ManagedStopFilterFactory

2014-07-15 Thread Patryk Maryniok (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patryk Maryniok updated SOLR-6247:
--

Description: 
Request:

bq. curl -X DELETE 
"http://localhost:8983/solr/collection1/schema/analysis/stopwords/polish/się";
or
bq. curl -X DELETE 
"http://localhost:8983/solr/collection1/schema/analysis/stopwords/polish/si%C4%99";

Response:

bq. {"responseHeader":{"status":404, "QTime":3}, "error":{ "msg":"si%C4%99 not 
found in /schema/analysis/stopwords/polish", "code":404}}

I can't delete this word; the encoding makes no difference. Am I doing 
something wrong, or is it a bug? It also happens in ManagedSynonymFilterFactory.

  was:
Request:

bq. curl -X DELETE 
"http://localhost:8983/solr/collection1/schema/analysis/stopwords/polish/się";
or
bq. curl -X DELETE 
"http://localhost:8983/solr/collection1/schema/analysis/stopwords/polish/si%C4%99";

Response:

bq. {"responseHeader":{"status":404, "QTime":3}, "error":{ "msg":"si%C4%99 not 
found in /schema/analysis/stopwords/polish", "code":404}}

I can't delete this word; the encoding makes no difference. Am I doing 
something wrong, or is it a bug?


> Can't delete utf-8 word in ManagedStopFilterFactory
> ---
>
> Key: SOLR-6247
> URL: https://issues.apache.org/jira/browse/SOLR-6247
> Project: Solr
>  Issue Type: Bug
>  Components: Schema and Analysis
>Affects Versions: 4.9
> Environment: MacOS, Solr started locally
>Reporter: Patryk Maryniok
>
> Request:
> bq. curl -X DELETE 
> "http://localhost:8983/solr/collection1/schema/analysis/stopwords/polish/się";
> or
> bq. curl -X DELETE 
> "http://localhost:8983/solr/collection1/schema/analysis/stopwords/polish/si%C4%99";
> Response:
> bq. {"responseHeader":{"status":404, "QTime":3}, "error":{ "msg":"si%C4%99 
> not found in /schema/analysis/stopwords/polish", "code":404}}
> I can't delete this word; the encoding makes no difference. Am I doing 
> something wrong, or is it a bug? It also happens in 
> ManagedSynonymFilterFactory.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.8.0_05) - Build # 10820 - Failure!

2014-07-15 Thread Chris Hostetter

FYI: Seed reproduces for me on my 64bit linux java7...

ant test  -Dtestcase=TestFieldCacheSort 
-Dtests.method=testEmptyStringVsNullStringSort 
-Dtests.seed=A82FCB68C76B741B -Dtests.multiplier=3 -Dtests.slow=true 
-Dtests.locale=ru -Dtests.timezone=Africa/Lusaka 
-Dtests.file.encoding=UTF-8


The failing assert is "assert fi != null":


  public SortedDocValues getSortedDocValues(String field) throws IOException {
    SortedDocValues dv = super.getSortedDocValues(field);
    FieldInfo fi = getFieldInfos().fieldInfo(field);
    if (dv != null) {
      assert fi != null;  // <-- the assert that trips
      assert fi.getDocValuesType() == FieldInfo.DocValuesType.SORTED;
      return new AssertingSortedDocValues(dv, maxDoc());
    } else {
      assert fi == null
          || fi.getDocValuesType() != FieldInfo.DocValuesType.SORTED;
      return null;
    }
  }





: Date: Tue, 15 Jul 2014 05:36:03 + (UTC)
: From: Policeman Jenkins Server 
: Reply-To: dev@lucene.apache.org
: To: dev@lucene.apache.org
: Subject: [JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.8.0_05) - Build # 10820
:  - Failure!
: 
: Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/10820/
: Java: 32bit/jdk1.8.0_05 -client -XX:+UseParallelGC
: 
: 1 tests failed.
: REGRESSION:  
org.apache.lucene.uninverting.TestFieldCacheSort.testEmptyStringVsNullStringSort
: 
: Error Message:
: 
: 
: Stack Trace:
: java.lang.AssertionError
:   at 
__randomizedtesting.SeedInfo.seed([A82FCB68C76B741B:C9CCF9BFBF0A72AB]:0)
:   at 
org.apache.lucene.index.AssertingAtomicReader.getSortedDocValues(AssertingAtomicReader.java:638)
:   at 
org.apache.lucene.index.FilterAtomicReader.getSortedDocValues(FilterAtomicReader.java:414)
:   at 
org.apache.lucene.index.AssertingAtomicReader.getSortedDocValues(AssertingAtomicReader.java:635)
:   at org.apache.lucene.index.DocValues.getSorted(DocValues.java:273)
:   at 
org.apache.lucene.search.FieldComparator$TermOrdValComparator.getSortedDocValues(FieldComparator.java:821)
:   at 
org.apache.lucene.search.FieldComparator$TermOrdValComparator.setNextReader(FieldComparator.java:826)
:   at 
org.apache.lucene.search.TopFieldCollector$OneComparatorNonScoringCollector.doSetNextReader(TopFieldCollector.java:97)
:   at 
org.apache.lucene.search.SimpleCollector.getLeafCollector(SimpleCollector.java:33)
:   at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:605)
:   at 
org.apache.lucene.search.AssertingIndexSearcher.search(AssertingIndexSearcher.java:94)
:   at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:573)
:   at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:525)
:   at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:502)
:   at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:318)
:   at 
org.apache.lucene.uninverting.TestFieldCacheSort.testEmptyStringVsNullStringSort(TestFieldCacheSort.java:1029)
:   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
:   at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
:   at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
:   at java.lang.reflect.Method.invoke(Method.java:483)
:   at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
:   at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827)
:   at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
:   at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877)
:   at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
:   at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
:   at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
:   at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
:   at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
:   at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
:   at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
:   at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365)
:   at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798)
:   at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458)
:   at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836)
:   at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738)
:  

[jira] [Commented] (SOLR-5986) Don't allow runaway queries from harming Solr cluster health or search performance

2014-07-15 Thread Jim Walker (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062228#comment-14062228
 ] 

Jim Walker commented on SOLR-5986:
--

Steve, I wonder why you would have to restart the replica? I presume this is 
because that is your only recourse to stop a query that might take days to 
complete?

If a query takes that long and is ignoring a specified timeout, that seems like 
its own issue that needs resolution.

IMHO, the primary goal should be to make SolrCloud clusters more resilient to 
performance degradations caused by such nasty queries described above.

The circuit-breaker approach in the linked ES tickets is clever, but it does 
not seem to be as generally applicable as the ability to view all running 
queries with an option to stop them. For example, it seems the linked ES 
circuit breaker will only trigger for issues deriving from loading too much 
field data. The problem described above may result from this cause, or any 
number of other causes.

My preference would be to have a response mechanism that 1) applies broadly and 
2) can be executed by a dev-ops guy in a UI like Solr Admin, or even by API.

> Don't allow runaway queries from harming Solr cluster health or search 
> performance
> --
>
> Key: SOLR-5986
> URL: https://issues.apache.org/jira/browse/SOLR-5986
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Reporter: Steve Davids
>Priority: Critical
> Fix For: 4.9
>
>
> The intent of this ticket is to have all distributed search requests stop 
> wasting CPU cycles on requests that have already timed out or are so 
> complicated that they won't be able to execute. We have come across a case 
> where a nasty wildcard query within a proximity clause was causing the 
> cluster to enumerate terms for hours even though the query timeout was set to 
> minutes. This caused a noticeable slowdown within the system, which made us 
> restart the replicas that happened to service that one request; the worst 
> case scenario is that users with a relatively low zk timeout value will have 
> nodes start dropping from the cluster due to long GC pauses.
> [~amccurry] Built a mechanism into Apache Blur to help with the issue in 
> BLUR-142 (see commit comment for code, though look at the latest code on the 
> trunk for newer bug fixes).
> Solr should be able to either prevent these problematic queries from running 
> by some heuristic (possibly estimated size of heap usage) or be able to 
> execute a thread interrupt on all query threads once the time threshold is 
> met. This issue mirrors what others have discussed on the mailing list: 
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200903.mbox/%3c856ac15f0903272054q2dbdbd19kea3c5ba9e105b...@mail.gmail.com%3E



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5986) Don't allow runaway queries from harming Solr cluster health or search performance

2014-07-15 Thread Steve Davids (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062187#comment-14062187
 ] 

Steve Davids commented on SOLR-5986:


In an ideal world it would attempt to provide results for the shards that may 
be okay, but the end goal is to maintain the health of the cluster for queries 
that get out of hand. If you can know up front that there is no possible way 
that a query could complete, then it would be reasonable to error out 
immediately (though that metric may be too squishy to tell whether it will 
complete). Hopefully that makes sense...

> Don't allow runaway queries from harming Solr cluster health or search 
> performance
> --
>
> Key: SOLR-5986
> URL: https://issues.apache.org/jira/browse/SOLR-5986
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Reporter: Steve Davids
>Priority: Critical
> Fix For: 4.9
>
>
> The intent of this ticket is to have all distributed search requests stop 
> wasting CPU cycles on requests that have already timed out or are so 
> complicated that they won't be able to execute. We have come across a case 
> where a nasty wildcard query within a proximity clause was causing the 
> cluster to enumerate terms for hours even though the query timeout was set to 
> minutes. This caused a noticeable slowdown within the system, which made us 
> restart the replicas that happened to service that one request; the worst 
> case scenario is that users with a relatively low zk timeout value will have 
> nodes start dropping from the cluster due to long GC pauses.
> [~amccurry] Built a mechanism into Apache Blur to help with the issue in 
> BLUR-142 (see commit comment for code, though look at the latest code on the 
> trunk for newer bug fixes).
> Solr should be able to either prevent these problematic queries from running 
> by some heuristic (possibly estimated size of heap usage) or be able to 
> execute a thread interrupt on all query threads once the time threshold is 
> met. This issue mirrors what others have discussed on the mailing list: 
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200903.mbox/%3c856ac15f0903272054q2dbdbd19kea3c5ba9e105b...@mail.gmail.com%3E



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5819) Add block tree postings format that supports term ords

2014-07-15 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-5819:
---

Attachment: LUCENE-5819.patch

New patch, fixes last nocommit, fixes ant precommit ... I think it's ready.

> Add block tree postings format that supports term ords
> --
>
> Key: LUCENE-5819
> URL: https://issues.apache.org/jira/browse/LUCENE-5819
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/other
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 5.0, 4.10
>
> Attachments: LUCENE-5819.patch, LUCENE-5819.patch
>
>
> BlockTree is our default terms dictionary today, but it doesn't
> support term ords, which is an optional API in the postings format to
> retrieve the ordinal for the currently seek'd term, and also later
> seek by that ordinal e.g. to lookup the term.
> This can possibly be useful for e.g. faceting, and maybe at some point
> we can share the postings terms dict with the one used by sorted/set
> DV for cases when app wants to invert and facet on a given field.
> The older (3.x) block terms dict can easily support ords, and we have
> a Lucene41OrdsPF in test-framework, but it's not as fast / compact as
> block-tree, and doesn't (can't easily) implement an optimized
> intersect; but it could be that for fields we'd want to facet on, these
> tradeoffs don't matter.  It's nice to have options...



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-3881) frequent OOM in LanguageIdentifierUpdateProcessor

2014-07-15 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062157#comment-14062157
 ] 

Steve Rowe edited comment on SOLR-3881 at 7/15/14 2:58 PM:
---

bq. See 
http://language-detection.googlecode.com/svn/trunk/doc/com/cybozu/labs/langdetect/Detector.html#setMaxTextLength(int)
 - the default is 10K chars - we can pass the configured max total chars here.

-The default is actually 10K *bytes*, not chars, so we'd need to divide by two 
when passing the configured max total chars.-

*edit* Disregard the above comment; the javadocs refer to "10KB" as the default 
max text length, but 
[{{Detector.append()}}|https://code.google.com/p/language-detection/source/browse/src/com/cybozu/labs/langdetect/Detector.java#141]
 uses the {{max_text_length}} config as a max number of chars.
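
So the configured max total chars can be passed straight through. Roughly (a 
sketch against the language-detection API; the profile directory and the limit 
value are illustrative):

{code:java}
import com.cybozu.labs.langdetect.Detector;
import com.cybozu.labs.langdetect.DetectorFactory;
import com.cybozu.labs.langdetect.LangDetectException;

static String detect(String text, int maxTotalChars) throws LangDetectException {
  // assumes DetectorFactory.loadProfile("/path/to/profiles") ran at startup
  Detector detector = DetectorFactory.create();
  detector.setMaxTextLength(maxTotalChars);  // a char count, per append() above
  detector.append(text);
  return detector.detect();                  // e.g. "en"
}
{code}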


was (Author: steve_rowe):
bq. See 
http://language-detection.googlecode.com/svn/trunk/doc/com/cybozu/labs/langdetect/Detector.html#setMaxTextLength(int)
 - the default is 10K chars - we can pass the configured max total chars here.

The default is actually 10K *bytes*, not chars, so we'd need to divide by two 
when passing the configured max total chars.

> frequent OOM in LanguageIdentifierUpdateProcessor
> -
>
> Key: SOLR-3881
> URL: https://issues.apache.org/jira/browse/SOLR-3881
> Project: Solr
>  Issue Type: Bug
>  Components: update
>Affects Versions: 4.0
> Environment: CentOS 6.x, JDK 1.6, (java -server -Xms2G -Xmx2G 
> -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=)
>Reporter: Rob Tulloh
> Fix For: 4.9, 5.0
>
> Attachments: SOLR-3881.patch, SOLR-3881.patch
>
>
> We are seeing frequent failures from Solr causing it to OOM. Here is the 
> stack trace we observe when this happens:
> {noformat}
> Caused by: java.lang.OutOfMemoryError: Java heap space
> at java.util.Arrays.copyOf(Arrays.java:2882)
> at 
> java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
> at 
> java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:390)
> at java.lang.StringBuffer.append(StringBuffer.java:224)
> at 
> org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor.concatFields(LanguageIdentifierUpdateProcessor.java:286)
> at 
> org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor.process(LanguageIdentifierUpdateProcessor.java:189)
> at 
> org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor.processAdd(LanguageIdentifierUpdateProcessor.java:171)
> at 
> org.apache.solr.handler.BinaryUpdateRequestHandler$2.update(BinaryUpdateRequestHandler.java:90)
> at 
> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:140)
> at 
> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:120)
> at 
> org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:221)
> at 
> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:105)
> at 
> org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:186)
> at 
> org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:112)
> at 
> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:147)
> at 
> org.apache.solr.handler.BinaryUpdateRequestHandler.parseAndLoadDocs(BinaryUpdateRequestHandler.java:100)
> at 
> org.apache.solr.handler.BinaryUpdateRequestHandler.access$000(BinaryUpdateRequestHandler.java:47)
> at 
> org.apache.solr.handler.BinaryUpdateRequestHandler$1.load(BinaryUpdateRequestHandler.java:58)
> at 
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:59)
> at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1540)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:435)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:256)
> at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
> at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
> at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
> at 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
> at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
> at 

[jira] [Commented] (SOLR-3881) frequent OOM in LanguageIdentifierUpdateProcessor

2014-07-15 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062157#comment-14062157
 ] 

Steve Rowe commented on SOLR-3881:
--

bq. See 
http://language-detection.googlecode.com/svn/trunk/doc/com/cybozu/labs/langdetect/Detector.html#setMaxTextLength(int)
 - the default is 10K chars - we can pass the configured max total chars here.

The default is actually 10K *bytes*, not chars, so we'd need to divide by two 
when passing the configured max total chars.

> frequent OOM in LanguageIdentifierUpdateProcessor
> -
>
> Key: SOLR-3881
> URL: https://issues.apache.org/jira/browse/SOLR-3881
> Project: Solr
>  Issue Type: Bug
>  Components: update
>Affects Versions: 4.0
> Environment: CentOS 6.x, JDK 1.6, (java -server -Xms2G -Xmx2G 
> -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=)
>Reporter: Rob Tulloh
> Fix For: 4.9, 5.0
>
> Attachments: SOLR-3881.patch, SOLR-3881.patch
>
>
> We are seeing frequent failures from Solr causing it to OOM. Here is the 
> stack trace we observe when this happens:
> {noformat}
> Caused by: java.lang.OutOfMemoryError: Java heap space
> at java.util.Arrays.copyOf(Arrays.java:2882)
> at 
> java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
> at 
> java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:390)
> at java.lang.StringBuffer.append(StringBuffer.java:224)
> at 
> org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor.concatFields(LanguageIdentifierUpdateProcessor.java:286)
> at 
> org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor.process(LanguageIdentifierUpdateProcessor.java:189)
> at 
> org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor.processAdd(LanguageIdentifierUpdateProcessor.java:171)
> at 
> org.apache.solr.handler.BinaryUpdateRequestHandler$2.update(BinaryUpdateRequestHandler.java:90)
> at 
> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:140)
> at 
> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:120)
> at 
> org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:221)
> at 
> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:105)
> at 
> org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:186)
> at 
> org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:112)
> at 
> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:147)
> at 
> org.apache.solr.handler.BinaryUpdateRequestHandler.parseAndLoadDocs(BinaryUpdateRequestHandler.java:100)
> at 
> org.apache.solr.handler.BinaryUpdateRequestHandler.access$000(BinaryUpdateRequestHandler.java:47)
> at 
> org.apache.solr.handler.BinaryUpdateRequestHandler$1.load(BinaryUpdateRequestHandler.java:58)
> at 
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:59)
> at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1540)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:435)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:256)
> at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
> at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
> at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
> at 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
> at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
> at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
> at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
> at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
> at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5473) Split clusterstate.json per collection and watch states selectively

2014-07-15 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-5473:
-

Attachment: SOLR-5473-74.patch

> Split clusterstate.json per collection and watch states selectively 
> 
>
> Key: SOLR-5473
> URL: https://issues.apache.org/jira/browse/SOLR-5473
> Project: Solr
>  Issue Type: Sub-task
>  Components: SolrCloud
>Reporter: Noble Paul
>Assignee: Noble Paul
> Fix For: 5.0
>
> Attachments: SOLR-5473-74 .patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74_POC.patch, SOLR-5473-configname-fix.patch, SOLR-5473.patch, 
> SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
> SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
> SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
> SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
> SOLR-5473.patch, SOLR-5473_undo.patch, ec2-23-20-119-52_solr.log, 
> ec2-50-16-38-73_solr.log
>
>
> As defined in the parent issue, store the states of each collection under 
> /collections/collectionname/state.json node






[jira] [Updated] (SOLR-5473) Split clusterstate.json per collection and watch states selectively

2014-07-15 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-5473:
-

Attachment: (was: SOLR-5473-74.patch)

> Split clusterstate.json per collection and watch states selectively 
> 
>
> Key: SOLR-5473
> URL: https://issues.apache.org/jira/browse/SOLR-5473
> Project: Solr
>  Issue Type: Sub-task
>  Components: SolrCloud
>Reporter: Noble Paul
>Assignee: Noble Paul
> Fix For: 5.0
>
> Attachments: SOLR-5473-74 .patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74_POC.patch, 
> SOLR-5473-configname-fix.patch, SOLR-5473.patch, SOLR-5473.patch, 
> SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
> SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
> SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
> SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
> SOLR-5473_undo.patch, ec2-23-20-119-52_solr.log, ec2-50-16-38-73_solr.log
>
>
> As defined in the parent issue, store the states of each collection under 
> /collections/collectionname/state.json node






[jira] [Commented] (SOLR-3881) frequent OOM in LanguageIdentifierUpdateProcessor

2014-07-15 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062122#comment-14062122
 ] 

Steve Rowe commented on SOLR-3881:
--

bq. Added string size calculation as string builder capacity. Used to prevent 
multiple array allocations on append. (Maybe this also needs to be configurable 
- for large documents only.)

[~vzhovtiuk], I agree - I think we should have two configurable limits: max 
chars per field value (already in [~tomasflobbe]'s and your updated patches), 
and a max total chars (not there yet).

Tomás wrote:
bq. Do you think it would make more sense to limit each append (for the 
different fields) or to limit the total size of the buffer/builder (stop 
appending fields when the maximum was reached)? Both ways would prevent OOM, 
however they could give different results.

I think we should have *both* limits.

I think it's more important, though, to do as [~rcmuir] said earlier in this 
issue: 

{quote}
The langdetect implementation can append each piece at a time.

It can also take reader: append(Reader), but that is really just syntactic 
sugar forwarding to append(String)
and not exceeding the Detector.max_text_length.

Seems like the concatenating stuff should be pushed out of the base class into 
the Tika impl.
{quote}

See 
http://language-detection.googlecode.com/svn/trunk/doc/com/cybozu/labs/langdetect/Detector.html#setMaxTextLength(int)
 - the default is 10K chars - we can pass the configured max total chars here.

We should also set default maxima for both per-value and total chars, rather 
than MAX_INT, as in the current patch.
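
For illustration, a rough sketch of what the two limits could look like in 
concatFields (a hedged sketch only: the names maxFieldValueChars and 
maxTotalChars are placeholders, not necessarily what the patches use):

{code}
// Sketch only: maxFieldValueChars / maxTotalChars are hypothetical parameters.
private String concatFields(SolrInputDocument doc, String[] inputFields,
                            int maxFieldValueChars, int maxTotalChars) {
  StringBuilder sb = new StringBuilder(Math.min(maxTotalChars, 16 * 1024));
  for (String fieldName : inputFields) {
    Object content = doc.getFieldValue(fieldName);
    if (!(content instanceof String)) {
      continue;
    }
    String text = (String) content;
    // Limit 1: cap the chars taken from any single field value.
    int valueLen = Math.min(text.length(), maxFieldValueChars);
    // Limit 2: cap the total size of the buffer across all fields.
    int room = maxTotalChars - sb.length();
    if (room <= 0) {
      break;
    }
    sb.append(text, 0, Math.min(valueLen, room));
    sb.append(' ');
  }
  return sb.toString();
}
{code}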

> frequent OOM in LanguageIdentifierUpdateProcessor
> -
>
> Key: SOLR-3881
> URL: https://issues.apache.org/jira/browse/SOLR-3881
> Project: Solr
>  Issue Type: Bug
>  Components: update
>Affects Versions: 4.0
> Environment: CentOS 6.x, JDK 1.6, (java -server -Xms2G -Xmx2G 
> -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=)
>Reporter: Rob Tulloh
> Fix For: 4.9, 5.0
>
> Attachments: SOLR-3881.patch, SOLR-3881.patch
>
>
> We are seeing frequent failures from Solr causing it to OOM. Here is the 
> stack trace we observe when this happens:
> {noformat}
> Caused by: java.lang.OutOfMemoryError: Java heap space
> at java.util.Arrays.copyOf(Arrays.java:2882)
> at 
> java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
> at 
> java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:390)
> at java.lang.StringBuffer.append(StringBuffer.java:224)
> at 
> org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor.concatFields(LanguageIdentifierUpdateProcessor.java:286)
> at 
> org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor.process(LanguageIdentifierUpdateProcessor.java:189)
> at 
> org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor.processAdd(LanguageIdentifierUpdateProcessor.java:171)
> at 
> org.apache.solr.handler.BinaryUpdateRequestHandler$2.update(BinaryUpdateRequestHandler.java:90)
> at 
> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:140)
> at 
> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:120)
> at 
> org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:221)
> at 
> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:105)
> at 
> org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:186)
> at 
> org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:112)
> at 
> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:147)
> at 
> org.apache.solr.handler.BinaryUpdateRequestHandler.parseAndLoadDocs(BinaryUpdateRequestHandler.java:100)
> at 
> org.apache.solr.handler.BinaryUpdateRequestHandler.access$000(BinaryUpdateRequestHandler.java:47)
> at 
> org.apache.solr.handler.BinaryUpdateRequestHandler$1.load(BinaryUpdateRequestHandler.java:58)
> at 
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:59)
> at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1540)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:435)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:256)
> at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.

[jira] [Comment Edited] (SOLR-5473) Split clusterstate.json per collection and watch states selectively

2014-07-15 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062112#comment-14062112
 ] 

Noble Paul edited comment on SOLR-5473 at 7/15/14 2:27 PM:
---

bq. wouldn't say I am still convinced that caching till you fail is the same as 
watching

You are right. Caching till you fail is just an optimization in CloudSolrServer.

In my view, the client has no business watching the state at all. The cost 
of an extra request per stale state is negligible, IMHO.


bq.That's why I am saying that at least in the simplistic case this should be 
left to configuration – watch none, all, or selected.

Yes, I'm inclined to add this (selective watch) as an option which kicks in 
only if the no. of collections is greater than a certain threshold (say 10). 
Otherwise, all Solr nodes will watch all states.



To sum it up, my preference is:

# Have SolrJ do caching till it fails or till it times out (no watching 
whatsoever). Please enlighten me with a case where it is risky.
# SolrNodes should choose to watch all or selectively, based on the no. of 
collections or a configurable clusterwide property.




was (Author: noble.paul):
bq. wouldn't say I am still convinced that caching till you fail is the same as 
watching

You are right. Caching till you fail is just an optimization in CloudSolrServer.

In my view, the client has no business watching the state at all. The cost 
of an extra request per stale state is negligible, IMHO.


bq.That's why I am saying that at least in the simplistic case this should be 
left to configuration – watch none, all, or selected.

Yes, I'm inclined to add this as an option which kicks in only if the no. of 
collections is greater than a certain threshold (say 10). Otherwise, all Solr 
nodes will watch all states.



To sum it up, my preference is:

# Have SolrJ do caching till it fails or till it times out (no watching 
whatsoever). Please enlighten me with a case where it is risky.
# SolrNodes should choose to watch all or selectively, based on the no. of 
collections or a configurable clusterwide property.



> Split clusterstate.json per collection and watch states selectively 
> 
>
> Key: SOLR-5473
> URL: https://issues.apache.org/jira/browse/SOLR-5473
> Project: Solr
>  Issue Type: Sub-task
>  Components: SolrCloud
>Reporter: Noble Paul
>Assignee: Noble Paul
> Fix For: 5.0
>
> Attachments: SOLR-5473-74 .patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74_POC.patch, SOLR-5473-configname-fix.patch, SOLR-5473.patch, 
> SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
> SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
> SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
> SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
> SOLR-5473.patch, SOLR-5473_undo.patch, ec2-23-20-119-52_solr.log, 
> ec2-50-16-38-73_solr.log
>
>
> As defined in the parent issue, store the states of each collection under 
> /collections/collectionname/state.json node






[jira] [Commented] (SOLR-5473) Split clusterstate.json per collection and watch states selectively

2014-07-15 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062112#comment-14062112
 ] 

Noble Paul commented on SOLR-5473:
--

bq. wouldn't say I am still convinced that caching till you fail is the same as 
watching

You are right. Caching till you fail is just an optimization in CloudSolrServer.

In my view, the client has no business watching the state at all. The cost 
of an extra request per stale state is negligible, IMHO.


bq.That's why I am saying that at least in the simplistic case this should be 
left to configuration – watch none, all, or selected.

Yes, I'm inclined to add this as an option which kicks in only if the no. of 
collections is greater than a certain threshold (say 10). Otherwise, all Solr 
nodes will watch all states.



To sum it up, my preference is:

# Have SolrJ do caching till it fails or till it times out (no watching 
whatsoever; a rough sketch follows below). Please enlighten me with a case 
where it is risky.
# SolrNodes should choose to watch all or selectively, based on the no. of 
collections or a configurable clusterwide property.
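
A rough sketch of the "cache till it fails" idea (hypothetical names 
throughout: stateCache, fetchCollectionState, sendRequest and isStaleStateError 
are not CloudSolrServer's actual internals):

{code}
// Hypothetical sketch of "cache till it fails"; all helper names are made up.
private NamedList<Object> request(String collection, SolrRequest req) throws Exception {
  DocCollection coll = stateCache.get(collection);   // cached state, no ZK watch
  if (coll == null) {
    coll = fetchCollectionState(collection);         // read state.json from ZK once
    stateCache.put(collection, coll);
  }
  try {
    return sendRequest(coll, req);
  } catch (SolrException e) {
    if (!isStaleStateError(e)) {
      throw e;
    }
    stateCache.remove(collection);                   // state was stale: invalidate,
    coll = fetchCollectionState(collection);         // re-read, and retry once
    stateCache.put(collection, coll);
    return sendRequest(coll, req);
  }
}
{code}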



> Split clusterstate.json per collection and watch states selectively 
> 
>
> Key: SOLR-5473
> URL: https://issues.apache.org/jira/browse/SOLR-5473
> Project: Solr
>  Issue Type: Sub-task
>  Components: SolrCloud
>Reporter: Noble Paul
>Assignee: Noble Paul
> Fix For: 5.0
>
> Attachments: SOLR-5473-74 .patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74_POC.patch, SOLR-5473-configname-fix.patch, SOLR-5473.patch, 
> SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
> SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
> SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
> SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
> SOLR-5473.patch, SOLR-5473_undo.patch, ec2-23-20-119-52_solr.log, 
> ec2-50-16-38-73_solr.log
>
>
> As defined in the parent issue, store the states of each collection under 
> /collections/collectionname/state.json node






[jira] [Resolved] (LUCENE-5824) hunspell FLAG LONG implemented incorrectly

2014-07-15 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-5824.
-

   Resolution: Fixed
Fix Version/s: 4.10
   5.0

> hunspell FLAG LONG implemented incorrectly
> --
>
> Key: LUCENE-5824
> URL: https://issues.apache.org/jira/browse/LUCENE-5824
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Robert Muir
> Fix For: 5.0, 4.10
>
> Attachments: LUCENE-5824.patch
>
>
> If you have more than 256 flags, you run out of 8-bit characters, so you have 
> to use another flag type to get 64k:
> * UTF-8: 16-bit BMP flags
> * long: two-character flags like 'AB'
> * num: decimal numbers like '10234'
> But our implementation for 'long' is wrong, it encodes as 'A+B', which means 
> it can't distinguish between 'AB' and 'BA' and causes overgeneration.






[jira] [Commented] (LUCENE-4396) BooleanScorer should sometimes be used for MUST clauses

2014-07-15 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062106#comment-14062106
 ] 

Michael McCandless commented on LUCENE-4396:


This is good progress, thanks Da!

bq. In this patch, I just compact the array as I go through the MUST_NOT docs.

It looks like this gave some nice gains with the many-not cases.

bq. It seems that we can get a better result on *Some* tasks if we combine 
size9 with size5.

Curiously, some of the tasks are really hurt by the larger sizes ... maybe 1<<9 
is a good compromise?

> BooleanScorer should sometimes be used for MUST clauses
> ---
>
> Key: LUCENE-4396
> URL: https://issues.apache.org/jira/browse/LUCENE-4396
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael McCandless
> Attachments: And.tasks, AndOr.tasks, AndOr.tasks, LUCENE-4396.patch, 
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, 
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, 
> SIZE.perf, luceneutil-score-equal.patch, luceneutil-score-equal.patch, 
> stat.cpp
>
>
> Today we only use BooleanScorer if the query consists of SHOULD and MUST_NOT.
> If there is one or more MUST clauses we always use BooleanScorer2.
> But I suspect that unless the MUST clauses have a very low hit count compared 
> to the other clauses, BooleanScorer would perform better than 
> BooleanScorer2.  BooleanScorer still has some vestiges from when it used to 
> handle MUST, so it shouldn't be hard to bring back this capability ... I think 
> the challenging part might be the heuristics on when to use which (likely we 
> would have to use firstDocID as a proxy for total hit count).
> Likely we should also have BooleanScorer sometimes use .advance() on the subs 
> in this case, e.g. if the MUST clause suddenly skips 100 docs then you want 
> to .advance() all the SHOULD clauses.
> I won't have near-term time to work on this, so feel free to take it if you 
> are inspired!
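
A rough sketch of the advance-the-SHOULD-clauses idea (hypothetical: 
mustScorer, shouldScorers and collector are stand-ins, not actual BooleanScorer 
fields):

{code}
// Hypothetical sketch: drive iteration from the MUST clause and let the
// SHOULD clauses skip over the gaps via advance() instead of nextDoc().
int doc = mustScorer.nextDoc();
while (doc != DocIdSetIterator.NO_MORE_DOCS) {
  float score = mustScorer.score();
  for (Scorer should : shouldScorers) {
    if (should.docID() < doc) {
      should.advance(doc);       // jump over the docs the MUST clause skipped
    }
    if (should.docID() == doc) {
      score += should.score();
    }
  }
  collector.collect(doc);        // collect doc with the accumulated score
  doc = mustScorer.nextDoc();
}
{code}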






[jira] [Updated] (SOLR-5473) Split clusterstate.json per collection and watch states selectively

2014-07-15 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-5473:
-

Attachment: SOLR-5473-74.patch

Full patch with all tests passing.

The changes are minimal, with the addition of a constructor to ClusterState.

> Split clusterstate.json per collection and watch states selectively 
> 
>
> Key: SOLR-5473
> URL: https://issues.apache.org/jira/browse/SOLR-5473
> Project: Solr
>  Issue Type: Sub-task
>  Components: SolrCloud
>Reporter: Noble Paul
>Assignee: Noble Paul
> Fix For: 5.0
>
> Attachments: SOLR-5473-74 .patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74_POC.patch, SOLR-5473-configname-fix.patch, SOLR-5473.patch, 
> SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
> SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
> SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
> SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
> SOLR-5473.patch, SOLR-5473_undo.patch, ec2-23-20-119-52_solr.log, 
> ec2-50-16-38-73_solr.log
>
>
> As defined in the parent issue, store the states of each collection under 
> /collections/collectionname/state.json node






[jira] [Commented] (SOLR-6241) HttpPartitionTest.testRf3WithLeaderFailover fails sometimes

2014-07-15 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062064#comment-14062064
 ] 

Shalin Shekhar Mangar commented on SOLR-6241:
-

I still see some exceptions such as:
{code}
No registered leader was found after waiting for 6ms , collection: 
c8n_1x3_lf slice: shard1
Stacktrace

org.apache.solr.common.SolrException: No registered leader was found after 
waiting for 6ms , collection: c8n_1x3_lf slice: shard1
at 
__randomizedtesting.SeedInfo.seed([CBCC4F6420498B0C:4A2AC17C5716EB30]:0)
at 
org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:567)
at 
org.apache.solr.cloud.HttpPartitionTest.testRf3WithLeaderFailover(HttpPartitionTest.java:370)
at 
org.apache.solr.cloud.HttpPartitionTest.doTest(HttpPartitionTest.java:150)
at 
org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:863)
{code}

> HttpPartitionTest.testRf3WithLeaderFailover fails sometimes
> ---
>
> Key: SOLR-6241
> URL: https://issues.apache.org/jira/browse/SOLR-6241
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud, Tests
>Reporter: Shalin Shekhar Mangar
>Assignee: Shalin Shekhar Mangar
>Priority: Minor
> Fix For: 4.10
>
>
> This test fails sometimes locally as well as on jenkins.
> {code}
> Expected 2 of 3 replicas to be active but only found 1
> at org.junit.Assert.fail(Assert.java:93)
> at org.junit.Assert.assertTrue(Assert.java:43)
> at 
> org.apache.solr.cloud.HttpPartitionTest.testRf3WithLeaderFailover(HttpPartitionTest.java:367)
> at 
> org.apache.solr.cloud.HttpPartitionTest.doTest(HttpPartitionTest.java:148)
> at 
> org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:863)
> {code}






[jira] [Resolved] (SOLR-6245) Socket and Connection configuration are ignored in HttpSolrServer when passing in HttpClient

2014-07-15 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar resolved SOLR-6245.
-

   Resolution: Fixed
Fix Version/s: 4.10
   5.0

Thanks Patanachai!

> Socket and Connection configuration are ignored in HttpSolrServer when 
> passing in HttpClient
> 
>
> Key: SOLR-6245
> URL: https://issues.apache.org/jira/browse/SOLR-6245
> Project: Solr
>  Issue Type: Bug
>  Components: clients - java
>Affects Versions: 4.7, 4.8, 4.9
>Reporter: Patanachai Tangchaisin
>Assignee: Shalin Shekhar Mangar
> Fix For: 5.0, 4.10
>
> Attachments: SOLR-6245.patch, SOLR-6245.patch, SOLR-6245.patch
>
>
> I spent time debugging our HttpSolrServer and HttpClient. We construct our 
> HttpClient (we have some requirements regarding connectionTimeout, soTimeout, 
> etc.) and then pass it to HttpSolrServer. I found out that all our 
> socket-level and connection-level configuration is ignored when creating an 
> HTTP connection. 
> The problem is that HttpClient 4.3.x allows overriding these parameters per 
> request, i.e. one request can have socketTimeout=100ms and another request 
> can have socketTimeout=200ms. The logic[1] that decides whether to build a 
> per-request config depends on whether any of these parameters is set. 
> {code}
> protected NamedList<Object> executeMethod(HttpRequestBase method, final 
>     ResponseParser processor) throws SolrServerException {
>   // XXX client already has this set, is this needed?
>   method.getParams().setParameter(ClientPNames.HANDLE_REDIRECTS,
>       followRedirects);
>   method.addHeader("User-Agent", AGENT);
> {code}
> In HttpSolrServer.java, only one parameter (HANDLE_REDIRECTS) is set, but 
> that triggers the logic in HttpClient to initialize a default per-request 
> config, which eventually overrides any socket and connection configuration we 
> did via HttpClientBuilder.
> To conclude, a solution would be to remove these lines:
> {code}
>   // XXX client already has this set, is this needed?
>   method.getParams().setParameter(ClientPNames.HANDLE_REDIRECTS,
>       followRedirects);
> {code}
> [1] - 
> http://svn.apache.org/viewvc/httpcomponents/httpclient/trunk/httpclient/src/main/java/org/apache/http/impl/client/InternalHttpClient.java?revision=1603745&view=markup
>  [LINE:172]
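
For reference, the configuration path being described looks roughly like this 
(a sketch; the URL and timeout values are illustrative only):

{code}
// Sketch of passing a pre-configured HttpClient 4.3.x client to HttpSolrServer.
RequestConfig requestConfig = RequestConfig.custom()
    .setConnectTimeout(5000)        // connectionTimeout
    .setSocketTimeout(10000)        // soTimeout
    .build();
CloseableHttpClient httpClient = HttpClients.custom()
    .setDefaultRequestConfig(requestConfig)
    .build();
// Before the fix, HttpSolrServer's per-request HANDLE_REDIRECTS parameter made
// HttpClient build a default per-request config that overrode these settings.
HttpSolrServer server =
    new HttpSolrServer("http://localhost:8983/solr/collection1", httpClient);
{code}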






NumericFacets assumptions

2014-07-15 Thread Michael Ryan
I'm looking at org.apache.solr.request.NumericFacets.getCounts(), trying to get 
an idea of how it works. Specifically looking at this bit of code:

final List<AtomicReaderContext> leaves = searcher.getIndexReader().leaves();
final Iterator<AtomicReaderContext> ctxIt = leaves.iterator();
for (DocIterator docsIt = docs.iterator(); docsIt.hasNext(); ) {
  final int doc = docsIt.nextDoc();
  if (ctx == null || doc >= ctx.docBase + ctx.reader().maxDoc()) {
    do {
      ctx = ctxIt.next();
    } while (ctx == null || doc >= ctx.docBase + ctx.reader().maxDoc());
    switch (numericType) {
      case LONG:
        longs = DocValues.getNumeric(ctx.reader(), fieldName);

Am I right that it is assuming that the docs are in order? This confused me 
because the javadoc for DocIterator says "The order of the documents returned 
by this iterator is non-deterministic". For most of the DocIterator 
implementations, the docs are returned in order, which I guess is how this 
works?

Also, am I right that it is assuming the index reader leaves are in order? That 
is, each leaf has higher doc ids than the previous leaf? I'm wondering too if 
that is always true, or if that is just how the implementation works now.

(I'm asking these questions because I want to do something very similar in a 
custom bit of code I'm writing, and it would be nice if I could safely make 
these same assumptions.)
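
For the record, a leaf can also be located without relying on iteration order, 
via binary search over docBase (a minimal sketch; ReaderUtil.subIndex relies on 
the same leaves-ordered-by-docBase invariant that the loop above assumes):

final List<AtomicReaderContext> leaves = searcher.getIndexReader().leaves();
final int idx = ReaderUtil.subIndex(doc, leaves);   // binary search by docBase
final AtomicReaderContext leaf = leaves.get(idx);
final int segDoc = doc - leaf.docBase;              // per-segment docID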

-Michael


[jira] [Updated] (SOLR-6245) Socket and Connection configuration are ignored in HttpSolrServer when passing in HttpClient

2014-07-15 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar updated SOLR-6245:


Attachment: SOLR-6245.patch

Moved the testcase into a new test so that we can use the SuppressSSL 
annotation on it. I cannot reconcile the SSL configuration with the new 
HttpComponents API and I don't have time to work on it further. But 
coverage-wise we're good because the fix is adequately tested.

> Socket and Connection configuration are ignored in HttpSolrServer when 
> passing in HttpClient
> 
>
> Key: SOLR-6245
> URL: https://issues.apache.org/jira/browse/SOLR-6245
> Project: Solr
>  Issue Type: Bug
>  Components: clients - java
>Affects Versions: 4.7, 4.8, 4.9
>Reporter: Patanachai Tangchaisin
>Assignee: Shalin Shekhar Mangar
> Attachments: SOLR-6245.patch, SOLR-6245.patch, SOLR-6245.patch
>
>
> I spent time debugging our HttpSolrServer and HttpClient. We construct our 
> HttpClient (we have some requirements regarding connectionTimeout, soTimeout, 
> etc.) and then pass it to HttpSolrServer. I found out that all our 
> socket-level and connection-level configuration is ignored when creating an 
> HTTP connection. 
> The problem is that HttpClient 4.3.x allows overriding these parameters per 
> request, i.e. one request can have socketTimeout=100ms and another request 
> can have socketTimeout=200ms. The logic[1] that decides whether to build a 
> per-request config depends on whether any of these parameters is set. 
> {code}
> protected NamedList<Object> executeMethod(HttpRequestBase method, final 
>     ResponseParser processor) throws SolrServerException {
>   // XXX client already has this set, is this needed?
>   method.getParams().setParameter(ClientPNames.HANDLE_REDIRECTS,
>       followRedirects);
>   method.addHeader("User-Agent", AGENT);
> {code}
> In HttpSolrServer.java, only one parameter (HANDLE_REDIRECTS) is set, but 
> that triggers the logic in HttpClient to initialize a default per-request 
> config, which eventually overrides any socket and connection configuration we 
> did via HttpClientBuilder.
> To conclude, a solution would be to remove these lines:
> {code}
>   // XXX client already has this set, is this needed?
>   method.getParams().setParameter(ClientPNames.HANDLE_REDIRECTS,
>       followRedirects);
> {code}
> [1] - 
> http://svn.apache.org/viewvc/httpcomponents/httpclient/trunk/httpclient/src/main/java/org/apache/http/impl/client/InternalHttpClient.java?revision=1603745&view=markup
>  [LINE:172]






[jira] [Commented] (LUCENE-5819) Add block tree postings format that supports term ords

2014-07-15 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062020#comment-14062020
 ] 

Michael McCandless commented on LUCENE-5819:


I ran a quick perf test of Lucene41 vs OrdsLucene41, on wikimediumall:
{noformat}
Report after iter 19:
                Task     QPS base   StdDev     QPS comp   StdDev     Pct diff
            PKLookup       153.33   (8.7%)       131.17   (8.5%)   -14.4% ( -29% -    3%)
             Respell        35.40   (5.4%)        31.41   (7.9%)   -11.3% ( -23% -    2%)
          AndHighLow       241.05   (3.3%)       224.00  (14.7%)    -7.1% ( -24% -   11%)
              Fuzzy2        69.73   (6.3%)        65.30   (5.5%)    -6.3% ( -17% -    5%)
              Fuzzy1        44.32   (9.4%)        41.90  (11.8%)    -5.5% ( -24% -   17%)
             LowTerm       313.68   (2.4%)       296.93  (10.8%)    -5.3% ( -18% -    8%)
            Wildcard        39.40   (5.7%)        37.35   (9.7%)    -5.2% ( -19% -   10%)
              IntNRQ         3.57   (9.3%)         3.41  (14.5%)    -4.6% ( -26% -   21%)
     MedSloppyPhrase         4.98   (3.3%)         4.76  (12.7%)    -4.4% ( -19% -   12%)
           MedPhrase         6.18   (3.8%)         5.95  (13.1%)    -3.7% ( -19% -   13%)
            HighTerm        27.78   (5.8%)        26.75  (10.1%)    -3.7% ( -18% -   12%)
         AndHighHigh        13.51   (2.0%)        13.02   (9.9%)    -3.6% ( -15% -    8%)
     LowSloppyPhrase       134.71   (3.3%)       130.50  (12.1%)    -3.1% ( -17% -   12%)
             Prefix3         8.88   (9.7%)         8.65  (15.6%)    -2.7% ( -25% -   25%)
           LowPhrase        49.67   (3.1%)        48.38  (11.4%)    -2.6% ( -16% -   12%)
             MedTerm       117.97   (4.5%)       115.01   (6.9%)    -2.5% ( -13% -    9%)
          HighPhrase         7.87   (6.0%)         7.73  (13.3%)    -1.8% ( -19% -   18%)
        HighSpanNear         4.68   (6.6%)         4.61  (14.7%)    -1.4% ( -21% -   21%)
          AndHighMed        49.48   (1.6%)        48.95   (5.0%)    -1.1% (  -7% -    5%)
         LowSpanNear        23.70   (4.6%)        23.55  (10.4%)    -0.7% ( -14% -   15%)
    HighSloppyPhrase         5.90   (4.4%)         5.87  (11.2%)    -0.5% ( -15% -   15%)
        OrNotHighLow        36.90  (12.3%)        37.07  (12.9%)     0.5% ( -22% -   29%)
          OrHighHigh         4.16  (15.2%)         4.19  (16.7%)     0.8% ( -27% -   38%)
       OrHighNotHigh        11.86  (13.8%)        11.98  (18.4%)     0.9% ( -27% -   38%)
         MedSpanNear         4.32   (5.3%)         4.39  (10.7%)     1.5% ( -13% -   18%)
        OrHighNotMed        26.10  (14.7%)        26.60  (12.8%)     1.9% ( -22% -   34%)
        OrHighNotLow        19.61  (15.8%)        20.08  (13.9%)     2.4% ( -23% -   38%)
        OrNotHighMed        13.84  (15.9%)        14.19  (16.7%)     2.6% ( -25% -   41%)
           OrHighMed        27.09  (18.5%)        27.87  (19.4%)     2.9% ( -29% -   50%)
           OrHighLow        36.24  (15.4%)        37.42  (15.3%)     3.2% ( -23% -   40%)
       OrNotHighHigh         9.70  (16.6%)        10.11  (15.5%)     4.2% ( -23% -   43%)
{noformat}

Net/net, the terms-dict-heavy operations (PKLookup, respell, fuzzy,
maybe IntNRQ) take some hit, since there is added cost to decode
ordinals from the FST; I think the other changes are likely noise.

Also, the net terms index (size of FSTs that are loaded into RAM,
\*.tip/\*.tipo) grew from 31M to 46M (~48% larger)...


> Add block tree postings format that supports term ords
> --
>
> Key: LUCENE-5819
> URL: https://issues.apache.org/jira/browse/LUCENE-5819
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/other
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 5.0, 4.10
>
> Attachments: LUCENE-5819.patch
>
>
> BlockTree is our default terms dictionary today, but it doesn't
> support term ords, which is an optional API in the postings format to
> retrieve the ordinal for the currently seek'd term, and also later
> seek by that ordinal e.g. to lookup the term.
> This can possibly be useful for e.g. faceting, and maybe at some point
> we can share the postings terms dict with the one used by sorted/set
> DV for cases when app wants to invert and facet on a given field.
> The older (3.x) block terms dict can easily support ords, and we have
> a Lucene41OrdsPF in test-framework, but it's not as fast / compact as
> block-tree, and doesn't (can't easily) implement

[JENKINS] Lucene-Solr-Tests-4.x-Java7 - Build # 2031 - Still Failing

2014-07-15 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-Tests-4.x-Java7/2031/

1 tests failed.
REGRESSION:  org.apache.lucene.analysis.core.TestRandomChains.testRandomChains

Error Message:
startOffset must be non-negative, and endOffset must be >= startOffset, 
startOffset=2,endOffset=1

Stack Trace:
java.lang.IllegalArgumentException: startOffset must be non-negative, and 
endOffset must be >= startOffset, startOffset=2,endOffset=1
at 
__randomizedtesting.SeedInfo.seed([2871E466A5044906:1590CD07E21654C6]:0)
at 
org.apache.lucene.analysis.tokenattributes.PackedTokenAttributeImpl.setOffset(PackedTokenAttributeImpl.java:107)
at 
org.apache.lucene.analysis.shingle.ShingleFilter.incrementToken(ShingleFilter.java:345)
at 
org.apache.lucene.analysis.ValidatingTokenFilter.incrementToken(ValidatingTokenFilter.java:68)
at 
org.apache.lucene.analysis.BaseTokenStreamTestCase.checkAnalysisConsistency(BaseTokenStreamTestCase.java:703)
at 
org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:614)
at 
org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:513)
at 
org.apache.lucene.analysis.core.TestRandomChains.testRandomChains(TestRandomChains.java:927)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuit

[jira] [Updated] (LUCENE-5824) hunspell FLAG LONG implemented incorrectly

2014-07-15 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-5824:


Attachment: LUCENE-5824.patch

Simple patch and test to encode as (A << 8) + B (and also check the values are 
really within range: they should be two ASCII characters). 

This bug currently impacts the more complicated dictionaries using this 
encoding type (Russian, Arabic, Hebrew, etc.).
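
In other words (a sketch of the encoding idea, not the patch itself):

{code}
// Pack two 8-bit flag characters into one 16-bit char so that order is
// preserved: encodeLongFlag('A','B') != encodeLongFlag('B','A'), unlike
// the old 'A' + 'B' sum.
char encodeLongFlag(char a, char b) {
  assert a <= 0xFF && b <= 0xFF : "FLAG long expects two ASCII characters";
  return (char) ((a << 8) | b);
}
{code}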

> hunspell FLAG LONG implemented incorrectly
> --
>
> Key: LUCENE-5824
> URL: https://issues.apache.org/jira/browse/LUCENE-5824
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Robert Muir
> Attachments: LUCENE-5824.patch
>
>
> If you have more than 256 flags, you run out of 8-bit characters, so you have 
> to use another flag type to get 64k:
> * UTF-8: 16-bit BMP flags
> * long: two-character flags like 'AB'
> * num: decimal numbers like '10234'
> But our implementation for 'long' is wrong, it encodes as 'A+B', which means 
> it can't distinguish between 'AB' and 'BA' and causes overgeneration.






[jira] [Created] (LUCENE-5824) hunspell FLAG LONG implemented incorrectly

2014-07-15 Thread Robert Muir (JIRA)
Robert Muir created LUCENE-5824:
---

 Summary: hunspell FLAG LONG implemented incorrectly
 Key: LUCENE-5824
 URL: https://issues.apache.org/jira/browse/LUCENE-5824
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir


If you have more than 256 flags, you run out of 8-bit characters, so you have 
to use another flag type to get 64k:
* UTF-8: 16-bit BMP flags
* long: two-character flags like 'AB'
* num: decimal numbers like '10234'

But our implementation for 'long' is wrong, it encodes as 'A+B', which means it 
can't distinguish between 'AB' and 'BA' and causes overgeneration.






[jira] [Commented] (SOLR-5865) Provide a MiniSolrCloudCluster to enable easier testing

2014-07-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061967#comment-14061967
 ] 

ASF GitHub Bot commented on SOLR-5865:
--

Github user asfgit closed the pull request at:

https://github.com/apache/camel/pull/218


> Provide a MiniSolrCloudCluster to enable easier testing
> ---
>
> Key: SOLR-5865
> URL: https://issues.apache.org/jira/browse/SOLR-5865
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrCloud
>Affects Versions: 4.7, 5.0
>Reporter: Gregory Chanan
>Assignee: Mark Miller
> Fix For: 4.8, 5.0
>
> Attachments: SOLR-5865.patch, SOLR-5865.patch, 
> SOLR-5865addendum.patch, SOLR-5865addendum2.patch, SOLR-5865wait.patch
>
>
> Today, the SolrCloud tests are based on the LuceneTestCase class hierarchy, 
> which has a couple of issues around support for downstream projects:
> - It's difficult to test SolrCloud support in a downstream project that may 
> have its own test framework.  For example, some projects have support for 
> different storage backends (e.g. Solr/ElasticSearch/HBase) and want tests 
> against each of the different backends.  This is difficult to do cleanly, 
> because the Solr tests require derivation from LuceneTestCase, while the 
> others don't
> - The LuceneTestCase class hierarchy is really designed for internal solr 
> tests (e.g. it randomizes a lot of parameters to get test coverage, but a 
> downstream project probably doesn't care about that).  It's also quite 
> complicated and dense, much more so than a downstream project would want.
> Given these reasons, it would be nice to provide a simple 
> "MiniSolrCloudCluster", similar to how HDFS provides a MiniHdfsCluster or 
> HBase provides a MiniHBaseCluster.






[jira] [Comment Edited] (SOLR-5746) solr.xml parsing of "str" vs "int" vs "bool" is brittle; fails silently; expects odd type for "shareSchema"

2014-07-15 Thread Maciej Zasada (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060685#comment-14060685
 ] 

Maciej Zasada edited comment on SOLR-5746 at 7/15/14 11:08 AM:
---

Hi [~hossman],

I've attached an updated patch file:
* removed the {{DOMUtil.readNamedChildrenAsNamedList}} method and used the 
(slightly modified) existing API of {{DOMUtil}} instead.
* removed reading values from {{SolrParam}} - they are now read directly from 
the {{NamedList<>}}
* added reporting of duplicated config options ({{DEBUG}} level) per parent 
node, as well as an exception message containing the list of duplicated 
parameters, e.g.

{code}
<solr>
  <int name="int-param">1</int>
  <str name="str-param">STRING-1</str>
  …
  <int name="int-param">2</int>
  <str name="str-param">STRING-2</str>
</solr>
{code}
will cause an exception:
{code}
Duplicated 2 config parameter(s) in solr.xml file: [int-param, str-param]
{code}
However, if parameters with the same name are attached to different parent 
nodes, everything will pass just fine, e.g.
{code}
<node-a>
  <int name="int-param">1</int>
  <str name="str-param">STRING-1</str>
  …
</node-a>
<node-b>
  …
  <int name="int-param">2</int>
  <str name="str-param">STRING-2</str>
</node-b>
{code}
In this case no exception will be thrown. 
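
The duplicate check itself can be as simple as the following (a hedged sketch 
over the per-parent-node {{NamedList}}; not the exact patch code):

{code}
// Hypothetical sketch: collect names that occur more than once in the
// NamedList parsed for a single parent node, then fail with the list.
Set<String> seen = new HashSet<>();
Set<String> duplicates = new TreeSet<>();
for (int i = 0; i < params.size(); i++) {
  if (!seen.add(params.getName(i))) {
    duplicates.add(params.getName(i));
  }
}
if (!duplicates.isEmpty()) {
  throw new SolrException(SolrException.ErrorCode.SERVER_ERROR,
      "Duplicated " + duplicates.size() + " config parameter(s) in solr.xml file: " + duplicates);
}
{code}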

Some examples to sum it up:

|{{solr.xml}} file fragment|Expected type|Parsing result|
|{{44}}|{{Integer}}|(/)
|{{44}}|{{Integer}}|(/)
|{{44}}|{{Integer}}|(x)
|{{44}}|{{Integer}}|(x)
|{{44}}|{{Integer}}|(x)
|{{44}}|{{Integer}}|(x)
|{{true}}|{{Boolean}}|(x)
|{{true}}|{{Boolean}}|(/)
|{{true}}|{{Boolean}}|(x)
|{{true}}|{{Boolean}}|(/)
|{{true}}|{{Boolean}}|(x)
|{{true}}|{{Boolean}}|(x)

{{ant clean test}} shows that there's no regression.

[~jkrupan] this change clearly is not backward compatible with the existing 
{{solr.xml}} files. For instance - unknown config values won't be silently 
ignored - an exception will be thrown instead. However, I didn't realise that 
{{solr.xml}} files are versioned the same way as {{schema.xml}} files are. 
Should I bump the schema version to 1.6?

Cheers,
Maciej


was (Author: maciej.zasada):
Hi [~hossman],

I've attached an updated patch file:
* removed the {{DOMUtil.readNamedChildrenAsNamedList}} method and used the 
(slightly modified) existing API of {{DOMUtil}} instead.
* removed reading values from {{SolrParam}} - they are now read directly from 
the {{NamedList<>}}
* added reporting of duplicated config options ({{DEBUG}} level) per parent 
node, as well as an exception message containing the list of duplicated 
parameters, e.g.

{code}
<solr>
  <int name="int-param">1</int>
  <str name="str-param">STRING-1</str>
  …
  <int name="int-param">2</int>
  <str name="str-param">STRING-2</str>
</solr>
{code}
will cause an exception:
{code}
Duplicated 2 config parameter(s) in solr.xml file: [int-param, str-param]
{code}
However, if parameters with the same name are attached to different parent 
nodes, everything will pass just fine, e.g.
{code}
<node-a>
  <int name="int-param">1</int>
  <str name="str-param">STRING-1</str>
  …
</node-a>
<node-b>
  …
  <int name="int-param">2</int>
  <str name="str-param">STRING-2</str>
</node-b>
{code}
In this case no exception will be thrown. 

Some examples to sum it up:

|{{solr.xml}} file fragment|Expected type|Parsing result|
|{{44}}|{{Integer}}|(/)
|{{44}}|{{Integer}}|(/)
|{{44}}|{{Integer}}|(/)
|{{44}}|{{Integer}}|(x)
|{{44}}|{{Integer}}|(x)
|{{44}}|{{Integer}}|(x)
|{{true}}|{{Boolean}}|(x)
|{{true}}|{{Boolean}}|(/)
|{{true}}|{{Boolean}}|(x)
|{{true}}|{{Boolean}}|(/)
|{{true}}|{{Boolean}}|(x)
|{{true}}|{{Boolean}}|(x)

{{ant clean test}} shows that there's no regression.

[~jkrupan] this change clearly is not backward compatible with the existing 
{{solr.xml}} files. For instance - unknown config values won't be silently 
ignored - an exception will be thrown instead. However, I didn't realise that 
{{solr.xml}} files are versioned the same way as {{schema.xml}} files are. 
Should I bump the schema version to 1.6?

Cheers,
Maciej

> solr.xml parsing of "str" vs "int" vs "bool" is brittle; fails silently; 
> expects odd type for "shareSchema"   
> --
>
> Key: SOLR-5746
> URL: https://issues.apache.org/jira/browse/SOLR-5746
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 4.3, 4.4, 4.5, 4.6
>Reporter: Hoss Man
> Attachments: SOLR-5746.patch, SOLR-5746.patch, SOLR-5746.patch, 
> SOLR-5746.patch
>
>
> A comment in the ref guide got me looking at ConfigSolrXml.java and noticing 
> that the parsing of solr.xml options here is very brittle and confusing.  In 
> particular:
> * if a boolean option "foo" is expected along the lines of {{ name="foo">true}} it will silently ignore {{ name="foo">true}}
> * likewise for an int option {{32}} vs {{ name="bar">32}}
> ... this is inconsistent with the way solrconfig.xml is parsed.  In 
> solrconfig.xml, the xml nodes are parsed into a NamedList, and the above 
> options will work in either form, but an invalid value such as {{ name="foo">NOT A BOOLEAN}} will generate an error earlier (when 
> parsing config) then {{NOT A BOOLEAN}} (attempt to 
> parse the string as a bool the first time the config value is needed)
> In addition, I notice this really confusing line...
> {code}
> propMap.put(CfgProp.SOLR_SHARESCHEMA, 
> doSub("solr/str[@name='shareSchema']"));
> {code}
> "shareSchema" is used internally as a boolean option, but as written the 
> parsing code will ignore it unless the user explicitly configures it as a 
> {{<str/>}}

[jira] [Updated] (SOLR-5746) solr.xml parsing of "str" vs "int" vs "bool" is brittle; fails silently; expects odd type for "shareSchema"

2014-07-15 Thread Maciej Zasada (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maciej Zasada updated SOLR-5746:


Attachment: SOLR-5746.patch

Added some more unit tests to the patch.

> solr.xml parsing of "str" vs "int" vs "bool" is brittle; fails silently; 
> expects odd type for "shareSchema"   
> --
>
> Key: SOLR-5746
> URL: https://issues.apache.org/jira/browse/SOLR-5746
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 4.3, 4.4, 4.5, 4.6
>Reporter: Hoss Man
> Attachments: SOLR-5746.patch, SOLR-5746.patch, SOLR-5746.patch, 
> SOLR-5746.patch
>
>
> A comment in the ref guide got me looking at ConfigSolrXml.java and noticing 
> that the parsing of solr.xml options here is very brittle and confusing.  In 
> particular:
> * if a boolean option "foo" is expected along the lines of {{ name="foo">true}} it will silently ignore {{ name="foo">true}}
> * likewise for an int option {{32}} vs {{ name="bar">32}}
> ... this is inconsistent with the way solrconfig.xml is parsed.  In 
> solrconfig.xml, the xml nodes are parsed into a NamedList, and the above 
> options will work in either form, but an invalid value such as {{ name="foo">NOT A BOOLEAN}} will generate an error earlier (when 
> parsing config) then {{NOT A BOOLEAN}} (attempt to 
> parse the string as a bool the first time the config value is needed)
> In addition, i notice this really confusing line...
> {code}
> propMap.put(CfgProp.SOLR_SHARESCHEMA, 
> doSub("solr/str[@name='shareSchema']"));
> {code}
> "shareSchema" is used internally as a boolean option, but as written the 
> parsing code will ignore it unless the user explicitly configures it as a 
> {{<str/>}}






[jira] [Resolved] (LUCENE-5823) recognize hunspell FULLSTRIP option in affix file

2014-07-15 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-5823.
-

   Resolution: Fixed
Fix Version/s: 4.10
   5.0

> recognize hunspell FULLSTRIP option in affix file
> -
>
> Key: LUCENE-5823
> URL: https://issues.apache.org/jira/browse/LUCENE-5823
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Robert Muir
> Fix For: 5.0, 4.10
>
> Attachments: LUCENE-5823.patch
>
>
> With LUCENE-5818 we fixed stripping to be correct (ensuring it doesn't strip 
> the entire word before applying an affix). This is usually true, but there is 
> an option in the affix file to allow this.
> It's used by several languages (French, Latvian, Swedish, etc.)
> {noformat}
> FULLSTRIP
>   With FULLSTRIP, affix rules can strip full words, not  only  one
>   less characters, before adding the affixes
> {noformat}






[jira] [Created] (LUCENE-5823) recognize hunspell FULLSTRIP option in affix file

2014-07-15 Thread Robert Muir (JIRA)
Robert Muir created LUCENE-5823:
---

 Summary: recognize hunspell FULLSTRIP option in affix file
 Key: LUCENE-5823
 URL: https://issues.apache.org/jira/browse/LUCENE-5823
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Robert Muir
 Attachments: LUCENE-5823.patch

With LUCENE-5818 we fixed stripping to be correct (ensuring it doesn't strip the 
entire word before applying an affix). This is usually true, but there is an 
option in the affix file to allow this.

It's used by several languages (French, Latvian, Swedish, etc.)

{noformat}
FULLSTRIP
  With FULLSTRIP, affix rules can strip full words, not  only  one
  less characters, before adding the affixes
{noformat}






[jira] [Updated] (LUCENE-5823) recognize hunspell FULLSTRIP option in affix file

2014-07-15 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-5823:


Attachment: LUCENE-5823.patch

Simple patch with a test.


> recognize hunspell FULLSTRIP option in affix file
> -
>
> Key: LUCENE-5823
> URL: https://issues.apache.org/jira/browse/LUCENE-5823
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Robert Muir
> Attachments: LUCENE-5823.patch
>
>
> With LUCENE-5818 we fixed stripping to be correct (ensuring it doesn't strip 
> the entire word before applying an affix). This is usually true, but there is 
> an option in the affix file to allow this.
> It's used by several languages (French, Latvian, Swedish, etc.)
> {noformat}
> FULLSTRIP
>   With FULLSTRIP, affix rules can strip full words, not  only  one
>   less characters, before adding the affixes
> {noformat}






[jira] [Comment Edited] (SOLR-4647) Grouping is broken on docvalues-only fields

2014-07-15 Thread Modassar Ather (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061877#comment-14061877
 ] 

Modassar Ather edited comment on SOLR-4647 at 7/15/14 9:40 AM:
---

Hi,

I am also seeing this issue while doing grouping on a docValues-enabled field. I 
checked the createField(...) method of FieldType, which returns null if the 
field is neither indexed nor stored. 
When the returned field (which is null) gets passed to the 
fieldType.toObject(...) method from the finish() method of Grouping.java, it 
causes the NullPointerException.
Kindly provide inputs on whether indexed/stored needs to be set to true while 
creating a docValues field, or whether this is an issue.

Thanks,
Modassar



was (Author: modassar):
Hi,

I am also seeing this issue while doing grouping on a docValues-enabled field. I 
checked the createField(...) method of FieldType, which returns null if the 
field is neither indexed nor stored. 
Kindly provide inputs on whether indexed/stored needs to be set to true while 
creating a docValues field, or whether this is an issue.

Thanks,
Modassar


> Grouping is broken on docvalues-only fields
> ---
>
> Key: SOLR-4647
> URL: https://issues.apache.org/jira/browse/SOLR-4647
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 4.2
>Reporter: Adrien Grand
>  Labels: newdev
>
> There are a few places where grouping uses 
> FieldType.toObject(SchemaField.createField(String, float)) to translate a 
> String field value to an Object. The problem is that createField returns null 
> when the field is neither stored nor indexed, even if it has doc values.
> An option to fix it could be to use the ValueSource instead to resolve the 
> Object value (similarly to NumericFacets).






[jira] [Commented] (SOLR-4647) Grouping is broken on docvalues-only fields

2014-07-15 Thread Modassar Ather (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061877#comment-14061877
 ] 

Modassar Ather commented on SOLR-4647:
--

Hi,

I am also seeing this issue while doing grouping on a docValues-enabled field. I 
checked the createField(...) method of FieldType, which returns null if the 
field is neither indexed nor stored. 
Kindly provide inputs on whether indexed/stored needs to be set to true while 
creating a docValues field, or whether this is an issue.

Thanks,
Modassar


> Grouping is broken on docvalues-only fields
> ---
>
> Key: SOLR-4647
> URL: https://issues.apache.org/jira/browse/SOLR-4647
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 4.2
>Reporter: Adrien Grand
>  Labels: newdev
>
> There are a few places where grouping uses 
> FieldType.toObject(SchemaField.createField(String, float)) to translate a 
> String field value to an Object. The problem is that createField returns null 
> when the field is neither stored nor indexed, even if it has doc values.
> An option to fix it could be to use the ValueSource instead to resolve the 
> Object value (similarly to NumericFacets).
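
A sketch of that suggested fix (hedged: groupField and leafContext are 
stand-ins, and passing a null QParser to getValueSource is an assumption):

{code}
// Resolve the group value through the field's ValueSource so that
// docValues-only fields work, instead of FieldType.toObject(createField(...)).
SchemaField sf = searcher.getSchema().getField(groupField);
ValueSource vs = sf.getType().getValueSource(sf, null);
FunctionValues values = vs.getValues(new HashMap<Object,Object>(), leafContext);
Object groupValue = values.objectVal(doc - leafContext.docBase);
{code}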


