[JENKINS] Lucene-Solr-trunk-MacOSX (64bit/jdk1.7.0) - Build # 795 - Failure!

2013-09-04 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-MacOSX/795/
Java: 64bit/jdk1.7.0 -XX:-UseCompressedOops -XX:+UseConcMarkSweepGC

All tests passed

Build Log:
[...truncated 10286 lines...]
   [junit4] ERROR: JVM J0 ended with an exception, command line: 
/Library/Java/JavaVirtualMachines/jdk1.7.0_25.jdk/Contents/Home/jre/bin/java 
-XX:-UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+HeapDumpOnOutOfMemoryError 
-XX:HeapDumpPath=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/heapdumps 
-Dtests.prefix=tests -Dtests.seed=79AF0BD36D5B1D8 -Xmx512M -Dtests.iters= 
-Dtests.verbose=false -Dtests.infostream=false -Dtests.codec=random 
-Dtests.postingsformat=random -Dtests.docvaluesformat=random 
-Dtests.locale=random -Dtests.timezone=random -Dtests.directory=random 
-Dtests.linedocsfile=europarl.lines.txt.gz -Dtests.luceneMatchVersion=5.0 
-Dtests.cleanthreads=perClass 
-Djava.util.logging.config.file=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/tools/junit4/logging.properties
 -Dtests.nightly=false -Dtests.weekly=false -Dtests.slow=true 
-Dtests.asserts.gracious=false -Dtests.multiplier=1 -DtempDir=. 
-Djava.io.tmpdir=. 
-Djunit4.tempDir=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-core/test/temp
 
-Dclover.db.dir=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/clover/db
 -Djava.security.manager=org.apache.lucene.util.TestSecurityManager 
-Djava.security.policy=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/tools/junit4/tests.policy
 -Dlucene.version=5.0-SNAPSHOT -Djetty.testMode=1 -Djetty.insecurerandom=1 
-Dsolr.directoryFactory=org.apache.solr.core.MockDirectoryFactory 
-Djava.awt.headless=true -Dtests.disableHdfs=true -Dfile.encoding=ISO-8859-1 
-classpath 
/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-core/classes/test:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-test-framework/classes/java:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/test-framework/lib/junit4-ant-2.0.10.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-core/test-files:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/test-framework/classes/java:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/codecs/classes/java:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-solrj/classes/java:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-core/classes/java:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/analysis/common/lucene-analyzers-common-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/analysis/kuromoji/lucene-analyzers-kuromoji-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/analysis/phonetic/lucene-analyzers-phonetic-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/codecs/lucene-codecs-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/highlighter/lucene-highlighter-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/memory/lucene-memory-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/misc/lucene-misc-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/spatial/lucene-spatial-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/suggest/lucene-suggest-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/grouping/lucene-grouping-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/queries/lucene-queries-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/queryparser/lucene-queryparser-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/join/lucene-join-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/commons-cli-1.2.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/commons-codec-1.7.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/commons-configuration-1.6.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/commons-fileupload-1.2.1.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/commons-lang-2.6.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/concurrentlinkedhashmap-lru-1.2.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/dom4j-1.6.1.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/guava-14.0.1.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/hadoop-annotations-2.0.5-alpha.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/hadoop-auth-2.0.5-alpha.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/hadoop-common-2.0.5-alpha.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/hadoop-hdfs-2.0.5-alpha.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/joda-tim

[jira] [Commented] (SOLR-5201) UIMAUpdateRequestProcessor should reuse the AnalysisEngine

2013-09-04 Thread Tommaso Teofili (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757536#comment-13757536
 ] 

Tommaso Teofili commented on SOLR-5201:
---

here's a draft patch: 
https://github.com/tteofili/lucene-solr/compare/apache:trunk...solr-5201.patch

_AnalysisEngines_ are initialized inside _UIMAUpdateRequestProcessorFactories_ 
together with a _JCasPool_ to better handle multiple concurrent requests.
My benchmarks (I ran 'ant clean test -Dtests.multiplier=100' with and without 
the above patch) show that execution of 
_UIMAUpdateRequestProcessorTest#testMultiplierProcessing_ is ~10 times faster 
and uses less memory (~240MB saved over a ~650MB heap).
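
The reuse pattern, as a minimal sketch (class and variable names here are 
illustrative, not the patch's actual code; UIMA's JCasPool is the pool type 
involved):

    import org.apache.uima.UIMAFramework;
    import org.apache.uima.analysis_engine.AnalysisEngine;
    import org.apache.uima.jcas.JCas;
    import org.apache.uima.resource.ResourceSpecifier;
    import org.apache.uima.util.JCasPool;

    class AnalysisEnginePoolSketch {
      private final AnalysisEngine ae;  // created once, reused across requests
      private final JCasPool pool;      // bounded pool of CASes for concurrency

      AnalysisEnginePoolSketch(ResourceSpecifier desc, int size) throws Exception {
        ae = UIMAFramework.produceAnalysisEngine(desc);
        pool = new JCasPool(size, ae);
      }

      void process(String text) throws Exception {
        JCas jcas = pool.getJCas();     // borrow a CAS instead of re-creating the AE
        try {
          jcas.setDocumentText(text);
          ae.process(jcas);
          // ... map UIMA annotations onto the SolrInputDocument here ...
        } finally {
          pool.releaseJCas(jcas);       // reset and return the CAS to the pool
        }
      }
    }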

> UIMAUpdateRequestProcessor should reuse the AnalysisEngine
> --
>
> Key: SOLR-5201
> URL: https://issues.apache.org/jira/browse/SOLR-5201
> Project: Solr
>  Issue Type: Improvement
>  Components: contrib - UIMA
>Affects Versions: 4.4
>Reporter: Tommaso Teofili
>Assignee: Tommaso Teofili
> Fix For: 4.5, 5.0
>
> Attachments: SOLR-5201-ae-cache-every-request_branch_4x.patch, 
> SOLR-5201-ae-cache-only-single-request_branch_4x.patch
>
>
> As reported in http://markmail.org/thread/2psiyl4ukaejl4fx 
> UIMAUpdateRequestProcessor instantiates an AnalysisEngine for each request 
> which is bad for performance therefore it'd be nice if such AEs could be 
> reused whenever that's possible.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Issue Comment Deleted] (SOLR-2649) MM ignored in edismax queries with operators

2013-09-04 Thread Thomas Egense (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Egense updated SOLR-2649:


Comment: was deleted

(was: Thanks for the clarification. )

> MM ignored in edismax queries with operators
> 
>
> Key: SOLR-2649
> URL: https://issues.apache.org/jira/browse/SOLR-2649
> Project: Solr
>  Issue Type: Bug
>  Components: query parsers
>Reporter: Magnus Bergmark
>Priority: Minor
> Fix For: 4.5, 5.0
>
>
> Hypothetical scenario:
>   1. User searches for "stocks oil gold" with MM set to "50%"
>   2. User adds "-stockings" to the query: "stocks oil gold -stockings"
>   3. User gets no hits since MM was ignored and all terms were AND-ed 
> together
> The behavior seems to be intentional, although the reason why is never 
> explained:
>   // For correct lucene queries, turn off mm processing if there
>   // were explicit operators (except for AND).
>   boolean doMinMatched = (numOR + numNOT + numPluses + numMinuses) == 0; 
> (lines 232-234 taken from 
> tags/lucene_solr_3_3/solr/src/java/org/apache/solr/search/ExtendedDismaxQParserPlugin.java)
> This makes edismax unsuitable as a replacement for dismax; mm is one of the 
> primary features of dismax.
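
For the scenario above, the quoted check plays out like this (values inferred 
from the description, not from a debugger):

    // q = "stocks oil gold -stockings", mm = 50%
    // numOR = 0, numNOT = 0, numPluses = 0, numMinuses = 1
    boolean doMinMatched = (0 + 0 + 0 + 1) == 0;  // false: mm is silently
                                                  // dropped, terms are AND-ed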




[jira] [Issue Comment Deleted] (SOLR-2649) MM ignored in edismax queries with operators

2013-09-04 Thread Thomas Egense (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Egense updated SOLR-2649:


Comment: was deleted

(was: Thank you! We have been waiting a long time for this fix.)

> MM ignored in edismax queries with operators
> 
>
> Key: SOLR-2649
> URL: https://issues.apache.org/jira/browse/SOLR-2649
> Project: Solr
>  Issue Type: Bug
>  Components: query parsers
>Reporter: Magnus Bergmark
>Priority: Minor
> Fix For: 4.5, 5.0
>
>
> Hypothetical scenario:
>   1. User searches for "stocks oil gold" with MM set to "50%"
>   2. User adds "-stockings" to the query: "stocks oil gold -stockings"
>   3. User gets no hits since MM was ignored and all terms were AND-ed 
> together
> The behavior seems to be intentional, although the reason why is never 
> explained:
>   // For correct lucene queries, turn off mm processing if there
>   // were explicit operators (except for AND).
>   boolean doMinMatched = (numOR + numNOT + numPluses + numMinuses) == 0; 
> (lines 232-234 taken from 
> tags/lucene_solr_3_3/solr/src/java/org/apache/solr/search/ExtendedDismaxQParserPlugin.java)
> This makes edismax unsuitable as a replacement for dismax; mm is one of the 
> primary features of dismax.




[jira] [Commented] (LUCENE-5188) Make CompressingStoredFieldsFormat more friendly to StoredFieldVisitors

2013-09-04 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757570#comment-13757570
 ] 

Adrien Grand commented on LUCENE-5188:
--

I will commit later today if there is no objection.

> Make CompressingStoredFieldsFormat more friendly to StoredFieldVisitors
> ---
>
> Key: LUCENE-5188
> URL: https://issues.apache.org/jira/browse/LUCENE-5188
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Assignee: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-5188.patch
>
>
> The way CompressingStoredFieldsFormat works is that it first decompresses 
> data and then consults the StoredFieldVisitor. This is a bit wasteful in case 
> documents are big and only the first field of a document is of interest, so 
> maybe we could decompress and consult the StoredFieldVisitor in a more 
> streaming fashion.
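
For context, the StoredFieldVisitor contract being discussed, as a minimal 
sketch (the field name and early-stop policy are illustrative, not from the 
patch):

    // load only the "title" field, then tell the reader to stop visiting
    StoredFieldVisitor visitor = new StoredFieldVisitor() {
      String title;
      @Override
      public Status needsField(FieldInfo fieldInfo) {
        if ("title".equals(fieldInfo.name)) {
          return Status.YES;                    // decode this field's value
        }
        return title != null ? Status.STOP : Status.NO;
      }
      @Override
      public void stringField(FieldInfo fieldInfo, String value) {
        title = value;
      }
    };
    reader.document(docID, visitor);  // the issue: how much gets decompressed?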




[jira] [Resolved] (LUCENE-4818) Create a boolean perceptron classifier

2013-09-04 Thread Tommaso Teofili (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tommaso Teofili resolved LUCENE-4818.
-

Resolution: Fixed

Marking it as resolved; future improvements will come in separate issues.

> Create a boolean perceptron classifier
> --
>
> Key: LUCENE-4818
> URL: https://issues.apache.org/jira/browse/LUCENE-4818
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/classification
>Reporter: Tommaso Teofili
>Assignee: Tommaso Teofili
>Priority: Minor
> Fix For: 5.0
>
> Attachments: LUCENE-4818.patch
>
>
> Create a Lucene based classifier using the perceptron algorithm (see 
> http://en.wikipedia.org/wiki/Perceptron)




[jira] [Created] (SOLR-5213) collections?action=SPLITSHARD parent vs. sub-shards numDocs

2013-09-04 Thread Christine Poerschke (JIRA)
Christine Poerschke created SOLR-5213:
-

 Summary: collections?action=SPLITSHARD parent vs. sub-shards 
numDocs
 Key: SOLR-5213
 URL: https://issues.apache.org/jira/browse/SOLR-5213
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 4.4
Reporter: Christine Poerschke


The problem we saw was that splitting a shard took a long time and at the end 
of it the sub-shards contained fewer documents than the original shard.

The root cause was eventually tracked down to the disappearing documents not 
falling into the hash ranges of the sub-shards.

Could the SolrIndexSplitter split report per-segment numDocs for parent and 
sub-shards, with at least a warning logged for any discrepancies (documents 
falling into none of the sub-shards or documents falling into several 
sub-shards)?

Additionally, could a case be made for erroring out when discrepancies are 
detected, i.e. not proceeding with the shard split? Either always error, or 
have an optional verifyNumDocs=false/true parameter for the SPLITSHARD action.
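
A sketch of the proposed check (helper shape assumed; this ignores 
composite-id routing and simply hashes the unique key the way Solr's 
hash-based routers do):

    import java.util.List;
    import org.apache.solr.common.cloud.DocRouter;
    import org.apache.solr.common.util.Hash;

    // count how many sub-shard hash ranges claim a given document id
    static int countOwners(String id, List<DocRouter.Range> subShardRanges) {
      int hash = Hash.murmurhash3_x86_32(id, 0, id.length(), 0);
      int owners = 0;
      for (DocRouter.Range range : subShardRanges) {
        if (range.includes(hash)) {
          owners++;
        }
      }
      return owners;  // 0 = document lost, >1 = document duplicated
    }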







[jira] [Updated] (SOLR-5213) collections?action=SPLITSHARD parent vs. sub-shards numDocs

2013-09-04 Thread Christine Poerschke (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christine Poerschke updated SOLR-5213:
--

Attachment: SOLR-5213.patch

Attaching patch for reporting per-segment numDocs for parent and sub-shards.

> collections?action=SPLITSHARD parent vs. sub-shards numDocs
> ---
>
> Key: SOLR-5213
> URL: https://issues.apache.org/jira/browse/SOLR-5213
> Project: Solr
>  Issue Type: Improvement
>  Components: update
>Affects Versions: 4.4
>Reporter: Christine Poerschke
> Attachments: SOLR-5213.patch
>
>
> The problem we saw was that splitting a shard took a long time and at the end 
> of it the sub-shards contained fewer documents than the original shard.
> The root cause was eventually tracked down to the disappearing documents not 
> falling into the hash ranges of the sub-shards.
> Could the SolrIndexSplitter split report per-segment numDocs for parent and 
> sub-shards, with at least a warning logged for any discrepancies (documents 
> falling into none of the sub-shards or documents falling into several 
> sub-shards)?
> Additionally, could a case be made for erroring out when discrepancies are 
> detected, i.e. not proceeding with the shard split? Either always error, or 
> have an optional verifyNumDocs=false/true parameter for the SPLITSHARD 
> action.




[jira] [Commented] (LUCENE-5188) Make CompressingStoredFieldsFormat more friendly to StoredFieldVisitors

2013-09-04 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757635#comment-13757635
 ] 

Simon Willnauer commented on LUCENE-5188:
-

Cool stuff Adrien!
One thing I wonder is if we should use a specialized DataInput, maybe a 
SkippableDataInput, in that class to avoid the static method. That shared byte 
array worries me. Aside from this, I wonder: if we had this method in DataInput 
(or however we end up doing this), would it be possible to skip an entire 
decompression step if we know that the amount of bytes we skip is larger than 
one or more decompression blocks? I have to admit I don't know exactly how this 
works and whether what I propose is possible, but it would help me to better 
understand why we need to read and decompress all the data if we trash it 
anyway.
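
For readers following along, the skip-by-reading pattern under discussion 
looks roughly like this (a sketch with assumed names, not the patch's actual 
code):

    // Read-and-discard into a scratch buffer. The buffer's contents are never
    // consumed, which is why a single shared ("write-only") array suffices.
    static final byte[] SKIP_BUFFER = new byte[1024];

    static void skipBytes(DataInput in, long numBytes) throws IOException {
      long skipped = 0;
      while (skipped < numBytes) {
        int step = (int) Math.min(SKIP_BUFFER.length, numBytes - skipped);
        in.readBytes(SKIP_BUFFER, 0, step);
        skipped += step;
      }
    }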

> Make CompressingStoredFieldsFormat more friendly to StoredFieldVisitors
> ---
>
> Key: LUCENE-5188
> URL: https://issues.apache.org/jira/browse/LUCENE-5188
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Assignee: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-5188.patch
>
>
> The way CompressingStoredFieldsFormat works is that it first decompresses 
> data and then consults the StoredFieldVisitor. This is a bit wasteful in case 
> documents are big and only the first field of a document is of interest, so 
> maybe we could decompress and consult the StoredFieldVisitor in a more 
> streaming fashion.




[jira] [Commented] (LUCENE-5188) Make CompressingStoredFieldsFormat more friendly to StoredFieldVisitors

2013-09-04 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757641#comment-13757641
 ] 

Adrien Grand commented on LUCENE-5188:
--

These bytes can be shared because they are write-only, kind of like /dev/null. 
Having this on DataInput to be able to skip an entire decompression would be 
nice, but unfortunately with the current design the field numbers are stored in 
the compressed stream, so you need to decompress anyway to know whether you 
should skip (StoredFieldVisitor allows skipping based on the FieldInfo, which 
my StoredFieldReader computes from the field number). But your idea is 
something I would like to explore for the next StoredFieldsFormat, along with 
preset dictionaries.

> Make CompressingStoredFieldsFormat more friendly to StoredFieldVisitors
> ---
>
> Key: LUCENE-5188
> URL: https://issues.apache.org/jira/browse/LUCENE-5188
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Assignee: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-5188.patch
>
>
> The way CompressingStoredFieldsFormat works is that it first decompresses 
> data and then consults the StoredFieldVisitor. This is a bit wasteful in case 
> documents are big and only the first field of a document is of interest, so 
> maybe we could decompress and consult the StoredFieldVisitor in a more 
> streaming fashion.




[jira] [Commented] (LUCENE-5197) Add a method to SegmentReader to get the current index heap memory size

2013-09-04 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757665#comment-13757665
 ] 

Michael McCandless commented on LUCENE-5197:


+1 to current patch, except SimpleTextFieldsReader does in fact use RAM (it has 
a termsCache, and it sneakily pre-loads all terms for each field into an FST!).

I think we should go with this current approach, and then later, if/when we 
improve RUE to easily restrict where it crawls / speed it up / etc., we can 
cut over.
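
The shape of what the patch adds, roughly (a sketch; the component accessors 
are assumptions, not the patch's actual field names):

    // SegmentReader-level estimate: sum the heap held per segment component
    public long ramBytesUsed() {
      long bytes = 0;
      bytes += postingsReader.ramBytesUsed();   // terms dict/index (FSTs, caches)
      bytes += normsReader.ramBytesUsed();
      bytes += docValuesReader.ramBytesUsed();
      bytes += storedFieldsReader.ramBytesUsed();
      return bytes;
    }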

> Add a method to SegmentReader to get the current index heap memory size
> ---
>
> Key: LUCENE-5197
> URL: https://issues.apache.org/jira/browse/LUCENE-5197
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs, core/index
>Reporter: Areek Zillur
> Attachments: LUCENE-5197.patch, LUCENE-5197.patch
>
>
> It would be useful to at least estimate the index heap size being used by 
> Lucene. Ideally a method exposing this information at the SegmentReader level.




[jira] [Commented] (LUCENE-5188) Make CompressingStoredFieldsFormat more friendly to StoredFieldVisitors

2013-09-04 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757666#comment-13757666
 ] 

Simon Willnauer commented on LUCENE-5188:
-

Thanks Adrien for elaborating... progress over perfection, so let's move on 
here. +1 to commit

> Make CompressingStoredFieldsFormat more friendly to StoredFieldVisitors
> ---
>
> Key: LUCENE-5188
> URL: https://issues.apache.org/jira/browse/LUCENE-5188
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Assignee: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-5188.patch
>
>
> The way CompressingStoredFieldsFormat works is that it first decompresses 
> data and then consults the StoredFieldVisitor. This is a bit wasteful in case 
> documents are big and only the first field of a document is of interest, so 
> maybe we could decompress and consult the StoredFieldVisitor in a more 
> streaming fashion.




[jira] [Commented] (LUCENE-3069) Lucene should have an entirely memory resident term dictionary

2013-09-04 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757668#comment-13757668
 ] 

Michael McCandless commented on LUCENE-3069:


Thanks for uploading the diffs against trunk, Han; I'll review this.

Can you explain the two new terms dict impls?  And maybe write up a brief 
summary of all the changes (to help others understand the patch)?

Maybe we can put the new "all in memory" terms dict impls under 
oal.codecs.memory?  FSTTerms* seems like a good name?  (Just because in the 
future maybe we have other impls of "all in memory" terms dicts)...

> Lucene should have an entirely memory resident term dictionary
> --
>
> Key: LUCENE-3069
> URL: https://issues.apache.org/jira/browse/LUCENE-3069
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index, core/search
>Affects Versions: 4.0-ALPHA
>Reporter: Simon Willnauer
>Assignee: Han Jiang
>  Labels: gsoc2013
> Fix For: 5.0, 4.5
>
> Attachments: df-ttf-estimate.txt, example.png, LUCENE-3069.patch, 
> LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, 
> LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, 
> LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch
>
>
> FST based TermDictionary has been a great improvement yet it still uses a 
> delta codec file for scanning to terms. Some environments have enough memory 
> available to keep the entire FST based term dict in memory. We should add a 
> TermDictionary implementation that encodes all needed information for each 
> term into the FST (custom fst.Output) and builds an FST from the entire term, 
> not just the delta.




[jira] [Commented] (LUCENE-3069) Lucene should have an entirely memory resident term dictionary

2013-09-04 Thread Han Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757676#comment-13757676
 ] 

Han Jiang commented on LUCENE-3069:
---

OK! These two term dicts are both FST-based:

* FST term dict directly uses an FST to map a term to its metadata & stats 
(FST)
* FSTOrd term dict uses an FST to map a term to its ordinal number (FST), and 
the ordinal is then used to seek metadata from another big chunk.

I prefer the second impl since it puts much less stress on the FST.

I have updated the detailed format explanation in the last commit. Hmm, I'll 
create another patch for this...
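
A rough sketch of the FSTOrd idea, mapping a term to its ordinal with the 
trunk FST API (the sorted terms source is assumed; exact constructors may 
differ from the patch):

    import org.apache.lucene.util.BytesRef;
    import org.apache.lucene.util.IntsRef;
    import org.apache.lucene.util.fst.*;

    PositiveIntOutputs outputs = PositiveIntOutputs.getSingleton();
    Builder<Long> builder = new Builder<>(FST.INPUT_TYPE.BYTE1, outputs);
    IntsRef scratch = new IntsRef();
    long ord = 0;
    for (BytesRef term : sortedTerms) {      // terms must arrive in sorted order
      builder.add(Util.toIntsRef(term, scratch), ord++);
    }
    FST<Long> fst = builder.finish();
    Long found = Util.get(fst, new BytesRef("lucene"));  // term -> ordinal, or null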

> Lucene should have an entirely memory resident term dictionary
> --
>
> Key: LUCENE-3069
> URL: https://issues.apache.org/jira/browse/LUCENE-3069
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index, core/search
>Affects Versions: 4.0-ALPHA
>Reporter: Simon Willnauer
>Assignee: Han Jiang
>  Labels: gsoc2013
> Fix For: 5.0, 4.5
>
> Attachments: df-ttf-estimate.txt, example.png, LUCENE-3069.patch, 
> LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, 
> LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, 
> LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch
>
>
> FST based TermDictionary has been a great improvement yet it still uses a 
> delta codec file for scanning to terms. Some environments have enough memory 
> available to keep the entire FST based term dict in memory. We should add a 
> TermDictionary implementation that encodes all needed information for each 
> term into the FST (custom fst.Output) and builds an FST from the entire term, 
> not just the delta.




[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates

2013-09-04 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757674#comment-13757674
 ] 

Michael McCandless commented on LUCENE-5189:


We could simply document this as a limitation, today?  Ie, that if it's an 
update, the DVFormat cannot use the attributes APIs.  This would let us proceed 
(progress not perfection) and then later, we address it.  Ie, I think the added 
boolean is a fair compromise.

Or, we can pursue gen'ing FIS on this patch, but this is going to add a lot of 
trickiness/complexity; I think it'd be better to explore it separately.

> Numeric DocValues Updates
> -
>
> Key: LUCENE-5189
> URL: https://issues.apache.org/jira/browse/LUCENE-5189
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/index
>Reporter: Shai Erera
>Assignee: Shai Erera
> Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
> LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch
>
>
> In LUCENE-4258 we started to work on incremental field updates; however, the 
> amount of changes is immense and hard to follow/consume. The reason is that 
> we targeted postings, stored fields, DV etc., all from the get-go.
> I'd like to start afresh here, with numeric-dv-field updates only. There are 
> a couple of reasons for that:
> * NumericDV fields should be easier to update, if e.g. we write all the 
> values of all the documents in a segment for the updated field (similar to 
> how livedocs work, and previously norms).
> * It's a fairly contained issue, attempting to handle just one data type to 
> update, yet it requires many changes to core code which will also be useful 
> for updating other data types.
> * It has value in and of itself, and we don't need to allow updating all the 
> data types in Lucene at once ... we can do that gradually.
> I have some working patch already, which I'll upload next, explaining the 
> changes.
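
For reference, the kind of API this work enables, as sketched in the attached 
patches (the exact method name and signature here are assumptions and may 
differ from what finally gets committed):

    // update the "price" NumericDocValues of every document matching the term,
    // without re-indexing those documents
    writer.updateNumericDocValue(new Term("id", "doc-7"), "price", 42L);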




[jira] [Commented] (SOLR-2548) Multithreaded faceting

2013-09-04 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757681#comment-13757681
 ] 

Erick Erickson commented on SOLR-2548:
--

[~hossman_luc...@fucit.org] Thanks. Your comments made me look more carefully 
at directExecutor; it took me a bit to wrap my head around that one.

1> Still checking on the implications of stacking up a bunch of directExecutors 
all through the CompletionService, not something I've used recently and the 
details are hazy.

As far as tests are concerned, I haven't gotten there yet; the original patch 
didn't have any... It should be easy to create tests with multiple facet.field 
clauses; TestFaceting does this, so there are templates. Is there a decent way 
to check whether more than one thread was actually spawned? If so, can you 
point me at some code that actually does that? Otherwise I'll create tests that 
just get the right response for single and multiple facet.field specifications 
and a bit of walk-through with the debugger to ensure we actually go through 
that code path.

2> done.

Thanks again.
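
One way to answer the "was more than one thread actually spawned?" question is 
to record the executing thread names (a standalone sketch, not test code from 
the patch; with a directExecutor every task runs on the calling thread, so the 
set would stay at size 1):

    import java.util.Collections;
    import java.util.Set;
    import java.util.concurrent.*;

    public class ThreadCountCheck {
      public static void main(String[] args) throws Exception {
        final Set<String> threadNames =
            Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());
        ExecutorService pool = Executors.newFixedThreadPool(4);
        CompletionService<Integer> cs =
            new ExecutorCompletionService<Integer>(pool);
        for (int i = 0; i < 8; i++) {
          cs.submit(new Callable<Integer>() {
            public Integer call() {
              threadNames.add(Thread.currentThread().getName()); // who ran this?
              return 0;
            }
          });
        }
        for (int i = 0; i < 8; i++) {
          cs.take().get();                                       // wait for all
        }
        pool.shutdown();
        System.out.println("distinct worker threads: " + threadNames.size());
      }
    }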

> Multithreaded faceting
> --
>
> Key: SOLR-2548
> URL: https://issues.apache.org/jira/browse/SOLR-2548
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 3.1
>Reporter: Janne Majaranta
>Assignee: Erick Erickson
>Priority: Minor
>  Labels: facet
> Attachments: SOLR-2548_4.2.1.patch, SOLR-2548_for_31x.patch, 
> SOLR-2548.patch, SOLR-2548.patch
>
>
> Add multithreading support for faceting.




[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates

2013-09-04 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757688#comment-13757688
 ] 

Shai Erera commented on LUCENE-5189:


I think it's important to solve FIS.gen, either on this issue or a separate 
one, but before 4.5 is out. Because now SegmentInfos records per-field dvGen, 
and if we gen FIS, this will be recorded by a new Lucene45FieldInfosFormat, and 
SIS will need to record fieldInfosGen. I actually don't mind doing it in this 
issue. It's work that's needed and affects NDV-updates (e.g. sparse fields, 
which now hit a cryptic exception too late).

But I also don't mind moving forward with SWS.isFieldUpdate and removing it in 
a follow-on issue ... as long as it's done before 4.5.

> Numeric DocValues Updates
> -
>
> Key: LUCENE-5189
> URL: https://issues.apache.org/jira/browse/LUCENE-5189
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/index
>Reporter: Shai Erera
>Assignee: Shai Erera
> Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
> LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch
>
>
> In LUCENE-4258 we started to work on incremental field updates; however, the 
> amount of changes is immense and hard to follow/consume. The reason is that 
> we targeted postings, stored fields, DV etc., all from the get-go.
> I'd like to start afresh here, with numeric-dv-field updates only. There are 
> a couple of reasons for that:
> * NumericDV fields should be easier to update, if e.g. we write all the 
> values of all the documents in a segment for the updated field (similar to 
> how livedocs work, and previously norms).
> * It's a fairly contained issue, attempting to handle just one data type to 
> update, yet it requires many changes to core code which will also be useful 
> for updating other data types.
> * It has value in and of itself, and we don't need to allow updating all the 
> data types in Lucene at once ... we can do that gradually.
> I have some working patch already, which I'll upload next, explaining the 
> changes.




[JENKINS] Lucene-Solr-SmokeRelease-4.x - Build # 105 - Still Failing

2013-09-04 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-SmokeRelease-4.x/105/

No tests ran.

Build Log:
[...truncated 34200 lines...]
prepare-release-no-sign:
[mkdir] Created dir: 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/lucene/build/fakeRelease
 [copy] Copying 416 files to 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/lucene/build/fakeRelease/lucene
 [copy] Copying 194 files to 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/lucene/build/fakeRelease/solr
 [exec] JAVA6_HOME is /home/hudson/tools/java/latest1.6
 [exec] JAVA7_HOME is /home/hudson/tools/java/latest1.7
 [exec] NOTE: output encoding is US-ASCII
 [exec] 
 [exec] Load release URL 
"file:/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/lucene/build/fakeRelease/"...
 [exec] 
 [exec] Test Lucene...
 [exec]   test basics...
 [exec]   get KEYS
 [exec] 0.1 MB in 0.01 sec (11.0 MB/sec)
 [exec]   check changes HTML...
 [exec]   download lucene-4.5.0-src.tgz...
 [exec] 27.1 MB in 0.04 sec (605.5 MB/sec)
 [exec] verify md5/sha1 digests
 [exec]   download lucene-4.5.0.tgz...
 [exec] 49.0 MB in 0.07 sec (660.3 MB/sec)
 [exec] verify md5/sha1 digests
 [exec]   download lucene-4.5.0.zip...
 [exec] 58.8 MB in 0.12 sec (509.9 MB/sec)
 [exec] verify md5/sha1 digests
 [exec]   unpack lucene-4.5.0.tgz...
 [exec] verify JAR/WAR metadata...
 [exec] test demo with 1.6...
 [exec]   got 5717 hits for query "lucene"
 [exec] test demo with 1.7...
 [exec]   got 5717 hits for query "lucene"
 [exec] check Lucene's javadoc JAR
 [exec] 
 [exec] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/lucene/build/fakeReleaseTmp/unpack/lucene-4.5.0/docs/core/org/apache/lucene/util/AttributeSource.html
 [exec]   broken details HTML: Method Detail: addAttributeImpl: closing 
"" does not match opening ""
 [exec]   broken details HTML: Method Detail: getAttribute: closing 
"" does not match opening ""
 [exec] Traceback (most recent call last):
 [exec]   File 
"/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/dev-tools/scripts/smokeTestRelease.py",
 line 1450, in 
 [exec] main()
 [exec]   File 
"/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/dev-tools/scripts/smokeTestRelease.py",
 line 1394, in main
 [exec] smokeTest(baseURL, svnRevision, version, tmpDir, isSigned, 
testArgs)
 [exec]   File 
"/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/dev-tools/scripts/smokeTestRelease.py",
 line 1431, in smokeTest
 [exec] unpackAndVerify('lucene', tmpDir, artifact, svnRevision, 
version, testArgs)
 [exec]   File 
"/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/dev-tools/scripts/smokeTestRelease.py",
 line 607, in unpackAndVerify
 [exec] verifyUnpacked(project, artifact, unpackPath, svnRevision, 
version, testArgs)
 [exec]   File 
"/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/dev-tools/scripts/smokeTestRelease.py",
 line 786, in verifyUnpacked
 [exec] checkJavadocpath('%s/docs' % unpackPath)
 [exec]   File 
"/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/dev-tools/scripts/smokeTestRelease.py",
 line 904, in checkJavadocpath
 [exec] raise RuntimeError('missing javadocs package summaries!')
 [exec] RuntimeError: missing javadocs package summaries!

BUILD FAILED
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/build.xml:321:
 exec returned: 1

Total time: 19 minutes 36 seconds
Build step 'Invoke Ant' marked build as failure
Email was triggered for: Failure
Sending email for trigger: Failure




[jira] [Created] (SOLR-5214) collections?action=SPLITSHARD running out of heap space due to merge

2013-09-04 Thread Christine Poerschke (JIRA)
Christine Poerschke created SOLR-5214:
-

 Summary: collections?action=SPLITSHARD running out of heap space 
due to merge
 Key: SOLR-5214
 URL: https://issues.apache.org/jira/browse/SOLR-5214
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 4.3
Reporter: Christine Poerschke


The problem we saw was that splitting a shard with many segments and documents
failed by running out of heap space.

Increasing heap space so that all existing segments could be merged into one
overall segment does not seem practical. Running the split without segment
merging worked.

Could split always run without merging, or merge=true/false be an optional 
parameter for the SPLITSHARD action?






[jira] [Updated] (SOLR-5214) collections?action=SPLITSHARD running out of heap space due to merge

2013-09-04 Thread Christine Poerschke (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christine Poerschke updated SOLR-5214:
--

Attachment: SOLR-5214.patch

Attaching patch against trunk, to not merge when splitting (i.e. no 
merge=true/false parameter as yet).

> collections?action=SPLITSHARD running out of heap space due to merge
> 
>
> Key: SOLR-5214
> URL: https://issues.apache.org/jira/browse/SOLR-5214
> Project: Solr
>  Issue Type: Improvement
>  Components: update
>Affects Versions: 4.3
>Reporter: Christine Poerschke
> Attachments: SOLR-5214.patch
>
>
> The problem we saw was that splitting a shard with many segments and documents
> failed by running out of heap space.
> Increasing heap space so that all existing segments could be merged into one
> overall segment does not seem practical. Running the split without segment
> merging worked.
> Could split always run without merging, or merge=true/false be an optional 
> parameter for the SPLITSHARD action?




[jira] [Commented] (LUCENE-3069) Lucene should have an entirely memory resident term dictionary

2013-09-04 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757741#comment-13757741
 ] 

David Smiley commented on LUCENE-3069:
--

I like FSTOrd as well.  Presumably this one also exposes it via TermsEnum.ord()?

> Lucene should have an entirely memory resident term dictionary
> --
>
> Key: LUCENE-3069
> URL: https://issues.apache.org/jira/browse/LUCENE-3069
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index, core/search
>Affects Versions: 4.0-ALPHA
>Reporter: Simon Willnauer
>Assignee: Han Jiang
>  Labels: gsoc2013
> Fix For: 5.0, 4.5
>
> Attachments: df-ttf-estimate.txt, example.png, LUCENE-3069.patch, 
> LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, 
> LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, 
> LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch
>
>
> FST based TermDictionary has been a great improvement yet it still uses a 
> delta codec file for scanning to terms. Some environments have enough memory 
> available to keep the entire FST based term dict in memory. We should add a 
> TermDictionary implementation that encodes all needed information for each 
> term into the FST (custom fst.Output) and builds an FST from the entire term, 
> not just the delta.




[jira] [Updated] (SOLR-5214) collections?action=SPLITSHARD running out of heap space due to merge

2013-09-04 Thread Christine Poerschke (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christine Poerschke updated SOLR-5214:
--

Attachment: SOLR-5214.patch

Correcting subReaders.length vs. leaves.size() typo in my original patch.
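
For reference, the corrected iteration pattern (a sketch, not the patch 
itself):

    // iterate per-segment leaf contexts: reader.leaves().size() counts
    // segments, which is not a composite reader's subReaders length
    List<AtomicReaderContext> leaves = reader.leaves();
    for (int i = 0; i < leaves.size(); i++) {
      AtomicReader segmentReader = leaves.get(i).reader();
      // ... per-segment numDocs accounting here ...
    }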

> collections?action=SPLITSHARD running out of heap space due to merge
> 
>
> Key: SOLR-5214
> URL: https://issues.apache.org/jira/browse/SOLR-5214
> Project: Solr
>  Issue Type: Improvement
>  Components: update
>Affects Versions: 4.3
>Reporter: Christine Poerschke
> Attachments: SOLR-5214.patch, SOLR-5214.patch
>
>
> The problem we saw was that splitting a shard with many segments and documents
> failed by running out of heap space.
> Increasing heap space so that all existing segments could be merged into one
> overall segment does not seem practical. Running the split without segment
> merging worked.
> Could split always run without merging, or merge=true/false be an optional 
> parameter for the SPLITSHARD action?




[jira] [Updated] (SOLR-5214) collections?action=SPLITSHARD running out of heap space due to merge

2013-09-04 Thread Christine Poerschke (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christine Poerschke updated SOLR-5214:
--

Attachment: (was: SOLR-5214.patch)

> collections?action=SPLITSHARD running out of heap space due to merge
> 
>
> Key: SOLR-5214
> URL: https://issues.apache.org/jira/browse/SOLR-5214
> Project: Solr
>  Issue Type: Improvement
>  Components: update
>Affects Versions: 4.3
>Reporter: Christine Poerschke
> Attachments: SOLR-5214.patch
>
>
> The problem we saw was that splitting a shard with many segments and documents
> failed by running out of heap space.
> Increasing heap space so that all existing segments could be merged into one
> overall segment does not seem practical. Running the split without segment
> merging worked.
> Could split always run without merging, or merge=true/false be an optional 
> parameter for the SPLITSHARD action?




[jira] [Commented] (LUCENE-3069) Lucene should have an entirely memory resident term dictionary

2013-09-04 Thread Han Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757771#comment-13757771
 ] 

Han Jiang commented on LUCENE-3069:
---

Yes, with slight changes, it can support seek by ord. (With FST.getByOutput). 
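
A sketch of what seek-by-ord could look like with that helper, assuming the 
FSTOrd layout where the FST maps a term to its ordinal (not code from the 
patch):

    // walk arcs whose outputs sum to `ord`, then rebuild the term bytes
    IntsRef path = Util.getByOutput(fst, ord);     // fst is an FST<Long>
    BytesRef term = Util.toBytesRef(path, new BytesRef());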

> Lucene should have an entirely memory resident term dictionary
> --
>
> Key: LUCENE-3069
> URL: https://issues.apache.org/jira/browse/LUCENE-3069
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index, core/search
>Affects Versions: 4.0-ALPHA
>Reporter: Simon Willnauer
>Assignee: Han Jiang
>  Labels: gsoc2013
> Fix For: 5.0, 4.5
>
> Attachments: df-ttf-estimate.txt, example.png, LUCENE-3069.patch, 
> LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, 
> LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, 
> LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch
>
>
> FST based TermDictionary has been a great improvement yet it still uses a 
> delta codec file for scanning to terms. Some environments have enough memory 
> available to keep the entire FST based term dict in memory. We should add a 
> TermDictionary implementation that encodes all needed information for each 
> term into the FST (custom fst.Output) and builds an FST from the entire term, 
> not just the delta.




[jira] [Commented] (LUCENE-5188) Make CompressingStoredFieldsFormat more friendly to StoredFieldVisitors

2013-09-04 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757789#comment-13757789
 ] 

ASF subversion and git services commented on LUCENE-5188:
-

Commit 1520025 from [~jpountz] in branch 'dev/trunk'
[ https://svn.apache.org/r1520025 ]

LUCENE-5188: Make CompressingStoredFieldsFormat more friendly to 
StoredFieldVisitors.

> Make CompressingStoredFieldsFormat more friendly to StoredFieldVisitors
> ---
>
> Key: LUCENE-5188
> URL: https://issues.apache.org/jira/browse/LUCENE-5188
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Assignee: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-5188.patch
>
>
> The way CompressingStoredFieldsFormat works is that it first decompresses 
> data and then consults the StoredFieldVisitor. This is a bit wasteful in case 
> documents are big and only the first field of a document is of interest, so 
> maybe we could decompress and consult the StoredFieldVisitor in a more 
> streaming fashion.




[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates

2013-09-04 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757802#comment-13757802
 ] 

Robert Muir commented on LUCENE-5189:
-

{quote}
But I also don't mind moving forward with SWS.isFieldUpdate and remove it in a 
follow on issue ... as long as it's done before 4.5.
{quote}

I don't think that will be an issue at all.

if we want to iterate and leave the codec APIs broken, I won't object: but 
simple rule.

Trunk only.

We can't do this kind of stuff on the stable branch at all: Things that get 
backported there need to be "ready to ship".

> Numeric DocValues Updates
> -
>
> Key: LUCENE-5189
> URL: https://issues.apache.org/jira/browse/LUCENE-5189
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/index
>Reporter: Shai Erera
>Assignee: Shai Erera
> Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
> LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch
>
>
> In LUCENE-4258 we started to work on incremental field updates; however, the 
> amount of changes is immense and hard to follow/consume. The reason is that 
> we targeted postings, stored fields, DV etc., all from the get-go.
> I'd like to start afresh here, with numeric-dv-field updates only. There are 
> a couple of reasons for that:
> * NumericDV fields should be easier to update, if e.g. we write all the 
> values of all the documents in a segment for the updated field (similar to 
> how livedocs work, and previously norms).
> * It's a fairly contained issue, attempting to handle just one data type to 
> update, yet it requires many changes to core code which will also be useful 
> for updating other data types.
> * It has value in and of itself, and we don't need to allow updating all the 
> data types in Lucene at once ... we can do that gradually.
> I have some working patch already, which I'll upload next, explaining the 
> changes.




[jira] [Commented] (LUCENE-3069) Lucene should have an entirely memory resident term dictionary

2013-09-04 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757814#comment-13757814
 ] 

ASF subversion and git services commented on LUCENE-3069:
-

Commit 1520034 from [~billy] in branch 'dev/branches/lucene3069'
[ https://svn.apache.org/r1520034 ]

LUCENE-3069: move TermDict impls to package 'memory', nuke all 'Temp' symbols

> Lucene should have an entirely memory resident term dictionary
> --
>
> Key: LUCENE-3069
> URL: https://issues.apache.org/jira/browse/LUCENE-3069
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index, core/search
>Affects Versions: 4.0-ALPHA
>Reporter: Simon Willnauer
>Assignee: Han Jiang
>  Labels: gsoc2013
> Fix For: 5.0, 4.5
>
> Attachments: df-ttf-estimate.txt, example.png, LUCENE-3069.patch, 
> LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, 
> LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, 
> LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch
>
>
> FST based TermDictionary has been a great improvement yet it still uses a 
> delta codec file for scanning to terms. Some environments have enough memory 
> available to keep the entire FST based term dict in memory. We should add a 
> TermDictionary implementation that encodes all needed information for each 
> term into the FST (custom fst.Output) and builds a FST from the entire term 
> not just the delta.




[jira] [Updated] (LUCENE-3069) Lucene should have an entirely memory resident term dictionary

2013-09-04 Thread Han Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Han Jiang updated LUCENE-3069:
--

Attachment: LUCENE-3069.patch

Patch from last commit, and summary:

Previously our term dictionaries were both block-based: 

* BlockTerms dict breaks the terms list into several blocks, as a linear 
  structure with skip points. 

* BlockTreeTerms dict uses a trie-like structure to decide how terms are 
  assigned to different blocks, and uses an FST index to optimize seeking 
  performance.

However, neither kind of term dictionary holds all the term data in memory. 
In the worst case there would be at least two seeks: one from the index in 
memory, another from the file on disk. And we already have many complicated 
optimizations for this...

If by design a term dictionary can be memory resident, the data structure 
will be simpler (after all, we don't need to maintain extra file pointers for 
a second seek, and we don't have to decide heuristics for how terms are 
clustered). And this is why those two FST-based implementations are 
introduced.

Another big change in the code: since our term dictionaries were both 
block-based, the previous API was also limited. It was the postings writer 
that collected term metadata, and the term dictionary that told the postings 
writer the range of terms it should flush to a block. However, the encoding 
of term data should be decided by the term dictionary, since the postings 
writer doesn't always know how terms are structured in the term dictionary... 
The previous API had some tricky code for this, e.g. PulsingPostingsWriter 
had to use a term's ordinal in its block to decide how to write metadata, 
which is unnecessary.

To make the API between term dict and postings list more 'pluggable' and 
'general', I refactored the PostingsReader/WriterBase. For example, the 
postings writer should provide some information to the term dictionary, like 
how many metadata values are strictly monotonic, so that the term dictionary 
can optimize delta-encoding itself. And since the term dictionary now fully 
decides how metadata are written, it gains the ability to use intblock-based 
metadata encoding.

Now the two implementations of term dictionary can easily be plugged into 
current postings formats, like:
* FST41 = FSTTermdict + Lucene41PostingsBaseFormat
* FSTOrd41 = FSTOrdTermdict + Lucene41PostingsBaseFormat
* FSTOrdPulsing41 = FSTOrdTermsdict + PulsingPostingsWrapper + 
  Lucene41PostingsFormat

About performance, as shown before, those two term dicts improve primary key 
lookup, but still have overhead on wildcard queries (both term dicts keep 
only prefix information, and the term dictionary cannot work well with 
this...). I'll try to hack on this later.
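
The wiring for those combinations would follow the usual codec composition 
pattern, roughly (a sketch based on how existing PostingsFormats compose; the 
term dict writer's name is taken from the summary above):

    @Override
    public FieldsConsumer fieldsConsumer(SegmentWriteState state) throws IOException {
      PostingsWriterBase postingsWriter = new Lucene41PostingsWriter(state);
      boolean success = false;
      try {
        FieldsConsumer ret = new FSTOrdTermsWriter(state, postingsWriter);
        success = true;
        return ret;
      } finally {
        if (!success) {
          IOUtils.closeWhileHandlingException(postingsWriter); // don't leak files
        }
      }
    }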

> Lucene should have an entirely memory resident term dictionary
> --
>
> Key: LUCENE-3069
> URL: https://issues.apache.org/jira/browse/LUCENE-3069
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index, core/search
>Affects Versions: 4.0-ALPHA
>Reporter: Simon Willnauer
>Assignee: Han Jiang
>  Labels: gsoc2013
> Fix For: 5.0, 4.5
>
> Attachments: df-ttf-estimate.txt, example.png, LUCENE-3069.patch, 
> LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, 
> LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, 
> LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, 
> LUCENE-3069.patch
>
>
> FST based TermDictionary has been a great improvement, yet it still uses a 
> delta codec file for scanning to terms. Some environments have enough memory 
> available to keep the entire FST based term dict in memory. We should add a 
> TermDictionary implementation that encodes all needed information for each 
> term into the FST (custom fst.Output) and builds an FST from the entire term, 
> not just the delta.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-trunk-Linux-Java7-64-test-only - Build # 57577 - Failure!

2013-09-04 Thread builder
Build: builds.flonkings.com/job/Lucene-trunk-Linux-Java7-64-test-only/57577/

1 tests failed.
REGRESSION:  
org.apache.lucene.index.TestFlushByRamOrCountsPolicy.testFlushDocCount

Error Message:
Captured an uncaught exception in thread: Thread[id=238, name=Thread-169, 
state=RUNNABLE, group=TGRP-TestFlushByRamOrCountsPolicy]

Stack Trace:
com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught 
exception in thread: Thread[id=238, name=Thread-169, state=RUNNABLE, 
group=TGRP-TestFlushByRamOrCountsPolicy]
Caused by: java.lang.RuntimeException: 
java.lang.ArrayIndexOutOfBoundsException: 591472
at __randomizedtesting.SeedInfo.seed([F60802156EF89C32]:0)
at 
org.apache.lucene.index.TestFlushByRamOrCountsPolicy$IndexThread.run(TestFlushByRamOrCountsPolicy.java:329)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 591472
at 
org.apache.lucene.codecs.compressing.LZ4$HCHashTable.insertAndFindBestMatch(LZ4.java:333)
at org.apache.lucene.codecs.compressing.LZ4.compressHC(LZ4.java:401)
at 
org.apache.lucene.codecs.compressing.CompressionMode$LZ4HighCompressor.compress(CompressionMode.java:177)
at 
org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.flush(CompressingStoredFieldsWriter.java:227)
at 
org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.finishDocument(CompressingStoredFieldsWriter.java:160)
at 
org.apache.lucene.index.StoredFieldsProcessor.finishDocument(StoredFieldsProcessor.java:128)
at 
org.apache.lucene.index.TwoStoredFieldsConsumers.finishDocument(TwoStoredFieldsConsumers.java:65)
at 
org.apache.lucene.index.DocFieldProcessor.finishDocument(DocFieldProcessor.java:278)
at 
org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:272)
at 
org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:446)
at 
org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1519)
at 
org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1189)
at 
org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1170)
at 
org.apache.lucene.index.TestFlushByRamOrCountsPolicy$IndexThread.run(TestFlushByRamOrCountsPolicy.java:314)




Build Log:
[...truncated 752 lines...]
   [junit4] Suite: org.apache.lucene.index.TestFlushByRamOrCountsPolicy
   [junit4]   2> Set 04, 2013 5:07:51 QN 
com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler
 uncaughtException
   [junit4]   2> WARNING: Uncaught exception in thread: 
Thread[Thread-169,5,TGRP-TestFlushByRamOrCountsPolicy]
   [junit4]   2> java.lang.RuntimeException: 
java.lang.ArrayIndexOutOfBoundsException: 591472
   [junit4]   2>at 
__randomizedtesting.SeedInfo.seed([F60802156EF89C32]:0)
   [junit4]   2>at 
org.apache.lucene.index.TestFlushByRamOrCountsPolicy$IndexThread.run(TestFlushByRamOrCountsPolicy.java:329)
   [junit4]   2> Caused by: java.lang.ArrayIndexOutOfBoundsException: 591472
   [junit4]   2>at 
org.apache.lucene.codecs.compressing.LZ4$HCHashTable.insertAndFindBestMatch(LZ4.java:333)
   [junit4]   2>at 
org.apache.lucene.codecs.compressing.LZ4.compressHC(LZ4.java:401)
   [junit4]   2>at 
org.apache.lucene.codecs.compressing.CompressionMode$LZ4HighCompressor.compress(CompressionMode.java:177)
   [junit4]   2>at 
org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.flush(CompressingStoredFieldsWriter.java:227)
   [junit4]   2>at 
org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.finishDocument(CompressingStoredFieldsWriter.java:160)
   [junit4]   2>at 
org.apache.lucene.index.StoredFieldsProcessor.finishDocument(StoredFieldsProcessor.java:128)
   [junit4]   2>at 
org.apache.lucene.index.TwoStoredFieldsConsumers.finishDocument(TwoStoredFieldsConsumers.java:65)
   [junit4]   2>at 
org.apache.lucene.index.DocFieldProcessor.finishDocument(DocFieldProcessor.java:278)
   [junit4]   2>at 
org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:272)
   [junit4]   2>at 
org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:446)
   [junit4]   2>at 
org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1519)
   [junit4]   2>at 
org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1189)
   [junit4]   2>at 
org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1170)
   [junit4]   2>at 
org.apache.lucene.index.TestFlushByRamOrCountsPolicy$IndexThread.run(TestFlushByRamOrCountsPolicy.java:314)
   [junit4]   2> 
   [junit4]   1> FAILED exc:
   [junit4]   1> java.lang.ArrayIndexOutOfBoundsException: 591472
   [junit4]   1>at 
org.apache.lucene.codecs.compressing.LZ4$HCHashTable.insertAndFindBestMatch(LZ4.java:333)
   [junit4]   1>   

[jira] [Commented] (SOLR-5175) Don't reorder children document

2013-09-04 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757838#comment-13757838
 ] 

ASF subversion and git services commented on SOLR-5175:
---

Commit 1520045 from [~yo...@apache.org] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1520045 ]

SOLR-5175: keep child order in block index

> Don't reorder children document
> ---
>
> Key: SOLR-5175
> URL: https://issues.apache.org/jira/browse/SOLR-5175
> Project: Solr
>  Issue Type: Sub-task
>  Components: update
>Reporter: Mikhail Khludnev
>  Labels: patch, test
> Fix For: 4.5, 5.0
>
> Attachments: SOLR-5175.patch
>
>
> AddUpdateCommand reverses children documents, which causes the failure of 
> BJQParserTest.testGrandChildren() discussed in SOLR-5168  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates

2013-09-04 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757864#comment-13757864
 ] 

Shai Erera commented on LUCENE-5189:


Just so I understand, if we gen FieldInfos, does that solve the brokenness of 
the Codec APIs (in addition to the other things that it solves)? If not, in 
what way are they broken, and is this break a new thing that NDV updates 
cause/expose, or is it a break that exists in general? Can you list the breaks 
here (because I think that FIS.gen solves all the points you raised above)?

> Numeric DocValues Updates
> -
>
> Key: LUCENE-5189
> URL: https://issues.apache.org/jira/browse/LUCENE-5189
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/index
>Reporter: Shai Erera
>Assignee: Shai Erera
> Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
> LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch
>
>
> In LUCENE-4258 we started to work on incremental field updates, however the 
> amount of changes are immense and hard to follow/consume. The reason is that 
> we targeted postings, stored fields, DV etc., all from the get go.
> I'd like to start afresh here, with numeric-dv-field updates only. There are 
> a couple of reasons to that:
> * NumericDV fields should be easier to update, if e.g. we write all the 
> values of all the documents in a segment for the updated field (similar to 
> how livedocs work, and previously norms).
> * It's a fairly contained issue, attempting to handle just one data type to 
> update, yet requires many changes to core code which will also be useful for 
> updating other data types.
> * It has value in and on itself, and we don't need to allow updating all the 
> data types in Lucene at once ... we can do that gradually.
> I have some working patch already which I'll upload next, explaining the 
> changes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5175) Don't reorder children document

2013-09-04 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757834#comment-13757834
 ] 

ASF subversion and git services commented on SOLR-5175:
---

Commit 1520042 from [~yo...@apache.org] in branch 'dev/trunk'
[ https://svn.apache.org/r1520042 ]

SOLR-5175: keep child order in block index

> Don't reorder children document
> ---
>
> Key: SOLR-5175
> URL: https://issues.apache.org/jira/browse/SOLR-5175
> Project: Solr
>  Issue Type: Sub-task
>  Components: update
>Reporter: Mikhail Khludnev
>  Labels: patch, test
> Fix For: 4.5, 5.0
>
> Attachments: SOLR-5175.patch
>
>
> AddUpdateCommand reverses children documents, which causes the failure of 
> BJQParserTest.testGrandChildren() discussed in SOLR-5168  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates

2013-09-04 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757875#comment-13757875
 ] 

Robert Muir commented on LUCENE-5189:
-

{quote}
This would let us proceed (progress not perfection) and then later, we address 
it. Ie, I think the added boolean is a fair compromise.
{quote}

It's not a fair compromise at all.

To me, as a search engine library, this is not progress; it's going backwards.
Yes: I'm looking at it solely from an API perspective.
Yes: others look at things only from a features/performance perspective and do 
not seem to care about APIs.

But as a library, the API is all that matters.

So I just want to make it clear: saying "progress not perfection" is not a good 
excuse for leaving broken APIs around the codebase and shoving in features as 
fast as possible. It's not progress to me, so I simply do not see it that way. 

Frankly, I am tired of hearing this phrase used in this way, and when I 
see it in the future, it will encourage me to take a closer inspection of APIs 
and do pickier reviews.


> Numeric DocValues Updates
> -
>
> Key: LUCENE-5189
> URL: https://issues.apache.org/jira/browse/LUCENE-5189
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/index
>Reporter: Shai Erera
>Assignee: Shai Erera
> Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
> LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch
>
>
> In LUCENE-4258 we started to work on incremental field updates, however the 
> amount of changes are immense and hard to follow/consume. The reason is that 
> we targeted postings, stored fields, DV etc., all from the get go.
> I'd like to start afresh here, with numeric-dv-field updates only. There are 
> a couple of reasons to that:
> * NumericDV fields should be easier to update, if e.g. we write all the 
> values of all the documents in a segment for the updated field (similar to 
> how livedocs work, and previously norms).
> * It's a fairly contained issue, attempting to handle just one data type to 
> update, yet requires many changes to core code which will also be useful for 
> updating other data types.
> * It has value in and on itself, and we don't need to allow updating all the 
> data types in Lucene at once ... we can do that gradually.
> I have some working patch already which I'll upload next, explaining the 
> changes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5168) BJQParserTest reproducible failures

2013-09-04 Thread Mikhail Khludnev (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757884#comment-13757884
 ] 

Mikhail Khludnev commented on SOLR-5168:


Ignore is removed in SOLR-5175; see 
https://svn.apache.org/viewvc/lucene/dev/trunk/solr/core/src/test/org/apache/solr/search/join/BJQParserTest.java?r1=1520042&r2=1520041&pathrev=1520042

Feel free to close this one. 

> BJQParserTest reproducible failures
> ---
>
> Key: SOLR-5168
> URL: https://issues.apache.org/jira/browse/SOLR-5168
> Project: Solr
>  Issue Type: Bug
>Reporter: Hoss Man
>Assignee: Yonik Seeley
> Attachments: BJQTest.patch
>
>
> two recent Jenkins builds have uncovered some test seeds that cause failures 
> in multiple test methods in BJQParserTest.  These seeds reproduce reliably 
> (as of trunk r1514815) ...
> {noformat}
> ant test  -Dtestcase=BJQParserTest -Dtests.seed=7A613F321CE87F5B 
> -Dtests.multiplier=3 -Dtests.slow=true
> ant test  -Dtestcase=BJQParserTest -Dtests.seed=1DC8055F837E437E 
> -Dtests.multiplier=2 -Dtests.nightly=true -Dtests.slow=true 
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates

2013-09-04 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757885#comment-13757885
 ] 

Robert Muir commented on LUCENE-5189:
-

{quote}
Just so I understand, if we gen FieldInfos, does that solve the brokenness of 
the Codec APIs (in addition to the other things that it solves)? If not, in 
what way are they broken, and is this break a new thing that NDV updates 
cause/expose, or it's a break that exists in general? Can you list the breaks 
here (because I think that FIS.gen solves all the points you raised above).
{quote}

It does not solve problem #2 (SegmentInfos.attributes). This API should be 
removed, deprecated, made internal-only, or something like that. Another option 
is to move this stuff into the commit, but that might be overkill: today this 
stuff is only used as a backwards-compatibility crutch (I think) to read 3.x 
indexes, so it can possibly just be removed in trunk right now.

Gen'ing FieldInfos brings its own set of questions as far as when/how/if 
any new fieldinfo information is merged and when/how it's visible to the codec 
API. It's very scary, but I don't see any alternative at the moment.


> Numeric DocValues Updates
> -
>
> Key: LUCENE-5189
> URL: https://issues.apache.org/jira/browse/LUCENE-5189
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/index
>Reporter: Shai Erera
>Assignee: Shai Erera
> Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
> LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch
>
>
> In LUCENE-4258 we started to work on incremental field updates, however the 
> amount of changes are immense and hard to follow/consume. The reason is that 
> we targeted postings, stored fields, DV etc., all from the get go.
> I'd like to start afresh here, with numeric-dv-field updates only. There are 
> a couple of reasons to that:
> * NumericDV fields should be easier to update, if e.g. we write all the 
> values of all the documents in a segment for the updated field (similar to 
> how livedocs work, and previously norms).
> * It's a fairly contained issue, attempting to handle just one data type to 
> update, yet requires many changes to core code which will also be useful for 
> updating other data types.
> * It has value in and on itself, and we don't need to allow updating all the 
> data types in Lucene at once ... we can do that gradually.
> I have some working patch already which I'll upload next, explaining the 
> changes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [JENKINS] Lucene-trunk-Linux-Java7-64-test-only - Build # 57577 - Failure!

2013-09-04 Thread Adrien Grand
Actually this is a side-effect of LUCENE-5188. There is a bug in
LZ4.compressHC (which I committed to test various trade-offs between
compression speed and ratio, but which is not used in any official codec)
on very compressible inputs, which seems to be more easily triggered now
that the inputs can be sliced. I have a fix that I'm testing and
should be able to commit soon.

-- 
Adrien

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-5175) Don't reorder children document

2013-09-04 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley resolved SOLR-5175.


Resolution: Fixed

Committed. I agree it's nicer not to reorder (esp. for the single-level case), 
but I don't think we should guarantee the document order - it's an 
implementation detail.

> Don't reorder children document
> ---
>
> Key: SOLR-5175
> URL: https://issues.apache.org/jira/browse/SOLR-5175
> Project: Solr
>  Issue Type: Sub-task
>  Components: update
>Reporter: Mikhail Khludnev
>  Labels: patch, test
> Fix For: 4.5, 5.0
>
> Attachments: SOLR-5175.patch
>
>
> AddUpdateCommand reverses children documents, which causes the failure of 
> BJQParserTest.testGrandChildren() discussed in SOLR-5168  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5168) BJQParserTest reproducible failures

2013-09-04 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757888#comment-13757888
 ] 

Yonik Seeley commented on SOLR-5168:


Right - I was just running the test in a loop locally first to ensure 
everything was actually fixed.

> BJQParserTest reproducible failures
> ---
>
> Key: SOLR-5168
> URL: https://issues.apache.org/jira/browse/SOLR-5168
> Project: Solr
>  Issue Type: Bug
>Reporter: Hoss Man
>Assignee: Yonik Seeley
> Attachments: BJQTest.patch
>
>
> two recent Jenkins builds have uncovered some test seeds that cause failures 
> in multiple test methods in BJQParserTest.  These seeds reproduce reliably 
> (as of trunk r1514815) ...
> {noformat}
> ant test  -Dtestcase=BJQParserTest -Dtests.seed=7A613F321CE87F5B 
> -Dtests.multiplier=3 -Dtests.slow=true
> ant test  -Dtestcase=BJQParserTest -Dtests.seed=1DC8055F837E437E 
> -Dtests.multiplier=2 -Dtests.nightly=true -Dtests.slow=true 
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [JENKINS] Lucene-trunk-Linux-Java7-64-test-only - Build # 57577 - Failure!

2013-09-04 Thread Robert Muir
I tried to look into this failure, thinking it was related to
LUCENE-5188 (since it happened just after that was committed and
involves stored fields compression).

It doesn't reproduce for me though: maybe because of how the test uses threads?

On Wed, Sep 4, 2013 at 11:08 AM,   wrote:
> Build: builds.flonkings.com/job/Lucene-trunk-Linux-Java7-64-test-only/57577/
>
> 1 tests failed.
> REGRESSION:  
> org.apache.lucene.index.TestFlushByRamOrCountsPolicy.testFlushDocCount
>
> Error Message:
> Captured an uncaught exception in thread: Thread[id=238, name=Thread-169, 
> state=RUNNABLE, group=TGRP-TestFlushByRamOrCountsPolicy]
>
> Stack Trace:
> com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an 
> uncaught exception in thread: Thread[id=238, name=Thread-169, state=RUNNABLE, 
> group=TGRP-TestFlushByRamOrCountsPolicy]
> Caused by: java.lang.RuntimeException: 
> java.lang.ArrayIndexOutOfBoundsException: 591472
> at __randomizedtesting.SeedInfo.seed([F60802156EF89C32]:0)
> at 
> org.apache.lucene.index.TestFlushByRamOrCountsPolicy$IndexThread.run(TestFlushByRamOrCountsPolicy.java:329)
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 591472
> at 
> org.apache.lucene.codecs.compressing.LZ4$HCHashTable.insertAndFindBestMatch(LZ4.java:333)
> at org.apache.lucene.codecs.compressing.LZ4.compressHC(LZ4.java:401)
> at 
> org.apache.lucene.codecs.compressing.CompressionMode$LZ4HighCompressor.compress(CompressionMode.java:177)
> at 
> org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.flush(CompressingStoredFieldsWriter.java:227)
> at 
> org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.finishDocument(CompressingStoredFieldsWriter.java:160)
> at 
> org.apache.lucene.index.StoredFieldsProcessor.finishDocument(StoredFieldsProcessor.java:128)
> at 
> org.apache.lucene.index.TwoStoredFieldsConsumers.finishDocument(TwoStoredFieldsConsumers.java:65)
> at 
> org.apache.lucene.index.DocFieldProcessor.finishDocument(DocFieldProcessor.java:278)
> at 
> org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:272)
> at 
> org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:446)
> at 
> org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1519)
> at 
> org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1189)
> at 
> org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1170)
> at 
> org.apache.lucene.index.TestFlushByRamOrCountsPolicy$IndexThread.run(TestFlushByRamOrCountsPolicy.java:314)
>
>
>
>
> Build Log:
> [...truncated 752 lines...]
>[junit4] Suite: org.apache.lucene.index.TestFlushByRamOrCountsPolicy
>[junit4]   2> Set 04, 2013 5:07:51 QN 
> com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler
>  uncaughtException
>[junit4]   2> WARNING: Uncaught exception in thread: 
> Thread[Thread-169,5,TGRP-TestFlushByRamOrCountsPolicy]
>[junit4]   2> java.lang.RuntimeException: 
> java.lang.ArrayIndexOutOfBoundsException: 591472
>[junit4]   2>at 
> __randomizedtesting.SeedInfo.seed([F60802156EF89C32]:0)
>[junit4]   2>at 
> org.apache.lucene.index.TestFlushByRamOrCountsPolicy$IndexThread.run(TestFlushByRamOrCountsPolicy.java:329)
>[junit4]   2> Caused by: java.lang.ArrayIndexOutOfBoundsException: 591472
>[junit4]   2>at 
> org.apache.lucene.codecs.compressing.LZ4$HCHashTable.insertAndFindBestMatch(LZ4.java:333)
>[junit4]   2>at 
> org.apache.lucene.codecs.compressing.LZ4.compressHC(LZ4.java:401)
>[junit4]   2>at 
> org.apache.lucene.codecs.compressing.CompressionMode$LZ4HighCompressor.compress(CompressionMode.java:177)
>[junit4]   2>at 
> org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.flush(CompressingStoredFieldsWriter.java:227)
>[junit4]   2>at 
> org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.finishDocument(CompressingStoredFieldsWriter.java:160)
>[junit4]   2>at 
> org.apache.lucene.index.StoredFieldsProcessor.finishDocument(StoredFieldsProcessor.java:128)
>[junit4]   2>at 
> org.apache.lucene.index.TwoStoredFieldsConsumers.finishDocument(TwoStoredFieldsConsumers.java:65)
>[junit4]   2>at 
> org.apache.lucene.index.DocFieldProcessor.finishDocument(DocFieldProcessor.java:278)
>[junit4]   2>at 
> org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:272)
>[junit4]   2>at 
> org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:446)
>[junit4]   2>at 
> org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1519)
>[junit4]   2>at 
> org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1189)
>[junit4]   2>at 
> 

[jira] [Assigned] (SOLR-5210) amend example's schema.xml and solrconfig.xml for blockjoin support

2013-09-04 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley reassigned SOLR-5210:
--

Assignee: Yonik Seeley

> amend example's schema.xml and solrconfig.xml for blockjoin support
> ---
>
> Key: SOLR-5210
> URL: https://issues.apache.org/jira/browse/SOLR-5210
> Project: Solr
>  Issue Type: Sub-task
>Reporter: Mikhail Khludnev
>Assignee: Yonik Seeley
> Fix For: 4.5, 5.0
>
>
> I suppose it makes sense to apply 
> https://svn.apache.org/viewvc/lucene/dev/trunk/solr/core/src/test-files/solr/collection1/conf/solrconfig.xml?r1=1513290&r2=1513289&pathrev=1513290
>  and 
> https://svn.apache.org/viewvc/lucene/dev/trunk/solr/core/src/test-files/solr/collection1/conf/schema.xml?r1=1513290&r2=1513289&pathrev=1513290
>  to the example's config too, to provide an out-of-the-box block join experience. 
> WDYT?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5197) Add a method to SegmentReader to get the current index heap memory size

2013-09-04 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757889#comment-13757889
 ] 

Robert Muir commented on LUCENE-5197:
-

Can we rename FixedGapTermsIndexReader.getSizeInBytes to 
FixedGapTermsIndexReader.ramBytesUsed?

Otherwise the patch consistently uses the same name (ramBytesUsed) throughout; 
it's just this one that is inconsistent.

> Add a method to SegmentReader to get the current index heap memory size
> ---
>
> Key: LUCENE-5197
> URL: https://issues.apache.org/jira/browse/LUCENE-5197
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs, core/index
>Reporter: Areek Zillur
> Attachments: LUCENE-5197.patch, LUCENE-5197.patch
>
>
> It would be useful to at least estimate the index heap size being used by 
> Lucene. Ideally a method exposing this information at the SegmentReader level.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-5168) BJQParserTest reproducible failures

2013-09-04 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley resolved SOLR-5168.


   Resolution: Fixed
Fix Version/s: 5.0
   4.5

> BJQParserTest reproducible failures
> ---
>
> Key: SOLR-5168
> URL: https://issues.apache.org/jira/browse/SOLR-5168
> Project: Solr
>  Issue Type: Bug
>Reporter: Hoss Man
>Assignee: Yonik Seeley
> Fix For: 4.5, 5.0
>
> Attachments: BJQTest.patch
>
>
> two recent Jenkins builds have uncovered some test seeds that cause failures 
> in multiple test methods in BJQParserTest.  These seeds reproduce reliably 
> (as of trunk r1514815) ...
> {noformat}
> ant test  -Dtestcase=BJQParserTest -Dtests.seed=7A613F321CE87F5B 
> -Dtests.multiplier=3 -Dtests.slow=true
> ant test  -Dtestcase=BJQParserTest -Dtests.seed=1DC8055F837E437E 
> -Dtests.multiplier=2 -Dtests.nightly=true -Dtests.slow=true 
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [JENKINS] Lucene-trunk-Linux-Java7-64-test-only - Build # 57577 - Failure!

2013-09-04 Thread Robert Muir
Thanks Adrien:

Would TestCompressingStoredFieldsFormat eventually catch it?

This one seems to randomize its parameters, but perhaps it would be
good to explicitly add Test*StoredFieldsFormats for the different
test codec modes we have:
CompressionMode.HIGH_COMPRESSION, CompressionMode.FAST_DECOMPRESSION,
CompressionMode.FAST?


On Wed, Sep 4, 2013 at 12:06 PM, Adrien Grand  wrote:
> Actually this is a side-effect of LUCENE-5188. There is a bug in
> LZ4.compressHC (which I committed to test various trade-offs between
> compression speed and ratio but is not used in any official codec) on
> very compressible inputs which seems to be more easily triggered now
> that the inputs can be sliced. I have a fix that I'm testing and
> should be able to commit soon.
>
> --
> Adrien
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates

2013-09-04 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757921#comment-13757921
 ] 

Shai Erera commented on LUCENE-5189:


bq. It does not solve problem #2 (SegmentInfos.attributes)

Correct. So this API is broken today for LiveDocsFormat (since it's the only 
updateable thing), but field updates only broaden the brokenness into other 
formats (now only DVF, but in the future others too). Correct?

I think that moving this API into the commit is not overkill. I remember 
Mike and I once discussed whether we could use that API to save per-segment 
facets "schema details". I don't remember how this ended, but maybe we 
shouldn't remove it? Alternatively, we could gen SIFormat too ... that may be 
overkill though. Recording a per-segment StringStringMap in SIS seems simple 
enough.

Regarding FIS.gen, I honestly thought to keep it simple by writing the FIS 
entirely in each gen, and not complicate the code by writing parts of an FI in 
different gens and merging them in the SR. This is what I plan to do in this 
issue.
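
Just to illustrate the "write all FIS per gen" idea with hypothetical file 
names (following the existing livedocs gen convention):

{noformat}
_0.fnm      gen 0: FieldInfos written at flush
_0_1.fnm    gen 1: complete FieldInfos rewritten after an NDV update commit
_0_2.fnm    gen 2: complete FieldInfos rewritten again on the next update
{noformat}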

> Numeric DocValues Updates
> -
>
> Key: LUCENE-5189
> URL: https://issues.apache.org/jira/browse/LUCENE-5189
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/index
>Reporter: Shai Erera
>Assignee: Shai Erera
> Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
> LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch
>
>
> In LUCENE-4258 we started to work on incremental field updates, however the 
> amount of changes are immense and hard to follow/consume. The reason is that 
> we targeted postings, stored fields, DV etc., all from the get go.
> I'd like to start afresh here, with numeric-dv-field updates only. There are 
> a couple of reasons to that:
> * NumericDV fields should be easier to update, if e.g. we write all the 
> values of all the documents in a segment for the updated field (similar to 
> how livedocs work, and previously norms).
> * It's a fairly contained issue, attempting to handle just one data type to 
> update, yet requires many changes to core code which will also be useful for 
> updating other data types.
> * It has value in and on itself, and we don't need to allow updating all the 
> data types in Lucene at once ... we can do that gradually.
> I have some working patch already which I'll upload next, explaining the 
> changes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4909) Solr and IndexReader Re-opening on Replication Slave

2013-09-04 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757924#comment-13757924
 ] 

Robert Muir commented on SOLR-4909:
---

Thanks Michael: at a glance the patch looks good to me.

I wonder if we can improve the test: I'm a bit concerned that with random merge 
policies it might sporadically fail. Maybe we can change the test to use 
LogDocMergePolicy in its configuration and explicitly assert the segment 
structure.
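
Something along these lines (sketch; dir and analyzer are assumed to come 
from the surrounding test setup):

{code}
// Pin the merge policy so the segment structure is deterministic enough
// to assert on, instead of LuceneTestCase's randomized merge policy.
IndexWriterConfig iwc = new IndexWriterConfig(TEST_VERSION_CURRENT, analyzer);
iwc.setMergePolicy(new LogDocMergePolicy()); // predictable doc-count based merging
IndexWriter writer = new IndexWriter(dir, iwc);
{code}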

I'll take a closer look as soon as I have a chance: it's not your fault, the 
code around here is just a bit scary.

> Solr and IndexReader Re-opening on Replication Slave
> 
>
> Key: SOLR-4909
> URL: https://issues.apache.org/jira/browse/SOLR-4909
> Project: Solr
>  Issue Type: Improvement
>  Components: replication (java), search
>Affects Versions: 4.3
>Reporter: Michael Garski
> Fix For: 4.5, 5.0
>
> Attachments: SOLR-4909_confirm_keys.patch, SOLR-4909-demo.patch, 
> SOLR-4909_fix.patch, SOLR-4909.patch, SOLR-4909_v2.patch, SOLR-4909_v3.patch
>
>
> I've been experimenting with caching filter data per segment in Solr using a 
> CachingWrapperFilter & FilteredQuery within a custom query parser (as 
> suggested by [~yo...@apache.org] in SOLR-3763) and encountered situations 
> where the value of getCoreCacheKey() on the AtomicReader for each segment can 
> change for a given segment on disk when the searcher is reopened. As 
> CachingWrapperFilter uses the value of the segment's getCoreCacheKey() as the 
> key in the cache, there are situations where the data cached on that segment 
> is not reused when the segment on disk is still part of the index. This 
> affects the Lucene field cache and field value caches as well as they are 
> cached per segment.
> When Solr first starts it opens the searcher's underlying DirectoryReader in 
> StandardIndexReaderFactory.newReader by calling 
> DirectoryReader.open(indexDir, termInfosIndexDivisor), and the reader is 
> subsequently reopened in SolrCore.openNewSearcher by calling 
> DirectoryReader.openIfChanged(currentReader, writer.get(), true). The act of 
> reopening the reader with the writer when it was first opened without a 
> writer results in the value of getCoreCacheKey() changing on each of the 
> segments even though some of the segments have not changed. Depending on the 
> role of the Solr server, this has different effects:
> * On a SolrCloud node or free-standing index and search server the segment 
> cache is invalidated during the first DirectoryReader reopen - subsequent 
> reopens use the same IndexWriter instance and as such the value of 
> getCoreCacheKey() on each segment does not change so the cache is retained. 
> * For a master-slave replication set up the segment cache invalidation occurs 
> on the slave during every replication as the index is reopened using a new 
> IndexWriter instance which results in the value of getCoreCacheKey() changing 
> on each segment when the DirectoryReader is reopened using a different 
> IndexWriter instance.
> I can think of a few approaches to alter the re-opening behavior to allow 
> reuse of segment level caches in both cases, and I'd like to get some input 
> on other ideas before digging in:
> * To change the cloud node/standalone first commit issue it might be possible 
> to create the UpdateHandler and IndexWriter before the DirectoryReader, and 
> use the writer to open the reader. There is a comment in the SolrCore 
> constructor by [~yo...@apache.org] that the searcher should be opened before 
> the update handler so that may not be an acceptable approach. 
> * To change the behavior of a slave in a replication set up, one solution 
> would be to not open a writer from the SnapPuller when the new index is 
> retrieved if the core is enabled as a slave only. The writer is needed on a 
> server configured as a master & slave that is functioning as a replication 
> repeater so downstream slaves can see the changes in the index and retrieve 
> them.
> I'll attach a unit test that demonstrates the behavior of reopening the 
> DirectoryReader and its effects on the value of getCoreCacheKey. My 
> assumption is that the behavior of Lucene during the various reader reopen 
> operations is correct and that the changes are necessary on the Solr side of 
> things.
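
A condensed sketch of the reopen sequence described above (Lucene 4.x API; 
dir and writer are assumed):

{code}
// First open without a writer (as StandardIndexReaderFactory does) ...
DirectoryReader r1 = DirectoryReader.open(dir);
// ... then reopen against the writer (as SolrCore.openNewSearcher does).
DirectoryReader r2 = DirectoryReader.openIfChanged(r1, writer, true);
// Per the report above, r2's segment readers can return different
// getCoreCacheKey() values even for unchanged on-disk segments,
// invalidating per-segment caches keyed on them.
{code}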

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [JENKINS] Lucene-trunk-Linux-Java7-64-test-only - Build # 57577 - Failure!

2013-09-04 Thread Adrien Grand
Good ideas. Uwe also suggested opening an issue so that this bug fix
is in the changelog. I will do it soon...

-- 
Adrien

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-5201) Compression issue on highly compressible inputs with LZ4.compressHC

2013-09-04 Thread Adrien Grand (JIRA)
Adrien Grand created LUCENE-5201:


 Summary: Compression issue on highly compressible inputs with 
LZ4.compressHC
 Key: LUCENE-5201
 URL: https://issues.apache.org/jira/browse/LUCENE-5201
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Adrien Grand
Assignee: Adrien Grand
 Fix For: 5.0, 4.5


LZ4.compressHC sometimes fails at compressing highly compressible inputs when 
the start offset is > 0.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5201) Compression issue on highly compressible inputs with LZ4.compressHC

2013-09-04 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757930#comment-13757930
 ] 

Adrien Grand commented on LUCENE-5201:
--

A fix is already committed, but I opened this issue at Uwe's suggestion so 
that it has an entry in the changelog.

> Compression issue on highly compressible inputs with LZ4.compressHC
> ---
>
> Key: LUCENE-5201
> URL: https://issues.apache.org/jira/browse/LUCENE-5201
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Adrien Grand
>Assignee: Adrien Grand
> Fix For: 5.0, 4.5
>
>
> LZ4.compressHC sometimes fails at compressing highly compressible inputs when 
> the start offset is > 0.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2548) Multithreaded faceting

2013-09-04 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757931#comment-13757931
 ] 

Hoss Man commented on SOLR-2548:


bq. Still checking on the implications of stacking up a bunch of 
directExecutors all through the CompletionService, not something I've used 
recently and the details are hazy.

Unless I'm missing something, it should be a single directExecutor, and when a 
job is submitted to the CompletionService, nothing happens in the background at 
all -- the thread that submitted the job then immediately executes the job.  
Telling the CompletionService to use the directExecutor is essentially a way of 
saying "when someone asks you to execute X, make them do it themselves".

bq. Is there a decent way to check whether more than one thread was actually 
spawned?

I doubt it ... but it would be nice to at least know the functionality succeeds 
w/o failure.

There might be a way to subclass & instrument the ThreadPoolExecutor (or the 
Queue it uses to manage jobs) so that you could make it keep track of the max 
number of live threads at any one time, or the max size of the queue at any one 
time, and then your test could reach in and inspect either of those values to 
know if the _wrong_ thing happened (ie: too many threads spun up, or too many 
things enqueued w/o being handed to threads) ... but I'm not sure how hard that 
would be.

Actually -- maybe a better thing to do would be to have the Callables record 
the thread id of whatever thread executed them, and include that in the debug 
info ... then the test could just confirm that all of the ids match and don't 
start with "facetExecutor-" in the directExecutor case, and that the number of 
unique ids seen is not greater than N in the facet.threads=N case.  (That debug 
info could theoretically be useful to end users as well, to see that multiple 
threads really are getting used.)
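
e.g. a hypothetical sketch (doFacetWork stands in for the real per-field work):

{code}
// Each facet Callable records the name of the thread that ran it, so a
// test can assert on the set of names, and end users could see the
// parallelism in debug output.
Callable<NamedList<Object>> job = new Callable<NamedList<Object>>() {
  @Override
  public NamedList<Object> call() {
    NamedList<Object> result = doFacetWork(); // stand-in for the real work
    result.add("executedBy", Thread.currentThread().getName());
    return result;
  }
};
{code}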

> Multithreaded faceting
> --
>
> Key: SOLR-2548
> URL: https://issues.apache.org/jira/browse/SOLR-2548
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 3.1
>Reporter: Janne Majaranta
>Assignee: Erick Erickson
>Priority: Minor
>  Labels: facet
> Attachments: SOLR-2548_4.2.1.patch, SOLR-2548_for_31x.patch, 
> SOLR-2548.patch, SOLR-2548.patch
>
>
> Add multithreading support for faceting.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5210) amend example's schema.xml and solrconfig.xml for blockjoin support

2013-09-04 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13758033#comment-13758033
 ] 

ASF subversion and git services commented on SOLR-5210:
---

Commit 1520081 from [~yo...@apache.org] in branch 'dev/trunk'
[ https://svn.apache.org/r1520081 ]

SOLR-5210: add block join support to example

> amend example's schema.xml and solrconfig.xml for blockjoin support
> ---
>
> Key: SOLR-5210
> URL: https://issues.apache.org/jira/browse/SOLR-5210
> Project: Solr
>  Issue Type: Sub-task
>Reporter: Mikhail Khludnev
>Assignee: Yonik Seeley
> Fix For: 4.5, 5.0
>
>
> I suppose it makes sense to apply 
> https://svn.apache.org/viewvc/lucene/dev/trunk/solr/core/src/test-files/solr/collection1/conf/solrconfig.xml?r1=1513290&r2=1513289&pathrev=1513290
>  and 
> https://svn.apache.org/viewvc/lucene/dev/trunk/solr/core/src/test-files/solr/collection1/conf/schema.xml?r1=1513290&r2=1513289&pathrev=1513290
>  to the example's config too, to provide an out-of-the-box block join experience. 
> WDYT?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5210) amend example's schema.xml and solrconfig.xml for blockjoin support

2013-09-04 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13758036#comment-13758036
 ] 

ASF subversion and git services commented on SOLR-5210:
---

Commit 1520082 from [~yo...@apache.org] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1520082 ]

SOLR-5210: add block join support to example

> amend example's schema.xml and solrconfig.xml for blockjoin support
> ---
>
> Key: SOLR-5210
> URL: https://issues.apache.org/jira/browse/SOLR-5210
> Project: Solr
>  Issue Type: Sub-task
>Reporter: Mikhail Khludnev
>Assignee: Yonik Seeley
> Fix For: 4.5, 5.0
>
>
> I suppose it makes sense to apply 
> https://svn.apache.org/viewvc/lucene/dev/trunk/solr/core/src/test-files/solr/collection1/conf/solrconfig.xml?r1=1513290&r2=1513289&pathrev=1513290
>  and 
> https://svn.apache.org/viewvc/lucene/dev/trunk/solr/core/src/test-files/solr/collection1/conf/schema.xml?r1=1513290&r2=1513289&pathrev=1513290
>  to the example's config too, to provide an out-of-the-box block join experience. 
> WDYT?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5197) Add a method to SegmentReader to get the current index heap memory size

2013-09-04 Thread Areek Zillur (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Areek Zillur updated LUCENE-5197:
-

Attachment: LUCENE-5197.patch

Changed getSizeInBytes to ramBytesUsed as Robert suggested.

> Add a method to SegmentReader to get the current index heap memory size
> ---
>
> Key: LUCENE-5197
> URL: https://issues.apache.org/jira/browse/LUCENE-5197
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs, core/index
>Reporter: Areek Zillur
> Attachments: LUCENE-5197.patch, LUCENE-5197.patch, LUCENE-5197.patch
>
>
> It would be useful to at least estimate the index heap size being used by 
> Lucene. Ideally a method exposing this information at the SegmentReader level.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-5210) amend example's schema.xml and solrconfig.xml for blockjoin support

2013-09-04 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley resolved SOLR-5210.


Resolution: Fixed

> amend example's schema.xml and solrconfig.xml for blockjoin support
> ---
>
> Key: SOLR-5210
> URL: https://issues.apache.org/jira/browse/SOLR-5210
> Project: Solr
>  Issue Type: Sub-task
>Reporter: Mikhail Khludnev
>Assignee: Yonik Seeley
> Fix For: 4.5, 5.0
>
>
> I suppose it makes sense to apply 
> https://svn.apache.org/viewvc/lucene/dev/trunk/solr/core/src/test-files/solr/collection1/conf/solrconfig.xml?r1=1513290&r2=1513289&pathrev=1513290
>  and 
> https://svn.apache.org/viewvc/lucene/dev/trunk/solr/core/src/test-files/solr/collection1/conf/schema.xml?r1=1513290&r2=1513289&pathrev=1513290
>  to the example's config too, to provide an out-of-the-box block join experience. 
> WDYT?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5142) Block Indexing / Join Improvements

2013-09-04 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13758043#comment-13758043
 ] 

Yonik Seeley commented on SOLR-5142:


bq. Don't you feel unique key should be optional for children documents?

Unique keys are for more than just implementing overwriting, though - they are 
needed for things like distributed search.

> Block Indexing / Join Improvements
> --
>
> Key: SOLR-5142
> URL: https://issues.apache.org/jira/browse/SOLR-5142
> Project: Solr
>  Issue Type: Improvement
>Reporter: Yonik Seeley
> Fix For: 4.5, 5.0
>
>
> Follow-on main issue for general block indexing / join improvements

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5200) HighFreqTerms has confusing behavior with -t option

2013-09-04 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-5200:


Attachment: LUCENE-5200.patch

> HighFreqTerms has confusing behavior with -t option
> ---
>
> Key: LUCENE-5200
> URL: https://issues.apache.org/jira/browse/LUCENE-5200
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/other
>Reporter: Robert Muir
> Attachments: LUCENE-5200.patch
>
>
> {code}
>  * HighFreqTerms class extracts the top n most frequent terms
>  * (by document frequency) from an existing Lucene index and reports their
>  * document frequency.
>  * 
>  * If the -t flag is given, both document frequency and total tf (total
>  * number of occurrences) are reported, ordered by descending total tf.
> {code}
> Problem #1:
> It's tricky what happens with -t: if you ask for the top-100 terms, it 
> requests the top-100 terms (by docFreq), then re-sorts that top-N by 
> totalTermFreq.
> So it's not really the top 100 most frequently occurring terms.
> Problem #2: 
> Using the -t option can be confusing and slow: the reported docFreq includes 
> deletions, but totalTermFreq does not (it actually walks postings lists if 
> there is even one deletion).
> I think this is a relic from the 3.x days when Lucene did not support this 
> statistic. I think we should just always output both TermsEnum.docFreq() and 
> TermsEnum.totalTermFreq(), and -t just determines the comparator of the PQ.
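
A sketch of what that last suggestion might look like (sortByTotalTermFreq is 
an assumed flag; TermStats field names per the misc package):

{code}
// -t only selects the priority-queue comparator; both docFreq and
// totalTermFreq are always collected and reported.
Comparator<TermStats> cmp = sortByTotalTermFreq
    ? new Comparator<TermStats>() {
        public int compare(TermStats a, TermStats b) {
          return Long.compare(a.totalTermFreq, b.totalTermFreq);
        }
      }
    : new Comparator<TermStats>() {
        public int compare(TermStats a, TermStats b) {
          return Integer.compare(a.docFreq, b.docFreq);
        }
      };
{code}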

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5201) Compression issue on highly compressible inputs with LZ4.compressHC

2013-09-04 Thread Adrien Grand (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand updated LUCENE-5201:
-

Attachment: LUCENE-5201.patch

This bug needed two conditions to appear:
 - the input needs to be highly compressible so that there are collisions in 
the chain table used for finding references backwards in the stream,
 - the start offset needs to be > 0.

CompressingStoredFieldsFormat has only called LZ4.compress(HC) with positive 
start offsets since LUCENE-5188, so this shouldn't have had an impact on people 
who were using CompressionMode.FAST_DECOMPRESSION (which seems to be confirmed 
by the fact that we never saw any test failure related to this until today, 
only a few minutes after I committed LUCENE-5188).

I was able to write a test case that reproduces the bug, and changed the 
existing tests so that they don't only test compression with a start offset of 
0.
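
For reference, a sketch of the shape of such a repro through the public 
compressing API (the committed test lives next to LZ4 and may differ):

{code}
// Both trigger conditions at once: a highly compressible input (all
// identical bytes => collisions in the chain table) compressed from a
// non-zero start offset.
byte[] data = new byte[1 << 17];
Arrays.fill(data, (byte) 'a');
int offset = 1024; // the bug required offset > 0
Compressor hc = CompressionMode.HIGH_COMPRESSION.newCompressor();
hc.compress(data, offset, data.length - offset,
    new ByteArrayDataOutput(new byte[data.length * 2]));
{code}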

> Compression issue on highly compressible inputs with LZ4.compressHC
> ---
>
> Key: LUCENE-5201
> URL: https://issues.apache.org/jira/browse/LUCENE-5201
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Adrien Grand
>Assignee: Adrien Grand
> Fix For: 5.0, 4.5
>
> Attachments: LUCENE-5201.patch
>
>
> LZ4.compressHC sometimes fails at compressing highly compressible inputs when 
> the start offset is > 0.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5197) Add a method to SegmentReader to get the current index heap memory size

2013-09-04 Thread Areek Zillur (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13758082#comment-13758082
 ] 

Areek Zillur commented on LUCENE-5197:
--

[~mikemccand]
I can add a ramBytesUsed method to the SimpleTextTerms class to account for it, 
but only under the assumption that the SimpleTextTerms implementation will be 
used for the SimpleTextFieldsReader (it uses the abstract Terms class in the 
termsCache). Comments?
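
e.g. something like this sketch (assuming SimpleTextTerms keeps its in-memory 
FST in a field named fst):

{code}
@Override
public long ramBytesUsed() {
  // Account for the FST that SimpleTextTerms builds at load time.
  return (fst != null) ? fst.sizeInBytes() : 0;
}
{code}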

> Add a method to SegmentReader to get the current index heap memory size
> ---
>
> Key: LUCENE-5197
> URL: https://issues.apache.org/jira/browse/LUCENE-5197
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs, core/index
>Reporter: Areek Zillur
> Attachments: LUCENE-5197.patch, LUCENE-5197.patch, LUCENE-5197.patch
>
>
> It would be useful to at least estimate the index heap size being used by 
> Lucene. Ideally a method exposing this information at the SegmentReader level.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2548) Multithreaded faceting

2013-09-04 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-2548:
-

Attachment: SOLR-2548.patch

Hmm, the whole recording-thread-info is a little more ambitious than I want to 
be right now. For the nonce, I did some "by hand" debugging, added in a couple 
of (temporary) print message in the getUnInvertedField code and insured that 
when it's called it only executes once per field, so I think I'll call that 
good now.

I did play around with the directExcecutor and now I get to add another bit of 
knowledge, that it's really kind of cool that it allows one to have code like 
this. No matter how many times you submit a job, it all just executes in the 
current thread. Arcane, but kind of cool.

As for the rest, I've added at least functional tests, plus one 
non-deterministic test of the caching code that might trip bad conditions at 
least some of the time.

So unless people object I'll be committing this, probably tomorrow. It passes 
precommit and at least all the tests in TestFaceting; I'll be running the full 
suite in a minute.

> Multithreaded faceting
> --
>
> Key: SOLR-2548
> URL: https://issues.apache.org/jira/browse/SOLR-2548
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 3.1
>Reporter: Janne Majaranta
>Assignee: Erick Erickson
>Priority: Minor
>  Labels: facet
> Attachments: SOLR-2548_4.2.1.patch, SOLR-2548_for_31x.patch, 
> SOLR-2548.patch, SOLR-2548.patch, SOLR-2548.patch
>
>
> Add multithreading support for faceting.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (SOLR-5214) collections?action=SPLITSHARD running out of heap space due to merge

2013-09-04 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar reassigned SOLR-5214:
---

Assignee: Shalin Shekhar Mangar

> collections?action=SPLITSHARD running out of heap space due to merge
> 
>
> Key: SOLR-5214
> URL: https://issues.apache.org/jira/browse/SOLR-5214
> Project: Solr
>  Issue Type: Improvement
>  Components: update
>Affects Versions: 4.3
>Reporter: Christine Poerschke
>Assignee: Shalin Shekhar Mangar
> Attachments: SOLR-5214.patch
>
>
> The problem we saw was that splitting a shard with many segments and documents
> failed by running out of heap space.
> Increasing heap space so that all existing segments could be merged into one
> overall segment does not seem practical. Running the split without segment
> merging worked.
> Could split always run without merging, or merge=true/false be an optional 
> parameter for the SPLITSHARD action?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (SOLR-5213) collections?action=SPLITSHARD parent vs. sub-shards numDocs

2013-09-04 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar reassigned SOLR-5213:
---

Assignee: Shalin Shekhar Mangar

> collections?action=SPLITSHARD parent vs. sub-shards numDocs
> ---
>
> Key: SOLR-5213
> URL: https://issues.apache.org/jira/browse/SOLR-5213
> Project: Solr
>  Issue Type: Improvement
>  Components: update
>Affects Versions: 4.4
>Reporter: Christine Poerschke
>Assignee: Shalin Shekhar Mangar
> Attachments: SOLR-5213.patch
>
>
> The problem we saw was that splitting a shard took a long time and at the end 
> of it the sub-shards contained fewer documents than the original shard.
> The root cause was eventually tracked down to the disappearing documents not 
> falling into the hash ranges of the sub-shards.
> Could SolrIndexSplitter's split report per-segment numDocs for parent and 
> sub-shards, with at least a warning logged for any discrepancies (documents 
> falling into none of the sub-shards or documents falling into several 
> sub-shards)?
> Additionally, could a case be made for erroring out when discrepancies are 
> detected, i.e. not proceeding with the shard split? Either to always error or 
> to have a verifyNumDocs=true/false optional parameter for the SPLITSHARD 
> action.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5214) collections?action=SPLITSHARD running out of heap space due to merge

2013-09-04 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13758153#comment-13758153
 ] 

Shalin Shekhar Mangar commented on SOLR-5214:
-

Thanks Christine. Do you have the OutOfMemoryError stack trace?

> collections?action=SPLITSHARD running out of heap space due to merge
> 
>
> Key: SOLR-5214
> URL: https://issues.apache.org/jira/browse/SOLR-5214
> Project: Solr
>  Issue Type: Improvement
>  Components: update
>Affects Versions: 4.3
>Reporter: Christine Poerschke
>Assignee: Shalin Shekhar Mangar
> Attachments: SOLR-5214.patch
>
>
> The problem we saw was that splitting a shard with many segments and documents
> failed by running out of heap space.
> Increasing heap space so that all existing segments could be merged into one
> overall segment does not seem practical. Running the split without segment
> merging worked.
> Could split always run without merging, or merge=true/false be an optional 
> parameter for the SPLITSHARD action?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2548) Multithreaded faceting

2013-09-04 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13758164#comment-13758164
 ] 

Yonik Seeley commented on SOLR-2548:


bq. I used a simple wait/sleep loop here

Ugh - please let's not do that for multi-threaded code.

Also, I see some stuff like this in the patch:
{code}
-  counts = getGroupedCounts(searcher, docs, field, multiToken, 
offset,limit, mincount, missing, sort, prefix);
+  counts = getGroupedCounts(searcher, base, field, multiToken, 
offset,limit, mincount, missing, sort, prefix);
{code}
Was there a bug that these changes fixed?

> Multithreaded faceting
> --
>
> Key: SOLR-2548
> URL: https://issues.apache.org/jira/browse/SOLR-2548
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 3.1
>Reporter: Janne Majaranta
>Assignee: Erick Erickson
>Priority: Minor
>  Labels: facet
> Attachments: SOLR-2548_4.2.1.patch, SOLR-2548_for_31x.patch, 
> SOLR-2548.patch, SOLR-2548.patch, SOLR-2548.patch
>
>
> Add multithreading support for faceting.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5215) Deadlock in Solr Cloud ConnectionManager

2013-09-04 Thread Ricardo Merizalde (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ricardo Merizalde updated SOLR-5215:


Description: 
We are constantly seeing deadlocks in our production application servers.

The problem seems to be that a thread A:

- tries to process an event and acquires the ConnectionManager lock
- the update callback acquires connectionUpdateLock and invokes waitForConnected
- waitForConnected tries to acquire the ConnectionManager lock (which it 
already holds)
- waitForConnected calls wait, releasing the ConnectionManager lock (but still 
holding the connectionUpdateLock)

Then a thread B:

- tries to process an event and acquires the ConnectionManager lock
- the update callback tries to acquire connectionUpdateLock but gets blocked, 
holding the ConnectionManager lock and preventing thread A from getting out of 
the wait state.
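
In miniature, the lock ordering looks like this (a self-contained sketch with 
stand-in locks, not the actual ConnectionManager code):

{code}
import java.util.concurrent.CountDownLatch;

// Thread A ends up in wait() on the manager monitor while still holding the
// update lock; thread B then takes the manager monitor and blocks on the
// update lock, so A can never be notified and B can never proceed.
public class ConnectionDeadlockSketch {
  static final Object managerLock = new Object(); // ~ ConnectionManager monitor
  static final Object updateLock = new Object();  // ~ connectionUpdateLock
  static final CountDownLatch aWaiting = new CountDownLatch(1);

  public static void main(String[] args) {
    new Thread(new Runnable() {
      public void run() {                    // thread A
        synchronized (managerLock) {         // process(event)
          synchronized (updateLock) {        // update callback
            synchronized (managerLock) {     // waitForConnected (re-entrant)
              try {
                aWaiting.countDown();
                managerLock.wait();          // releases managerLock only;
              } catch (InterruptedException e) {} // still holds updateLock
            }
          }
        }
      }
    }, "thread-A").start();

    new Thread(new Runnable() {
      public void run() {                    // thread B
        try { aWaiting.await(); } catch (InterruptedException e) {}
        synchronized (managerLock) {         // process(next event)
          synchronized (updateLock) {        // blocks forever: A holds it
            managerLock.notifyAll();         // never reached
          }
        }
      }
    }, "thread-B").start();
  }
}
{code}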
 
Here is part of the thread dump:

"http-0.0.0.0-8080-82-EventThread" daemon prio=10 tid=0x59965800 
nid=0x3e81 waiting for monitor entry [0x57169000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at 
org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:71)
- waiting to lock <0x2aab1b0e0ce0> (a 
org.apache.solr.common.cloud.ConnectionManager)
at 
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)

"http-0.0.0.0-8080-82-EventThread" daemon prio=10 tid=0x5ad4 
nid=0x3e67 waiting for monitor entry [0x4dbd4000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at 
org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:98)
- waiting to lock <0x2aab1b0e0f78> (a java.lang.Object)
at 
org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:46)
at 
org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:91)
- locked <0x2aab1b0e0ce0> (a 
org.apache.solr.common.cloud.ConnectionManager)
at 
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)

"http-0.0.0.0-8080-82-EventThread" daemon prio=10 tid=0x2aac4c2f7000 
nid=0x3d9a waiting for monitor entry [0x42821000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x2aab1b0e0ce0> (a 
org.apache.solr.common.cloud.ConnectionManager)
at 
org.apache.solr.common.cloud.ConnectionManager.waitForConnected(ConnectionManager.java:165)
- locked <0x2aab1b0e0ce0> (a 
org.apache.solr.common.cloud.ConnectionManager)
at 
org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:98)
- locked <0x2aab1b0e0f78> (a java.lang.Object)
at 
org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:46)
at 
org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:91)
- locked <0x2aab1b0e0ce0> (a 
org.apache.solr.common.cloud.ConnectionManager)
at 
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)

Found one Java-level deadlock:
=
"http-0.0.0.0-8080-82-EventThread":
  waiting to lock monitor 0x5c7694b0 (object 0x2aab1b0e0ce0, a 
org.apache.solr.common.cloud.ConnectionManager),
  which is held by "http-0.0.0.0-8080-82-EventThread"
"http-0.0.0.0-8080-82-EventThread":
  waiting to lock monitor 0x2aac4c314978 (object 0x2aab1b0e0f78, a 
java.lang.Object),
  which is held by "http-0.0.0.0-8080-82-EventThread"
"http-0.0.0.0-8080-82-EventThread":
  waiting to lock monitor 0x5c7694b0 (object 0x2aab1b0e0ce0, a 
org.apache.solr.common.cloud.ConnectionManager),
  which is held by "http-0.0.0.0-8080-82-EventThread"
  
  
Java stack information for the threads listed above:
===
"http-0.0.0.0-8080-82-EventThread":
at 
org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:71)
- waiting to lock <0x2aab1b0e0ce0> (a 
org.apache.solr.common.cloud.ConnectionManager)
at 
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
"http-0.0.0.0-8080-82-EventThread":
at 
org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:98)
- waiting to lock <0x2aab1b0e0f78> (a java.lang.Object)
at 
org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStra

[jira] [Created] (SOLR-5215) Deadlock in Solr Cloud ConnectionManager

2013-09-04 Thread Ricardo Merizalde (JIRA)
Ricardo Merizalde created SOLR-5215:
---

 Summary: Deadlock in Solr Cloud ConnectionManager
 Key: SOLR-5215
 URL: https://issues.apache.org/jira/browse/SOLR-5215
 Project: Solr
  Issue Type: Bug
  Components: clients - java
Affects Versions: 4.2.1
 Environment: Linux 2.6.18-164.el5 #1 SMP Tue Aug 18 15:51:48 EDT 2009 
x86_64 x86_64 x86_64 GNU/Linux

java version "1.6.0_18"
Java(TM) SE Runtime Environment (build 1.6.0_18-b07)
Java HotSpot(TM) 64-Bit Server VM (build 16.0-b13, mixed mode)
Reporter: Ricardo Merizalde


We are constantly seeing deadlocks in our production application servers.

The problem seems to be that a thread A:

- tries to process an event and acquires the ConnectionManager lock
- the update callback acquires connectionUpdateLock and invokes waitForConnected
- waitForConnected tries to acquire the ConnectionManager lock (which it 
already holds)
- waitForConnected calls wait, releasing the ConnectionManager lock (but still 
holding the connectionUpdateLock)

Then a thread B:

- tries to process an event and acquires the ConnectionManager lock
- the update callback tries to acquire connectionUpdateLock but gets blocked, 
holding the ConnectionManager lock and preventing thread A from getting out of 
the wait state.
 

Here is part of the thread dump:

"http-0.0.0.0-8080-82-EventThread" daemon prio=10 tid=0x59965800 
nid=0x3e81 waiting for monitor entry [0x57169000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at 
org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:71)
- waiting to lock <0x2aab1b0e0ce0> (a 
org.apache.solr.common.cloud.ConnectionManager)
at 
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)

"http-0.0.0.0-8080-82-EventThread" daemon prio=10 tid=0x5ad4 
nid=0x3e67 waiting for monitor entry [0x4dbd4000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at 
org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:98)
- waiting to lock <0x2aab1b0e0f78> (a java.lang.Object)
at 
org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:46)
at 
org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:91)
- locked <0x2aab1b0e0ce0> (a 
org.apache.solr.common.cloud.ConnectionManager)
at 
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)

"http-0.0.0.0-8080-82-EventThread" daemon prio=10 tid=0x2aac4c2f7000 
nid=0x3d9a waiting for monitor entry [0x42821000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x2aab1b0e0ce0> (a 
org.apache.solr.common.cloud.ConnectionManager)
at 
org.apache.solr.common.cloud.ConnectionManager.waitForConnected(ConnectionManager.java:165)
- locked <0x2aab1b0e0ce0> (a 
org.apache.solr.common.cloud.ConnectionManager)
at 
org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:98)
- locked <0x2aab1b0e0f78> (a java.lang.Object)
at 
org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:46)
at 
org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:91)
- locked <0x2aab1b0e0ce0> (a 
org.apache.solr.common.cloud.ConnectionManager)
at 
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)

Found one Java-level deadlock:
=
"http-0.0.0.0-8080-82-EventThread":
  waiting to lock monitor 0x5c7694b0 (object 0x2aab1b0e0ce0, a 
org.apache.solr.common.cloud.ConnectionManager),
  which is held by "http-0.0.0.0-8080-82-EventThread"
"http-0.0.0.0-8080-82-EventThread":
  waiting to lock monitor 0x2aac4c314978 (object 0x2aab1b0e0f78, a 
java.lang.Object),
  which is held by "http-0.0.0.0-8080-82-EventThread"
"http-0.0.0.0-8080-82-EventThread":
  waiting to lock monitor 0x5c7694b0 (object 0x2aab1b0e0ce0, a 
org.apache.solr.common.cloud.ConnectionManager),
  which is held by "http-0.0.0.0-8080-82-EventThread"
  
  
Java stack information for the threads listed above:
===
"http-0.0.0.0-8080-82-EventThread":
at 
org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:71)
- waiting to lock <0x2aab1b0e0ce0> (a 
org.apache.solr.common.cloud.ConnectionManager)
at 
org.apache.zookeeper.Clien

[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates

2013-09-04 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13758233#comment-13758233
 ] 

Michael McCandless commented on LUCENE-5189:


bq. Frankly I am tired of hearing this phrase being used in this way

Actually, I think this is a fair use of "progress not perfection".
Either that or I don't understand what you're calling "broken APIs" in
the current patch.

As I understand it, what's "broken" here is that you cannot set the
attributes in SegmentInfo or FieldInfo from your DocValuesFormat
writer when an update is being written: the changes won't be saved.

So, I proposed that we document this as a limitation of the SI/FI
attributes API: when writing updates, any changes will be lost.  For
"normal" segment flushes, they work correctly. It'd be a documented
limitation, and we can later fix it.

I think this situation is very similar to LUCENE-5197, which I would
also call "progress not perfection": we are adding a new API
(SegmentReader.ramBytesUsed), with an initial implementation that we
think might be improved by later cutting over to RamUsageEstimator.
But I think we should commit the initial approach (it's useful, it
should work well) and later improve the implementation.


> Numeric DocValues Updates
> -
>
> Key: LUCENE-5189
> URL: https://issues.apache.org/jira/browse/LUCENE-5189
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/index
>Reporter: Shai Erera
>Assignee: Shai Erera
> Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
> LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch
>
>
> In LUCENE-4258 we started to work on incremental field updates, however the 
> amount of changes are immense and hard to follow/consume. The reason is that 
> we targeted postings, stored fields, DV etc., all from the get go.
> I'd like to start afresh here, with numeric-dv-field updates only. There are 
> a couple of reasons to that:
> * NumericDV fields should be easier to update, if e.g. we write all the 
> values of all the documents in a segment for the updated field (similar to 
> how livedocs work, and previously norms).
> * It's a fairly contained issue, attempting to handle just one data type to 
> update, yet requires many changes to core code which will also be useful for 
> updating other data types.
> * It has value in and on itself, and we don't need to allow updating all the 
> data types in Lucene at once ... we can do that gradually.
> I have some working patch already which I'll upload next, explaining the 
> changes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5197) Add a method to SegmentReader to get the current index heap memory size

2013-09-04 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13758234#comment-13758234
 ] 

Michael McCandless commented on LUCENE-5197:


bq.  But only under the assumption that SimpleTextTerms implementation will be 
used for the SimpleTextFieldsReader (it uses the abstract Terms class in the 
termsCache). comments?

I think it's fine to change its termsCache to be SimpleTextTerms.  Thanks!

> Add a method to SegmentReader to get the current index heap memory size
> ---
>
> Key: LUCENE-5197
> URL: https://issues.apache.org/jira/browse/LUCENE-5197
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs, core/index
>Reporter: Areek Zillur
> Attachments: LUCENE-5197.patch, LUCENE-5197.patch, LUCENE-5197.patch
>
>
> It would be useful to at least estimate the index heap size being used by 
> Lucene. Ideally a method exposing this information at the SegmentReader level.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates

2013-09-04 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13758251#comment-13758251
 ] 

Michael McCandless commented on LUCENE-5189:


One option, to solve the "some segments might be missing the field entirely so 
you cannot update those" problem, would be to have the FieldInfos accumulate 
across segments, i.e. a more global FieldInfos, maybe written to a separate 
global file (not per segment).

This way, if any doc in any segment has added the field, then the global 
FieldInfos would contain it.

Not saying this is an appealing option (there are tons of tradeoffs), but I 
think it would address that limitation.

> Numeric DocValues Updates
> -
>
> Key: LUCENE-5189
> URL: https://issues.apache.org/jira/browse/LUCENE-5189
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/index
>Reporter: Shai Erera
>Assignee: Shai Erera
> Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
> LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch
>
>
> In LUCENE-4258 we started to work on incremental field updates, however the 
> amount of changes are immense and hard to follow/consume. The reason is that 
> we targeted postings, stored fields, DV etc., all from the get go.
> I'd like to start afresh here, with numeric-dv-field updates only. There are 
> a couple of reasons to that:
> * NumericDV fields should be easier to update, if e.g. we write all the 
> values of all the documents in a segment for the updated field (similar to 
> how livedocs work, and previously norms).
> * It's a fairly contained issue, attempting to handle just one data type to 
> update, yet requires many changes to core code which will also be useful for 
> updating other data types.
> * It has value in and on itself, and we don't need to allow updating all the 
> data types in Lucene at once ... we can do that gradually.
> I have some working patch already which I'll upload next, explaining the 
> changes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates

2013-09-04 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13758257#comment-13758257
 ] 

Michael McCandless commented on LUCENE-5189:


Actually, that would solve the other problems as well?

I.e., the global FieldInfos would be gen'd: on commit we'd write a new FIS 
file, which all segments in that commit point would use.

Any attribute changes to a FieldInfo would be saved, even on update; new fields 
could be created via update; any segments that have no documents with the field 
won't be an issue.

> Numeric DocValues Updates
> -
>
> Key: LUCENE-5189
> URL: https://issues.apache.org/jira/browse/LUCENE-5189
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/index
>Reporter: Shai Erera
>Assignee: Shai Erera
> Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
> LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch
>
>
> In LUCENE-4258 we started to work on incremental field updates, however the 
> amount of changes are immense and hard to follow/consume. The reason is that 
> we targeted postings, stored fields, DV etc., all from the get go.
> I'd like to start afresh here, with numeric-dv-field updates only. There are 
> a couple of reasons to that:
> * NumericDV fields should be easier to update, if e.g. we write all the 
> values of all the documents in a segment for the updated field (similar to 
> how livedocs work, and previously norms).
> * It's a fairly contained issue, attempting to handle just one data type to 
> update, yet requires many changes to core code which will also be useful for 
> updating other data types.
> * It has value in and on itself, and we don't need to allow updating all the 
> data types in Lucene at once ... we can do that gradually.
> I have some working patch already which I'll upload next, explaining the 
> changes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5197) Add a method to SegmentReader to get the current index heap memory size

2013-09-04 Thread Areek Zillur (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Areek Zillur updated LUCENE-5197:
-

Attachment: LUCENE-5197.patch

Took into account termsCache in SimpleTextFieldReader as discussed with Michael.

> Add a method to SegmentReader to get the current index heap memory size
> ---
>
> Key: LUCENE-5197
> URL: https://issues.apache.org/jira/browse/LUCENE-5197
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs, core/index
>Reporter: Areek Zillur
> Attachments: LUCENE-5197.patch, LUCENE-5197.patch, LUCENE-5197.patch, 
> LUCENE-5197.patch
>
>
> It would be useful to at least estimate the index heap size being used by 
> Lucene. Ideally a method exposing this information at the SegmentReader level.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2548) Multithreaded faceting

2013-09-04 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13758261#comment-13758261
 ] 

Erick Erickson commented on SOLR-2548:
--

bq: Was there a bug that these changes fixed?

Nope, I thought it was a refactoring and didn't look closely. Those changes 
appear to be useless complexity, perhaps a remnant from the original patch 
against 3.1. I took them out.

bq: please let's not do that for multi-threaded code.

I can always count on you to call me on sleeping; I don't know why I even try 
to put a sleep in any more :). OK, I took it out and substituted a notifyAll, 
and added a test that gets into this code while actually doing the inverting 
rather than just pulling stuff from the cache.

I'll attach a new patch in a few.

> Multithreaded faceting
> --
>
> Key: SOLR-2548
> URL: https://issues.apache.org/jira/browse/SOLR-2548
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 3.1
>Reporter: Janne Majaranta
>Assignee: Erick Erickson
>Priority: Minor
>  Labels: facet
> Attachments: SOLR-2548_4.2.1.patch, SOLR-2548_for_31x.patch, 
> SOLR-2548.patch, SOLR-2548.patch, SOLR-2548.patch
>
>
> Add multithreading support for faceting.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5142) Block Indexing / Join Improvements

2013-09-04 Thread Mikhail Khludnev (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13758286#comment-13758286
 ] 

Mikhail Khludnev commented on SOLR-5142:


bq. they are needed for things like distributed search.

I don't think children participate in distributed search; everything is handled 
at the parent level.
I suppose the uniqueKey field should span the whole block, instead of \_root_. 

> Block Indexing / Join Improvements
> --
>
> Key: SOLR-5142
> URL: https://issues.apache.org/jira/browse/SOLR-5142
> Project: Solr
>  Issue Type: Improvement
>Reporter: Yonik Seeley
> Fix For: 4.5, 5.0
>
>
> Follow-on main issue for general block indexing / join improvements

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2548) Multithreaded faceting

2013-09-04 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13758271#comment-13758271
 ] 

Yonik Seeley commented on SOLR-2548:


bq. It appears to be useless complexity, perhaps a remnant from the original 
patch against 3.1. I took them out.

Actually, I see now (and it's absolutely needed ;-)
The base docset can change from one facet request to another (think excludes), 
hence if we go multi-threaded, we can't reference "SimpleFacets.docs" in 
anything that could be executed from a separate thread. 
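
In sketch form (hypothetical names, not the Solr code): snapshot the 
per-request state into a local before the handoff, so the worker never reads a 
shared field that a later request may have replaced.

{code}
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SnapshotPerRequest {
  // stand-in for SimpleFacets.docs: shared state a later request may swap
  static volatile String docs = "base-docset-for-request-1";

  public static void main(String[] args) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    final String base = docs;             // snapshot before the handoff
    Future<String> counts = pool.submit(new Callable<String>() {
      public String call() {
        return "counts over " + base;     // worker reads the snapshot
      }
    });
    docs = "base-docset-for-request-2";   // a later facet request (excludes)
    System.out.println(counts.get());     // still request 1's docset
    pool.shutdown();
  }
}
{code}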

> Multithreaded faceting
> --
>
> Key: SOLR-2548
> URL: https://issues.apache.org/jira/browse/SOLR-2548
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 3.1
>Reporter: Janne Majaranta
>Assignee: Erick Erickson
>Priority: Minor
>  Labels: facet
> Attachments: SOLR-2548_4.2.1.patch, SOLR-2548_for_31x.patch, 
> SOLR-2548.patch, SOLR-2548.patch, SOLR-2548.patch
>
>
> Add multithreading support for faceting.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5214) collections?action=SPLITSHARD running out of heap space due to merge

2013-09-04 Thread Christine Poerschke (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13758323#comment-13758323
 ] 

Christine Poerschke commented on SOLR-5214:
---

Hello. Here's one of the stack traces. In case it's useful context: during the 
shard split, indexing into the cloud had been stopped, but periodic admin/luke 
and admin/mbeans cat=CACHE stats requests were happening.

{noformat}
2013-09-03 07:27:51,947 ERROR [qtp1533478516-49] o.a.s.s.SolrDispatchFilter 
[SolrException.java:119] null:java.lang.OutOfMemoryError: Java heap space
at java.lang.StringCoding.decode(StringCoding.java:215)
at java.lang.String.<init>(String.java:453)
at java.lang.String.<init>(String.java:505)
at 
org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.readField(CompressingStoredFieldsReader.java:154)
at 
org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.visitDocument(CompressingStoredFieldsReader.java:272)
at 
org.apache.lucene.index.SegmentReader.document(SegmentReader.java:133)
at 
org.apache.lucene.index.FilterAtomicReader.document(FilterAtomicReader.java:365)
at org.apache.lucene.index.IndexReader.document(IndexReader.java:436)
at 
org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.merge(CompressingStoredFieldsWriter.java:332)
at 
org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:298)
at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:86)
at org.apache.lucene.index.IndexWriter.addIndexes(IndexWriter.java:2448)
at 
org.apache.solr.update.SolrIndexSplitter.split(SolrIndexSplitter.java:118)
at 
org.apache.solr.update.DirectUpdateHandler2.split(DirectUpdateHandler2.java:749)
at 
org.apache.solr.handler.admin.CoreAdminHandler.handleSplitAction(CoreAdminHandler.java:282)
at 
org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:185)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at 
org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:608)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:206)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
{noformat}


> collections?action=SPLITSHARD running out of heap space due to merge
> 
>
> Key: SOLR-5214
> URL: https://issues.apache.org/jira/browse/SOLR-5214
> Project: Solr
>  Issue Type: Improvement
>  Components: update
>Affects Versions: 4.3
>Reporter: Christine Poerschke
>Assignee: Shalin Shekhar Mangar
> Attachments: SOLR-5214.patch
>
>
> The problem we saw was that splitting a shard with many segments and documents
> failed by running out of heap space.
> Increasing heap space so that all existing segments could be merged into one
> overall segment does not seem practical. Running the split without segment
> merging worked.
> Could split always run without merging, or merge=true/false be an optional 
> parameter for the SPLITSHARD action?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2548) Multithreaded faceting

2013-09-04 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13758314#comment-13758314
 ] 

Yonik Seeley commented on SOLR-2548:


One issue with a static "pending" set on UnInvertedField is that it will block 
different cores trying to un-invert the same field.
This should probably be implemented the same way the FieldCache does it 
(insertion of a placeholder).
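
A hedged sketch of that placeholder approach (hypothetical names, not the 
actual FieldCache/UnInvertedField code): the map lock is held only long enough 
to install a per-field placeholder, so threads asking for the *same* field 
rendezvous on its latch while other fields (and other cores) proceed in 
parallel.

{code}
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CountDownLatch;

public class PlaceholderCache {
  static final class Slot {
    final CountDownLatch ready = new CountDownLatch(1);
    volatile Object value;              // published before countDown()
  }

  private final Map<String, Slot> cache = new HashMap<String, Slot>();

  Object uninvert(String field) throws InterruptedException {
    Slot slot;
    boolean creator = false;
    synchronized (cache) {              // brief: just the placeholder insert
      slot = cache.get(field);
      if (slot == null) {
        slot = new Slot();
        cache.put(field, slot);
        creator = true;
      }
    }
    if (creator) {
      slot.value = "uninverted:" + field;  // expensive work, no lock held
      slot.ready.countDown();              // (error handling elided)
    } else {
      slot.ready.await();               // blocks only callers of this field
    }
    return slot.value;
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println(new PlaceholderCache().uninvert("facet_field"));
  }
}
{code}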


> Multithreaded faceting
> --
>
> Key: SOLR-2548
> URL: https://issues.apache.org/jira/browse/SOLR-2548
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 3.1
>Reporter: Janne Majaranta
>Assignee: Erick Erickson
>Priority: Minor
>  Labels: facet
> Attachments: SOLR-2548_4.2.1.patch, SOLR-2548_for_31x.patch, 
> SOLR-2548.patch, SOLR-2548.patch, SOLR-2548.patch
>
>
> Add multithreading support for faceting.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2548) Multithreaded faceting

2013-09-04 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-2548:
-

Attachment: SOLR-2548.patch

OK, maybe this time.

1> put back the passing-in of base.
2> took out the sleep.
3> changed how exceptions are propagated up past the new threads, which fixed 
another test that this code broke.
4> added a non-deterministic test that forces parallel uninverting of the 
fields to make sure we exercise the synchronize/notify code. This test can't 
_guarantee_ to execute that code every time, but it did manage to with some 
printlns.

Running tests again, precommit and all that. Won't check in until at least 
tomorrow.

And thank heaven for "local history" in IntelliJ ;)

> Multithreaded faceting
> --
>
> Key: SOLR-2548
> URL: https://issues.apache.org/jira/browse/SOLR-2548
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 3.1
>Reporter: Janne Majaranta
>Assignee: Erick Erickson
>Priority: Minor
>  Labels: facet
> Attachments: SOLR-2548_4.2.1.patch, SOLR-2548_for_31x.patch, 
> SOLR-2548.patch, SOLR-2548.patch, SOLR-2548.patch, SOLR-2548.patch
>
>
> Add multithreading support for faceting.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates

2013-09-04 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13758325#comment-13758325
 ] 

Yonik Seeley commented on LUCENE-5189:
--

The problem that Mike highlights, "some segments might be missing the field 
entirely so you cannot update those", is pretty bad though.  Things work 
differently (i.e. your update may fail) depending on exactly how segment 
flushes and merges are done.

> Numeric DocValues Updates
> -
>
> Key: LUCENE-5189
> URL: https://issues.apache.org/jira/browse/LUCENE-5189
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/index
>Reporter: Shai Erera
>Assignee: Shai Erera
> Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
> LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch
>
>
> In LUCENE-4258 we started to work on incremental field updates, however the 
> amount of changes are immense and hard to follow/consume. The reason is that 
> we targeted postings, stored fields, DV etc., all from the get go.
> I'd like to start afresh here, with numeric-dv-field updates only. There are 
> a couple of reasons to that:
> * NumericDV fields should be easier to update, if e.g. we write all the 
> values of all the documents in a segment for the updated field (similar to 
> how livedocs work, and previously norms).
> * It's a fairly contained issue, attempting to handle just one data type to 
> update, yet requires many changes to core code which will also be useful for 
> updating other data types.
> * It has value in and on itself, and we don't need to allow updating all the 
> data types in Lucene at once ... we can do that gradually.
> I have some working patch already which I'll upload next, explaining the 
> changes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates

2013-09-04 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13758311#comment-13758311
 ] 

Shai Erera commented on LUCENE-5189:


I think global FIS is an interesting idea, but per-segment FIS.gen is 
lower-hanging fruit. I did it once and it was quite straightforward (maybe 
someone will have reservations about how I did it, though):

* SIS tracks fieldInfosGen (in this patch, rename all dvGen in SIS to fisGen)
* FI tracks dvGen
* A new FIS45Format reads/writes each FI's dvGen
* ReaderAndLiveDocs writes a new FIS gen, containing the entire FIS, so SR only 
reads the latest gen to load FIS

I think we should explore global FIS separately, because it brings its own 
issues, e.g. do we keep FISFormat or nuke it? Who invokes it (probably SIS)? 
It's also quite orthogonal to this issue; at the least, we can proceed here 
and improve FIS gen'ing later with global FIS.

As for SI.attributes(), I think we can move them under SIS. We should open an 
issue to do that.
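
To illustrate just the gen'ing idea (a hypothetical helper, not actual Lucene 
code): each rewrite of a segment's FIS would get a new generation suffix, the 
way live-docs files do, and the reader would load only the latest gen.

{code}
// Hypothetical: per-segment gen'd FieldInfos file names, following the
// live-docs convention of a base-36 generation suffix.
public class FisGenNames {
  static String fisFileName(String segmentName, long fisGen) {
    return fisGen == 0
        ? segmentName + ".fnm"                                 // original flush
        : segmentName + "_" + Long.toString(fisGen, 36) + ".fnm"; // after updates
  }

  public static void main(String[] args) {
    System.out.println(fisFileName("_0", 0)); // _0.fnm
    System.out.println(fisFileName("_0", 3)); // _0_3.fnm
  }
}
{code}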

> Numeric DocValues Updates
> -
>
> Key: LUCENE-5189
> URL: https://issues.apache.org/jira/browse/LUCENE-5189
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/index
>Reporter: Shai Erera
>Assignee: Shai Erera
> Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
> LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch
>
>
> In LUCENE-4258 we started to work on incremental field updates, however the 
> amount of changes are immense and hard to follow/consume. The reason is that 
> we targeted postings, stored fields, DV etc., all from the get go.
> I'd like to start afresh here, with numeric-dv-field updates only. There are 
> a couple of reasons to that:
> * NumericDV fields should be easier to update, if e.g. we write all the 
> values of all the documents in a segment for the updated field (similar to 
> how livedocs work, and previously norms).
> * It's a fairly contained issue, attempting to handle just one data type to 
> update, yet requires many changes to core code which will also be useful for 
> updating other data types.
> * It has value in and on itself, and we don't need to allow updating all the 
> data types in Lucene at once ... we can do that gradually.
> I have some working patch already which I'll upload next, explaining the 
> changes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2548) Multithreaded faceting

2013-09-04 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13758362#comment-13758362
 ] 

Erick Erickson commented on SOLR-2548:
--

Still have a test error in TestDistributedGrouping; no clue why, and I can't 
look right now. It's certainly a result of the changes in UnInvertedField, 
since if I put them into a clean trunk the same problem occurs.

My guess is that I can't synchronize on the cache for some reason, but I don't 
have much in the way of evidence for that right now.

> Multithreaded faceting
> --
>
> Key: SOLR-2548
> URL: https://issues.apache.org/jira/browse/SOLR-2548
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 3.1
>Reporter: Janne Majaranta
>Assignee: Erick Erickson
>Priority: Minor
>  Labels: facet
> Attachments: SOLR-2548_4.2.1.patch, SOLR-2548_for_31x.patch, 
> SOLR-2548.patch, SOLR-2548.patch, SOLR-2548.patch, SOLR-2548.patch
>
>
> Add multithreading support for faceting.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates

2013-09-04 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13758328#comment-13758328
 ] 

Shai Erera commented on LUCENE-5189:


Correct, that's a problem Rob identified a few days ago, and it can be solved 
if we gen FieldInfos, because ReaderAndLiveDocs will detect that case and add a 
new FieldInfo, as well as create a new gen for this segment's FIS. I have two 
tests in TestNumericDVUpdates which currently assert that this is not supported 
-- once we gen FIS, we'll change them to assert that it is supported.

> Numeric DocValues Updates
> -
>
> Key: LUCENE-5189
> URL: https://issues.apache.org/jira/browse/LUCENE-5189
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/index
>Reporter: Shai Erera
>Assignee: Shai Erera
> Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
> LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch
>
>
> In LUCENE-4258 we started to work on incremental field updates, however the 
> amount of changes are immense and hard to follow/consume. The reason is that 
> we targeted postings, stored fields, DV etc., all from the get go.
> I'd like to start afresh here, with numeric-dv-field updates only. There are 
> a couple of reasons to that:
> * NumericDV fields should be easier to update, if e.g. we write all the 
> values of all the documents in a segment for the updated field (similar to 
> how livedocs work, and previously norms).
> * It's a fairly contained issue, attempting to handle just one data type to 
> update, yet requires many changes to core code which will also be useful for 
> updating other data types.
> * It has value in and on itself, and we don't need to allow updating all the 
> data types in Lucene at once ... we can do that gradually.
> I have some working patch already which I'll upload next, explaining the 
> changes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4277) Spellchecker sometimes falsely reports a spelling error and correction

2013-09-04 Thread scott hobson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13758364#comment-13758364
 ] 

scott hobson commented on SOLR-4277:


Hi,
I am having this same issue. The "correctlySpelled" flag is always false. I 
understand that it should still be giving suggestions for the "did you mean..." 
searches, but shouldn't the correctlySpelled flag at least be accurate? It 
could easily say true and still give you suggested words, and that would be 
even better because you can differentiate between a suggestion and a 
correction. Right now you cannot, unless I'm missing something... 

Thanks,
Scott

> Spellchecker sometimes falsely reports a spelling error and correction
> --
>
> Key: SOLR-4277
> URL: https://issues.apache.org/jira/browse/SOLR-4277
> Project: Solr
>  Issue Type: Bug
>  Components: spellchecker
>Affects Versions: 4.0
>Reporter: Jack Krupansky
>
> In some cases, the Solr spell checker improperly reports query terms as being 
> misspelled.
> Using the Solr example for 4.0, I added these mini documents:
> {code}
> curl http://localhost:8983/solr/update?commit=true -H 
> 'Content-type:application/csv' -d '
> id,name
> spel-1,aardvark abacus ball bill cat cello
> spel-2,abate accord band bell cattle check
> spel-3,adorn border clean clock'
> {code}
> I then issued this request:
> {code}
> curl "http://localhost:8983/solr/spell/?q=check&indent=true";
> {code}
> The spell checker falsely concluded that "check" was misspelled and 
> improperly corrected it to "clock":
> {code}
> 
>   
> 
>   1
>   0
>   5
>   1
>   
> 
>   clock
>   1
> 
>   
> 
> false
> 
>   clock
>   1
>   
> clock
>   
> 
>   
> 
> {code}
> And if I query for "clock", it gets corrected to "check"!
> {code}
> curl "http://localhost:8983/solr/spell/?q=clock&indent=true";
> {code}
> {code}
>   
> 
>   1
>   0
>   5
>   1
>   
> 
>   check
>   1
> 
>   
> 
> false
> 
>   check
>   1
>   
> check
>   
> 
>   
> {code}
> Note: This appears to be only because "clock" is so close to "check". With 
> other terms I don't see the problem:
> {code}
> curl "http://localhost:8983/solr/spell/?q=cattle+abate+check&indent=true";
> {code}
> {code}
>   
> 
>   1
>   13
>   18
>   1
>   
> 
>   clock
>   1
> 
>   
> 
> false
> 
>   cattle abate clock
>   2
>   
> cattle
> abate
> clock
>   
> 
>   
> {code}
> Although, it inappropriately lists "cattle" and "abate" in the "misspellings" 
> section even though no suggestions were offered.
> Finally, I can workaround this issue by removing the following line from 
> solrconfig.xml:
> {code}
>   5
> {code}
> Which responds to the previous request with:
> {code}
>   
> false
>   
> {code}
> Which makes the original problem go away. Although, it does beg the question 
> as to why my 100% correct query is still tagged as "correctlySpelled" = 
> "false", but that's a separate Jira.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



I believe in a project using Lucene

2013-09-04 Thread Alberto Marques
Hello.
My question is simple: I have in mind a project using Lucene, to be able to 
index a website such as 
http://boc.cantabria.es/boces/boletines.do?boton=UltimoBOCPublicado, searching 
for information in pdf files. Is it possible?  

Re: I believe in a project using Lucene

2013-09-04 Thread Gora Mohanty
On 5 September 2013 06:53, Alberto Marques wrote:
> Hello.
> My question is simple: I have in mind a project using Lucene, to be able to
> index a website such as
> http://boc.cantabria.es/boces/boletines.do?boton=UltimoBOCPublicado,
> searching for information in pdf files. Is it possible?

Yes, it is eminently possible. I would suggest using Solr
instead of Lucene directly. You should be able to get
started by searching Google on the topic, or looking at
the Solr Wiki, e.g., http://wiki.apache.org/solr/ExtractingRequestHandler

If you need further help, such a question is better addressed
to the solr-user mailing list rather than this one, which is
meant for discussions related to development.

Regards,
Gora

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5215) Deadlock in Solr Cloud ConnectionManager

2013-09-04 Thread Ricardo Merizalde (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ricardo Merizalde updated SOLR-5215:


Component/s: SolrCloud

> Deadlock in Solr Cloud ConnectionManager
> 
>
> Key: SOLR-5215
> URL: https://issues.apache.org/jira/browse/SOLR-5215
> Project: Solr
>  Issue Type: Bug
>  Components: clients - java, SolrCloud
>Affects Versions: 4.2.1
> Environment: Linux 2.6.18-164.el5 #1 SMP Tue Aug 18 15:51:48 EDT 2009 
> x86_64 x86_64 x86_64 GNU/Linux
> java version "1.6.0_18"
> Java(TM) SE Runtime Environment (build 1.6.0_18-b07)
> Java HotSpot(TM) 64-Bit Server VM (build 16.0-b13, mixed mode)
>Reporter: Ricardo Merizalde
>
> We are constantly seeing deadlocks in our production application servers.
> The problem seems to be that a thread A:
> - tries to process an event and acquires the ConnectionManager lock
> - the update callback acquires connectionUpdateLock and invokes 
> waitForConnected
> - waitForConnected tries to acquire the ConnectionManager lock (which it 
> already holds)
> - waitForConnected calls wait, releasing the ConnectionManager lock (but 
> still holding the connectionUpdateLock)
> Then a thread B:
> - tries to process an event and acquires the ConnectionManager lock
> - the update callback tries to acquire connectionUpdateLock but gets blocked, 
> holding the ConnectionManager lock and preventing thread A from getting out 
> of the wait state.
>  
> Here is part of the thread dump:
> "http-0.0.0.0-8080-82-EventThread" daemon prio=10 tid=0x59965800 
> nid=0x3e81 waiting for monitor entry [0x57169000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:71)
> - waiting to lock <0x2aab1b0e0ce0> (a 
> org.apache.solr.common.cloud.ConnectionManager)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> 
> "http-0.0.0.0-8080-82-EventThread" daemon prio=10 tid=0x5ad4 
> nid=0x3e67 waiting for monitor entry [0x4dbd4000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:98)
> - waiting to lock <0x2aab1b0e0f78> (a java.lang.Object)
> at 
> org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:46)
> at 
> org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:91)
> - locked <0x2aab1b0e0ce0> (a 
> org.apache.solr.common.cloud.ConnectionManager)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> 
> "http-0.0.0.0-8080-82-EventThread" daemon prio=10 tid=0x2aac4c2f7000 
> nid=0x3d9a waiting for monitor entry [0x42821000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at java.lang.Object.wait(Native Method)
> - waiting on <0x2aab1b0e0ce0> (a 
> org.apache.solr.common.cloud.ConnectionManager)
> at 
> org.apache.solr.common.cloud.ConnectionManager.waitForConnected(ConnectionManager.java:165)
> - locked <0x2aab1b0e0ce0> (a 
> org.apache.solr.common.cloud.ConnectionManager)
> at 
> org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:98)
> - locked <0x2aab1b0e0f78> (a java.lang.Object)
> at 
> org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:46)
> at 
> org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:91)
> - locked <0x2aab1b0e0ce0> (a 
> org.apache.solr.common.cloud.ConnectionManager)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> 
> Found one Java-level deadlock:
> =
> "http-0.0.0.0-8080-82-EventThread":
>   waiting to lock monitor 0x5c7694b0 (object 0x2aab1b0e0ce0, a 
> org.apache.solr.common.cloud.ConnectionManager),
>   which is held by "http-0.0.0.0-8080-82-EventThread"
> "http-0.0.0.0-8080-82-EventThread":
>   waiting to lock monitor 0x2aac4c314978 (object 0x2aab1b0e0f78, a 
> java.lang.Object),
>   which is held by "http-0.0.0.0-8080-82-EventThread"
> "http-0.0.0.0-8080-82-EventThread":
>   waiting to lock monitor 0x5c7694b0 (object 0x2aab1b0e0ce0, a 
> org.apache.solr.common.cloud.ConnectionManager),
>   which is held by "http-0.0.0.0-8080-82-EventThread"