Build failed in Jenkins: the 4547 machine gun #154
See http://fortyounce.servebeer.com/job/the%204547%20machine%20gun/154/
--
[...truncated 1011 lines...]
[junit4:junit4] Suite: org.apache.lucene.index.TestCheckIndex
[junit4:junit4] Completed on J3 in 0.03s, 3 tests
[junit4:junit4] Suite: org.apache.lucene.search.TestRegexpQuery
[junit4:junit4] Completed on J1 in 0.05s, 7 tests
[junit4:junit4] Suite: org.apache.lucene.search.spans.TestNearSpansOrdered
[junit4:junit4] Completed on J0 in 0.22s, 10 tests
[junit4:junit4] Suite: org.apache.lucene.index.TestDocCount
[junit4:junit4] Completed on J3 in 0.05s, 1 test
[junit4:junit4] Suite: org.apache.lucene.index.TestSumDocFreq
[junit4:junit4] Completed on J1 in 0.23s, 1 test
[junit4:junit4] Suite: org.apache.lucene.TestSearchForDuplicates
[junit4:junit4] Completed on J0 in 0.15s, 1 test
[junit4:junit4] Suite: org.apache.lucene.index.TestPerSegmentDeletes
[junit4:junit4] Completed on J3 in 0.09s, 1 test
[junit4:junit4] Suite: org.apache.lucene.index.TestIndexWriterConfig
[junit4:junit4] Completed on J1 in 0.04s, 9 tests
[junit4:junit4] Suite: org.apache.lucene.util.junitcompat.TestBeforeAfterOverrides
[junit4:junit4] Completed on J0 in 0.03s, 2 tests
[junit4:junit4] Suite: org.apache.lucene.search.TestFilteredSearch
[junit4:junit4] Completed on J3 in 0.07s, 1 test
[junit4:junit4] Suite: org.apache.lucene.util.junitcompat.TestSetupTeardownChaining
[junit4:junit4] Completed on J1 in 0.02s, 2 tests
[junit4:junit4] Suite: org.apache.lucene.search.TestCachingWrapperFilter
[junit4:junit4] Completed on J0 in 0.05s, 5 tests
[junit4:junit4] Suite: org.apache.lucene.search.TestDocIdSet
[junit4:junit4] Completed on J3 in 0.06s, 3 tests
[junit4:junit4] Suite: org.apache.lucene.search.TestFieldValueFilter
[junit4:junit4] Completed on J1 in 0.02s, 2 tests
[junit4:junit4] Suite: org.apache.lucene.store.TestFileSwitchDirectory
[junit4:junit4] Completed on J0 in 0.04s, 4 tests
[junit4:junit4] Suite: org.apache.lucene.search.TestBooleanScorer
[junit4:junit4] Completed on J3 in 0.02s, 3 tests
[junit4:junit4] Suite: org.apache.lucene.search.TestConstantScoreQuery
[junit4:junit4] Completed on J1 in 0.02s, 3 tests
[junit4:junit4] Suite: org.apache.lucene.util.TestRecyclingByteBlockAllocator
[junit4:junit4] Completed on J0 in 0.01s, 3 tests
[junit4:junit4] Suite: org.apache.lucene.store.TestDirectory
[junit4:junit4] IGNOR/A 0.03s J3 | TestDirectory.testThreadSafety
[junit4:junit4] Assumption #1: 'nightly' test group is disabled (@Nightly)
[junit4:junit4] Completed on J3 in 0.06s, 8 tests, 1 skipped
[junit4:junit4] Suite: org.apache.lucene.util.TestCharsRef
[junit4:junit4] Completed on J1 in 0.03s, 8 tests
[junit4:junit4] Suite: org.apache.lucene.util.junitcompat.TestCodecReported
[junit4:junit4] Completed on J0 in 0.01s, 1 test
[junit4:junit4] Suite: org.apache.lucene.search.TestBooleanOr
[junit4:junit4] Completed on J2 in 2.55s, 6 tests
[junit4:junit4] Suite: org.apache.lucene.index.TestParallelTermEnum
[junit4:junit4] Completed on J3 in 0.07s, 1 test
[junit4:junit4] Suite: org.apache.lucene.search.TestElevationComparator
[junit4:junit4] Completed on J1 in 0.02s, 1 test
[junit4:junit4] Suite: org.apache.lucene.search.TestExplanations
[junit4:junit4] Completed on J0 in 0.01s, 1 test
[junit4:junit4] Suite: org.apache.lucene.search.TestFieldCacheTermsFilter
[junit4:junit4] Completed on J2 in 0.02s, 1 test
[junit4:junit4] Suite: org.apache.lucene.search.TestMatchAllDocsQuery
[junit4:junit4] Completed on J3 in 0.01s, 2 tests
[junit4:junit4] Suite: org.apache.lucene.search.TestNot
[junit4:junit4] Completed on J1 in 0.02s, 1 test
[junit4:junit4] Suite: org.apache.lucene.search.TestSimilarity
[junit4:junit4] Completed on J0 in 0.11s, 1 test
[junit4:junit4] Suite: org.apache.lucene.search.TestSimilarityProvider
[junit4:junit4] Completed on J2 in 0.01s, 1 test
[junit4:junit4] Suite: org.apache.lucene.search.TestAutomatonQueryUnicode
[junit4:junit4] Completed on J3 in 0.07s, 1 test
[junit4:junit4] Suite: org.apache.lucene.util.TestAttributeSource
[junit4:junit4] Completed on J1 in 0.02s, 5 tests
[junit4:junit4] Suite: org.apache.lucene.search.TestTopScoreDocCollector
[junit4:junit4] Completed on J0 in 0.02s, 1 test
[junit4:junit4] Suite: org.apache.lucene.search.spans.TestSpanFirstQuery
[junit4:junit4] Completed on J2 in 0.02s, 1 test
[junit4:junit4] Suite: org.apache.lucene.util.TestBytesRef
Jenkins build is back to normal : the 4547 machine gun #155
See http://fortyounce.servebeer.com/job/the%204547%20machine%20gun/155/ - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-3177) Excluding tagged filter in StatsComponent
[ https://issues.apache.org/jira/browse/SOLR-3177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nikolai Luthman updated SOLR-3177:
--
Attachment: statsfilterexclude.patch

I've made a patch for this, based on the code from the FacetComponent. The patch is for 3.6.1 and might need some cleanup to get it into the latest version. Apply it by changing to solr/core/src/java/org/apache/solr/handler/component/ and running: patch < statsfilterexclude.patch

Excluding tagged filter in StatsComponent
-
Key: SOLR-3177
URL: https://issues.apache.org/jira/browse/SOLR-3177
Project: Solr
Issue Type: Improvement
Components: SearchComponents - other
Affects Versions: 3.5, 3.6, 4.0-ALPHA, 4.1
Reporter: Mathias H.
Priority: Minor
Labels: localparams, stats, statscomponent
Attachments: statsfilterexclude.patch

It would be useful to exclude the effects of some fq params from the set of documents used to compute stats, similar to how you can exclude tagged filters when generating facet counts: https://wiki.apache.org/solr/SimpleFacetParameters#Tagging_and_excluding_Filters

So that it's possible to do something like this:

http://localhost:8983/solr/select?fq={!tag=priceFilter}price:[1 TO 20]&q=*:*&stats=true&stats.field={!ex=priceFilter}price

If you want to create a price slider this is very useful, because then you can filter on the price ([1 TO 20]) and nevertheless get the lower and upper bounds of the unfiltered price (min=0, max=100):

{noformat}
|-[---]--|
$0 $1 $20 $100
{noformat}

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira

- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
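The tag/exclusion request in the description can be assembled client-side. The sketch below is plain Python, not SolrJ: it only builds the request URL, the host, core, field, and tag names are taken from the example URL above, and the `stats_request` helper is made up for illustration.

```python
from urllib.parse import urlencode

def stats_request(base_url, q, tagged_fq, stats_field, tag):
    """Build a Solr stats request whose fq is tagged, and whose
    stats.field excludes that tag (the behavior SOLR-3177 proposes)."""
    params = [
        ("q", q),
        ("fq", "{!tag=%s}%s" % (tag, tagged_fq)),
        ("stats", "true"),
        ("stats.field", "{!ex=%s}%s" % (tag, stats_field)),
    ]
    # urlencode handles the {!...} local-params characters for us.
    return base_url + "?" + urlencode(params)

url = stats_request("http://localhost:8983/solr/select",
                    "*:*", "price:[1 TO 20]", "price", "priceFilter")
print(url)
```

With the exclusion applied, the fq still narrows the result set, but the stats for `price` are computed over the unfiltered documents, which is what makes the price-slider use case work.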
[jira] [Created] (SOLR-4356) SOLR 4.1 Out Of Memory error After commit of a few thousand Solr Docs
Harish Verma created SOLR-4356:
--
Summary: SOLR 4.1 Out Of Memory error After commit of a few thousand Solr Docs
Key: SOLR-4356
URL: https://issues.apache.org/jira/browse/SOLR-4356
Project: Solr
Issue Type: Improvement
Components: clients - java, Schema and Analysis, Tests
Affects Versions: 4.1
Environment: OS = Ubuntu 12.04, Sun Java 7, Max Java Heap Space = 2GB, Apache Tomcat 7, Hardware = Intel Core i3, 2GB RAM, Average no. of fields in a Solr doc = 100
Reporter: Harish Verma
Fix For: 4.1.1

We are testing Solr 4.1 running inside Tomcat 7 and Java 7 with the following options:

JAVA_OPTS=-Xms256m -Xmx2048m -XX:MaxPermSize=1024m -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:+ParallelRefProcEnabled -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/home/ubuntu/OOM_HeapDump

Our source code looks like the following:

/* START */
int noOfSolrDocumentsInBatch = 0;
for (int i = 0; i < 5000; i++) {
    SolrInputDocument solrInputDocument = getNextSolrInputDocument();
    server.add(solrInputDocument);
    noOfSolrDocumentsInBatch += 1;
    if (noOfSolrDocumentsInBatch == 10) {
        server.commit();
        noOfSolrDocumentsInBatch = 0;
    }
}
/* END */

The method getNextSolrInputDocument() generates a Solr document with 100 fields on average. Around 50 of the fields are of text_general type. Some of the text_general fields consist of approximately 1000 words; the rest consist of a few words. Out of the total fields, around 35-40 are multivalued (not of type text_general). We are indexing all the fields but storing only 8: two string, five long, and one boolean. So our index size is only 394 MB, but the RAM occupied at the time of OOM is around 2.5 GB. Why is the memory so high even though the index size is small? What is being stored in memory? Our understanding is that after every commit documents are flushed to disk, so nothing should remain in RAM after a commit.

We are using the following settings: server.commit() with waitForSearcher=true and waitForFlush=true. solrconfig.xml has the following properties set:

directoryFactory = solr.MMapDirectoryFactory
maxWarmingSearchers = 1
maxIndexingThreads = 8 (default)
<autoCommit><maxTime>15000</maxTime><openSearcher>false</openSearcher></autoCommit>

The text_general data type is used as supplied in the schema.xml shipped with the Solr setup. We get a Java heap Out Of Memory error after committing around 3990 Solr documents. Some snapshots of the memory dump from a profiler are attached.

Can somebody please suggest what we should do to minimize/optimize the memory consumption in our case, with the reasons? Also, what would be optimal values, and why, for the following solrconfig.xml parameters:

useColdSearcher - true/false?
maxWarmingSearchers - number?
spellcheck - on/off?
omitNorms - true/false?
omitTermFreqAndPositions?
mergeFactor? (we are using the default value, 10)
Java garbage collection tuning parameters?
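To put numbers on the commit cadence in the loop above: committing every 10 adds over 5000 documents issues 500 explicit commits for a full run, and roughly 399 by the ~3990-document mark where the OOM is reported. A small Python sketch of the same counting logic, with the SolrJ calls stubbed out as comments:

```python
def count_commits(total_docs, batch_size):
    """Mirror the reporter's indexing loop and count how many times
    server.commit() would fire when committing every batch_size adds."""
    commits = 0
    in_batch = 0
    for _ in range(total_docs):
        # server.add(getNextSolrInputDocument()) would go here
        in_batch += 1
        if in_batch == batch_size:
            commits += 1  # server.commit() would go here
            in_batch = 0
    return commits

# The reported configuration: 5000 documents, commit every 10 adds.
print(count_commits(5000, 10))
```

This only illustrates the commit frequency being discussed; it makes no claim about which Solr structures actually retain the 2.5 GB at OOM time.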
[jira] [Updated] (SOLR-4352) Velocity-based pagination should support/preserve sorting
[ https://issues.apache.org/jira/browse/SOLR-4352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erik Hatcher updated SOLR-4352:
---
Attachment: SOLR-4352-erik.patch

Eric - how about this patch? It allows the sort parameter(s) to stick around on facet selections as well, not just pagination.

Velocity-based pagination should support/preserve sorting
Key: SOLR-4352
URL: https://issues.apache.org/jira/browse/SOLR-4352
Project: Solr
Issue Type: Bug
Affects Versions: 4.0
Reporter: Eric Spiegelberg
Assignee: Erik Hatcher
Attachments: SOLR-4352-erik.patch, SOLR-4352.patch

When performing /browse, the Velocity-generated UI does not support sorting in the generated pagination links. The link_to_previous_page and link_to_next_page macros found within [apache-solr-4.0.0]/example/solr/collection1/conf/velocity/VM_global_library.vm should be modified to maintain/preserve an existing sort parameter.
[jira] [Created] (SOLR-4357) Default field in query syntax documentation has confusing error
Hayden Muhl created SOLR-4357:
-
Summary: Default field in query syntax documentation has confusing error
Key: SOLR-4357
URL: https://issues.apache.org/jira/browse/SOLR-4357
Project: Solr
Issue Type: Bug
Components: documentation
Affects Versions: 4.0
Reporter: Hayden Muhl
Priority: Trivial
Fix For: 4.0.1

The explanation of default search fields uses two different queries that are supposed to be semantically the same, but the query text changes between the two examples.
[jira] [Updated] (SOLR-4357) Default field in query syntax documentation has confusing error
[ https://issues.apache.org/jira/browse/SOLR-4357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hayden Muhl updated SOLR-4357:
--
Attachment: SOLR-4357.patch

Small fix for documentation.

Default field in query syntax documentation has confusing error
---
Key: SOLR-4357
URL: https://issues.apache.org/jira/browse/SOLR-4357
Project: Solr
Issue Type: Bug
Components: documentation
Affects Versions: 4.0
Reporter: Hayden Muhl
Priority: Trivial
Labels: documentation
Fix For: 4.0.1
Attachments: SOLR-4357.patch
Original Estimate: 5m
Remaining Estimate: 5m
[jira] [Moved] (LUCENE-4718) Default field in query syntax documentation has confusing error
[ https://issues.apache.org/jira/browse/LUCENE-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erik Hatcher moved SOLR-4357 to LUCENE-4718:
Component/s: (was: documentation) core/queryparser
Fix Version/s: (was: 4.0.1) 4.0.1
Lucene Fields: New, Patch Available
Affects Version/s: (was: 4.0) 4.0
Key: LUCENE-4718 (was: SOLR-4357)
Project: Lucene - Core (was: Solr)

Default field in query syntax documentation has confusing error
---
Key: LUCENE-4718
URL: https://issues.apache.org/jira/browse/LUCENE-4718
Project: Lucene - Core
Issue Type: Bug
Components: core/queryparser
Affects Versions: 4.0
Reporter: Hayden Muhl
Priority: Trivial
Labels: documentation
Fix For: 4.0.1
Attachments: SOLR-4357.patch
Original Estimate: 5m
Remaining Estimate: 5m
[jira] [Issue Comment Deleted] (SOLR-4356) SOLR 4.1 Out Of Memory error After commit of a few thousand Solr Docs
[ https://issues.apache.org/jira/browse/SOLR-4356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harish Verma updated SOLR-4356:
---
Comment: was deleted (was: screenshots of Memory Dump)

SOLR 4.1 Out Of Memory error After commit of a few thousand Solr Docs
-
Key: SOLR-4356
URL: https://issues.apache.org/jira/browse/SOLR-4356
Project: Solr
Issue Type: Improvement
Components: clients - java, Schema and Analysis, Tests
Affects Versions: 4.1
Environment: OS = Ubuntu 12.04, Sun Java 7, Max Java Heap Space = 2GB, Apache Tomcat 7, Hardware = Intel Core i3, 2GB RAM, Average no. of fields in a Solr doc = 100
Reporter: Harish Verma
Labels: performance, test
Fix For: 4.1.1
Attachments: memorydump1.png, memorydump2.png
Original Estimate: 168h
Remaining Estimate: 168h
[jira] [Updated] (SOLR-4356) SOLR 4.1 Out Of Memory error After commit of a few thousand Solr Docs
[ https://issues.apache.org/jira/browse/SOLR-4356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harish Verma updated SOLR-4356:
---
Attachment: memorydump2.png
memorydump1.png

screenshots of Memory Dump

SOLR 4.1 Out Of Memory error After commit of a few thousand Solr Docs
-
Key: SOLR-4356
URL: https://issues.apache.org/jira/browse/SOLR-4356
Project: Solr
Issue Type: Improvement
Components: clients - java, Schema and Analysis, Tests
Affects Versions: 4.1
Environment: OS = Ubuntu 12.04, Sun Java 7, Max Java Heap Space = 2GB, Apache Tomcat 7, Hardware = Intel Core i3, 2GB RAM, Average no. of fields in a Solr doc = 100
Reporter: Harish Verma
Labels: performance, test
Fix For: 4.1.1
Attachments: memorydump1.png, memorydump2.png
Original Estimate: 168h
Remaining Estimate: 168h
Re: Build failed in Jenkins: the 4547 machine gun #53
On Fri, Jan 25, 2013 at 2:30 AM, Robert Muir rcm...@gmail.com wrote:
> I think the bug is when it's direct and the last block has the optimized bitsPerValue=0 case.

Right. This was a test bug due to the fact that the reader moves the file pointer in the "for (i = 0; i < valueCount; ++i) { assert blah }" loop. I committed a fix.

Thanks for running tests on this branch!

--
Adrien
Build failed in Jenkins: the 4547 machine gun #206
See http://fortyounce.servebeer.com/job/the%204547%20machine%20gun/206/
--
[...truncated 966 lines...]
[junit4:junit4] Suite: org.apache.lucene.index.TestPayloads
[junit4:junit4] Completed on J1 in 0.22s, 7 tests
[junit4:junit4] Suite: org.apache.lucene.search.TestFuzzyQuery
[junit4:junit4] Completed on J2 in 0.06s, 6 tests
[junit4:junit4] Suite: org.apache.lucene.index.TestOmitPositions
[junit4:junit4] Completed on J0 in 0.32s, 4 tests
[junit4:junit4] Suite: org.apache.lucene.util.TestRollingBuffer
[junit4:junit4] Completed on J3 in 0.06s, 1 test
[junit4:junit4] Suite: org.apache.lucene.util.junitcompat.TestSystemPropertiesInvariantRule
[junit4:junit4] Completed on J2 in 0.07s, 5 tests
[junit4:junit4] Suite: org.apache.lucene.index.TestSizeBoundedForceMerge
[junit4:junit4] Completed on J0 in 0.06s, 11 tests
[junit4:junit4] Suite: org.apache.lucene.search.TestWildcardRandom
[junit4:junit4] Completed on J3 in 0.03s, 1 test
[junit4:junit4] Suite: org.apache.lucene.search.TestRegexpQuery
[junit4:junit4] Completed on J2 in 0.04s, 7 tests
[junit4:junit4] Suite: org.apache.lucene.search.spans.TestNearSpansOrdered
[junit4:junit4] Completed on J0 in 0.25s, 10 tests
[junit4:junit4] Suite: org.apache.lucene.util.automaton.TestSpecialOperations
[junit4:junit4] Completed on J3 in 0.23s, 2 tests
[junit4:junit4] Suite: org.apache.lucene.index.TestDocCount
[junit4:junit4] Completed on J2 in 0.03s, 1 test
[junit4:junit4] Suite: org.apache.lucene.index.TestSumDocFreq
[junit4:junit4] Completed on J0 in 0.11s, 1 test
[junit4:junit4] Suite: org.apache.lucene.index.TestPerSegmentDeletes
[junit4:junit4] Completed on J2 in 0.08s, 1 test
[junit4:junit4] Suite: org.apache.lucene.util.TestSmallFloat
[junit4:junit4] Completed on J0 in 0.05s, 2 tests
[junit4:junit4] Suite: org.apache.lucene.index.TestParallelReaderEmptyIndex
[junit4:junit4] Completed on J2 in 0.08s, 2 tests
[junit4:junit4] Suite: org.apache.lucene.index.TestIndexWriterConfig
[junit4:junit4] Completed on J0 in 0.04s, 9 tests
[junit4:junit4] Suite: org.apache.lucene.TestSearchForDuplicates
[junit4:junit4] Completed on J3 in 0.61s, 1 test
[junit4:junit4] Suite: org.apache.lucene.util.TestSetOnce
[junit4:junit4] Completed on J2 in 0.02s, 4 tests
[junit4:junit4] Suite: org.apache.lucene.search.TestFilteredSearch
[junit4:junit4] Completed on J0 in 0.02s, 1 test
[junit4:junit4] Suite: org.apache.lucene.index.TestNoDeletionPolicy
[junit4:junit4] Completed on J3 in 0.18s, 4 tests
[junit4:junit4] Suite: org.apache.lucene.util.junitcompat.TestSetupTeardownChaining
[junit4:junit4] Completed on J2 in 0.02s, 2 tests
[junit4:junit4] Suite: org.apache.lucene.util.junitcompat.TestSameRandomnessLocalePassedOrNot
[junit4:junit4] Completed on J0 in 0.02s, 1 test
[junit4:junit4] Suite: org.apache.lucene.search.TestBooleanOr
[junit4:junit4] Completed on J1 in 1.95s, 6 tests
[junit4:junit4] Suite: org.apache.lucene.search.TestSubScorerFreqs
[junit4:junit4] Completed on J3 in 0.02s, 3 tests
[junit4:junit4] Suite: org.apache.lucene.search.TestDateFilter
[junit4:junit4] Completed on J2 in 0.02s, 2 tests
[junit4:junit4] Suite: org.apache.lucene.util.junitcompat.TestSeedFromUncaught
[junit4:junit4] Completed on J0 in 0.03s, 1 test
[junit4:junit4] Suite: org.apache.lucene.search.spans.TestSpansAdvanced
[junit4:junit4] Completed on J1 in 0.10s, 1 test
[junit4:junit4] Suite: org.apache.lucene.search.TestBooleanScorer
[junit4:junit4] Completed on J3 in 0.08s, 3 tests
[junit4:junit4] Suite: org.apache.lucene.search.TestPhrasePrefixQuery
[junit4:junit4] Completed on J2 in 0.01s, 1 test
[junit4:junit4] Suite: org.apache.lucene.util.TestRecyclingByteBlockAllocator
[junit4:junit4] Completed on J0 in 0.03s, 3 tests
[junit4:junit4] Suite: org.apache.lucene.document.TestDateTools
[junit4:junit4] Completed on J1 in 0.04s, 5 tests
[junit4:junit4] Suite: org.apache.lucene.util.junitcompat.TestJUnitRuleOrder
[junit4:junit4] Completed on J3 in 0.03s, 1 test
[junit4:junit4] Suite: org.apache.lucene.util.junitcompat.TestCodecReported
[junit4:junit4] Completed on J2 in 0.02s, 1 test
[junit4:junit4] Suite: org.apache.lucene.index.TestReaderClosed
[junit4:junit4] Completed on J0 in 0.13s, 2 tests
[junit4:junit4] Suite: org.apache.lucene.search.TestMatchAllDocsQuery
[junit4:junit4] Completed on J1 in 0.22s, 2 tests
[junit4:junit4] Suite:
Jenkins build is back to normal : the 4547 machine gun #207
See http://fortyounce.servebeer.com/job/the%204547%20machine%20gun/207/
[jira] [Commented] (LUCENE-4716) Add OR support to DrillDown
[ https://issues.apache.org/jira/browse/LUCENE-4716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562648#comment-13562648 ]

Commit Tag Bot commented on LUCENE-4716:

[trunk commit] Shai Erera
http://svn.apache.org/viewvc?view=revision&revision=1438485
LUCENE-4716: Add OR support to DrillDown

Add OR support to DrillDown
---
Key: LUCENE-4716
URL: https://issues.apache.org/jira/browse/LUCENE-4716
Project: Lucene - Core
Issue Type: Improvement
Components: modules/facet
Reporter: Shai Erera
Assignee: Shai Erera
Attachments: LUCENE-4716.patch

DrillDown provides helper methods to wrap a baseQuery with drill-down categories. All the categories are AND'ed, and OR support has been asked for on the user list. While users can construct their own BooleanQuery, it would be useful if DrillDown helped them do that. I think that a simple additional Occur parameter to DrillDown.query would help to some extent.
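The AND-versus-OR semantics being proposed can be illustrated outside the Lucene API. The sketch below is plain Python that renders query strings in Lucene-like syntax; it is not the actual DrillDown code, the field name `category` is a stand-in, and the MUST/SHOULD names merely mirror BooleanClause.Occur.

```python
def drill_down(base_query, categories, occur="MUST"):
    """Combine drill-down category constraints with a base query.
    occur="MUST" ANDs the categories (existing DrillDown behavior);
    occur="SHOULD" ORs them (the extension LUCENE-4716 proposes)."""
    op = {"MUST": " AND ", "SHOULD": " OR "}[occur]
    clause = op.join("category:%s" % c for c in categories)
    # The drill-down clause always ANDs with the base query; only the
    # combination *between* categories is controlled by `occur`.
    return "(%s) AND (%s)" % (base_query, clause)

print(drill_down("text:lucene", ["a/b", "a/c"]))
print(drill_down("text:lucene", ["a/b", "a/c"], occur="SHOULD"))
```

Either way, users could build the equivalent BooleanQuery by hand; the point of the issue is that DrillDown should offer the OR form directly.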
[jira] [Resolved] (LUCENE-4716) Add OR support to DrillDown
[ https://issues.apache.org/jira/browse/LUCENE-4716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shai Erera resolved LUCENE-4716.
Resolution: Fixed
Fix Version/s: 5.0, 4.2
Lucene Fields: New, Patch Available (was: New)

Committed to trunk and 4x

Add OR support to DrillDown
---
Key: LUCENE-4716
URL: https://issues.apache.org/jira/browse/LUCENE-4716
Project: Lucene - Core
Issue Type: Improvement
Components: modules/facet
Reporter: Shai Erera
Assignee: Shai Erera
Fix For: 4.2, 5.0
Attachments: LUCENE-4716.patch
[jira] [Commented] (LUCENE-4716) Add OR support to DrillDown
[ https://issues.apache.org/jira/browse/LUCENE-4716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562661#comment-13562661 ]

Commit Tag Bot commented on LUCENE-4716:

[branch_4x commit] Shai Erera
http://svn.apache.org/viewvc?view=revision&revision=1438491
LUCENE-4716: Add OR support to DrillDown

Add OR support to DrillDown
---
Key: LUCENE-4716
URL: https://issues.apache.org/jira/browse/LUCENE-4716
Project: Lucene - Core
Issue Type: Improvement
Components: modules/facet
Reporter: Shai Erera
Assignee: Shai Erera
Fix For: 4.2, 5.0
Attachments: LUCENE-4716.patch
Re: Build failed in Jenkins: the 4547 machine gun #206
I'll dig. Mike McCandless http://blog.mikemccandless.com On Fri, Jan 25, 2013 at 7:46 AM, Charlie Cron hudsonsevilt...@gmail.com wrote: See http://fortyounce.servebeer.com/job/the%204547%20machine%20gun/206/ -- [...truncated 966 lines...] [junit4:junit4] [junit4:junit4] Suite: org.apache.lucene.index.TestPayloads [junit4:junit4] Completed on J1 in 0.22s, 7 tests [junit4:junit4] [junit4:junit4] Suite: org.apache.lucene.search.TestFuzzyQuery [junit4:junit4] Completed on J2 in 0.06s, 6 tests [junit4:junit4] [junit4:junit4] Suite: org.apache.lucene.index.TestOmitPositions [junit4:junit4] Completed on J0 in 0.32s, 4 tests [junit4:junit4] [junit4:junit4] Suite: org.apache.lucene.util.TestRollingBuffer [junit4:junit4] Completed on J3 in 0.06s, 1 test [junit4:junit4] [junit4:junit4] Suite: org.apache.lucene.util.junitcompat.TestSystemPropertiesInvariantRule [junit4:junit4] Completed on J2 in 0.07s, 5 tests [junit4:junit4] [junit4:junit4] Suite: org.apache.lucene.index.TestSizeBoundedForceMerge [junit4:junit4] Completed on J0 in 0.06s, 11 tests [junit4:junit4] [junit4:junit4] Suite: org.apache.lucene.search.TestWildcardRandom [junit4:junit4] Completed on J3 in 0.03s, 1 test [junit4:junit4] [junit4:junit4] Suite: org.apache.lucene.search.TestRegexpQuery [junit4:junit4] Completed on J2 in 0.04s, 7 tests [junit4:junit4] [junit4:junit4] Suite: org.apache.lucene.search.spans.TestNearSpansOrdered [junit4:junit4] Completed on J0 in 0.25s, 10 tests [junit4:junit4] [junit4:junit4] Suite: org.apache.lucene.util.automaton.TestSpecialOperations [junit4:junit4] Completed on J3 in 0.23s, 2 tests [junit4:junit4] [junit4:junit4] Suite: org.apache.lucene.index.TestDocCount [junit4:junit4] Completed on J2 in 0.03s, 1 test [junit4:junit4] [junit4:junit4] Suite: org.apache.lucene.index.TestSumDocFreq [junit4:junit4] Completed on J0 in 0.11s, 1 test [junit4:junit4] [junit4:junit4] Suite: org.apache.lucene.index.TestPerSegmentDeletes [junit4:junit4] Completed on J2 in 0.08s, 1 test 
[junit4:junit4] [junit4:junit4] Suite: org.apache.lucene.util.TestSmallFloat [junit4:junit4] Completed on J0 in 0.05s, 2 tests [junit4:junit4] [junit4:junit4] Suite: org.apache.lucene.index.TestParallelReaderEmptyIndex [junit4:junit4] Completed on J2 in 0.08s, 2 tests [junit4:junit4] [junit4:junit4] Suite: org.apache.lucene.index.TestIndexWriterConfig [junit4:junit4] Completed on J0 in 0.04s, 9 tests [junit4:junit4] [junit4:junit4] Suite: org.apache.lucene.TestSearchForDuplicates [junit4:junit4] Completed on J3 in 0.61s, 1 test [junit4:junit4] [junit4:junit4] Suite: org.apache.lucene.util.TestSetOnce [junit4:junit4] Completed on J2 in 0.02s, 4 tests [junit4:junit4] [junit4:junit4] Suite: org.apache.lucene.search.TestFilteredSearch [junit4:junit4] Completed on J0 in 0.02s, 1 test [junit4:junit4] [junit4:junit4] Suite: org.apache.lucene.index.TestNoDeletionPolicy [junit4:junit4] Completed on J3 in 0.18s, 4 tests [junit4:junit4] [junit4:junit4] Suite: org.apache.lucene.util.junitcompat.TestSetupTeardownChaining [junit4:junit4] Completed on J2 in 0.02s, 2 tests [junit4:junit4] [junit4:junit4] Suite: org.apache.lucene.util.junitcompat.TestSameRandomnessLocalePassedOrNot [junit4:junit4] Completed on J0 in 0.02s, 1 test [junit4:junit4] [junit4:junit4] Suite: org.apache.lucene.search.TestBooleanOr [junit4:junit4] Completed on J1 in 1.95s, 6 tests [junit4:junit4] [junit4:junit4] Suite: org.apache.lucene.search.TestSubScorerFreqs [junit4:junit4] Completed on J3 in 0.02s, 3 tests [junit4:junit4] [junit4:junit4] Suite: org.apache.lucene.search.TestDateFilter [junit4:junit4] Completed on J2 in 0.02s, 2 tests [junit4:junit4] [junit4:junit4] Suite: org.apache.lucene.util.junitcompat.TestSeedFromUncaught [junit4:junit4] Completed on J0 in 0.03s, 1 test [junit4:junit4] [junit4:junit4] Suite: org.apache.lucene.search.spans.TestSpansAdvanced [junit4:junit4] Completed on J1 in 0.10s, 1 test [junit4:junit4] [junit4:junit4] Suite: org.apache.lucene.search.TestBooleanScorer 
[junit4:junit4] Completed on J3 in 0.08s, 3 tests [junit4:junit4] [junit4:junit4] Suite: org.apache.lucene.search.TestPhrasePrefixQuery [junit4:junit4] Completed on J2 in 0.01s, 1 test [junit4:junit4] [junit4:junit4] Suite: org.apache.lucene.util.TestRecyclingByteBlockAllocator [junit4:junit4] Completed on J0 in 0.03s, 3 tests [junit4:junit4] [junit4:junit4] Suite: org.apache.lucene.document.TestDateTools [junit4:junit4] Completed on J1 in 0.04s, 5 tests [junit4:junit4] [junit4:junit4] Suite: org.apache.lucene.util.junitcompat.TestJUnitRuleOrder [junit4:junit4] Completed on J3 in 0.03s, 1 test [junit4:junit4] [junit4:junit4] Suite: org.apache.lucene.util.junitcompat.TestCodecReported [junit4:junit4] Completed on J2 in 0.02s, 1 test [junit4:junit4] [junit4:junit4] Suite: org.apache.lucene.index.TestReaderClosed [junit4:junit4] Completed on J0 in 0.13s, 2
[jira] [Commented] (SOLR-4354) Replication should perform full copy if slave's generation higher than master's
[ https://issues.apache.org/jira/browse/SOLR-4354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13562690#comment-13562690 ] Shalin Shekhar Mangar commented on SOLR-4354: - Amit, I don't completely understand the problem. bq. Slave now tries to pull from master B (has higher index version than slave but lower generation) Say, slave has generation G and version V and master(B) has a higher version V+1 but lower generation G-1. The code right now says: {code} boolean isFullCopyNeeded = IndexDeletionPolicyWrapper.getCommitTimestamp(commit) >= latestVersion || commit.getGeneration() >= latestGeneration || forceReplication; {code} Since master's generation is lower than the slave's, a full copy will be forced here. Further, your patch has: {code} - || commit.getGeneration() >= latestGeneration || forceReplication; + || commit.getGeneration() >= latestGeneration || (commit.getGeneration() > latestGeneration) || forceReplication; {code} I don't see how that changes anything. The second condition on generation is redundant. Did I miss something? Replication should perform full copy if slave's generation higher than master's --- Key: SOLR-4354 URL: https://issues.apache.org/jira/browse/SOLR-4354 Project: Solr Issue Type: Bug Components: replication (java) Affects Versions: 4.1 Reporter: Amit Nithian Fix For: 4.2 Attachments: SOLR-4354.patch Original Estimate: 1h Remaining Estimate: 1h We have dual masters, each incrementally indexing from our MySQL database and sitting behind a virtual hostname in our load balancer. As such, it's possible that the generation numbers between the masters for a given index are not in sync. Slaves are configured to replicate from this virtual host (and pin based on source/dest IP hash) so we can add and remove masters as necessary (great for maintenance). 
For the most part this works, but we've seen the following happen: * Slave has been pulling from master A. * Master A goes down for maintenance, so the slave will now pull from master B (which for some reason has a lower generation number than master A). * Slave now tries to pull from master B (which has a higher index version than the slave but a lower generation). * Slave downloads index files and moves them to the index/ directory, but these files are deleted during the doCommit() phase (it looks like older-generation data is deleted). * Index remains as-is; no change. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
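The full-copy decision Shalin quotes can be modeled as a standalone predicate to check Amit's scenario concretely. This is a simplified sketch, not the actual Solr replication code; the numeric values stand in for the V/G scenario in the thread:

```java
// Simplified model of the full-copy decision discussed above.
// Not the actual Solr SnapPuller code; names and values are illustrative.
public class FullCopyCheck {

    static boolean isFullCopyNeeded(long commitVersion, long latestVersion,
                                    long commitGeneration, long latestGeneration,
                                    boolean forceReplication) {
        return commitVersion >= latestVersion
                || commitGeneration >= latestGeneration
                || forceReplication;
    }

    public static void main(String[] args) {
        // Amit's scenario: slave at version V=5, generation G=10;
        // master B at the higher version V+1=6 but lower generation G-1=9.
        boolean full = isFullCopyNeeded(5, 6, 10, 9, false);
        // The existing ">=" generation clause (10 >= 9) already fires,
        // which is why the patch's extra strict-greater-than condition
        // is redundant, as Shalin points out.
        System.out.println(full); // prints "true"
    }
}
```

With the slave's generation above the master's, the existing `>=` clause already forces a full copy, so adding a `>` condition alongside it changes nothing.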
[jira] [Commented] (LUCENE-4708) Make LZ4 hash tables reusable
[ https://issues.apache.org/jira/browse/LUCENE-4708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13562693#comment-13562693 ] Commit Tag Bot commented on LUCENE-4708: [trunk commit] Adrien Grand http://svn.apache.org/viewvc?view=revisionrevision=1438519 LUCENE-4708: Reuse LZ4 hash tables across calls. Make LZ4 hash tables reusable - Key: LUCENE-4708 URL: https://issues.apache.org/jira/browse/LUCENE-4708 Project: Lucene - Core Issue Type: Improvement Reporter: Adrien Grand Assignee: Adrien Grand Priority: Minor Attachments: LUCENE-4708.patch Currently LZ4 compressors instantiate their own hash table for every byte sequence they need to compress. These can be large (256KB for LZ4 HC) so we should try to reuse them across calls. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4708) Make LZ4 hash tables reusable
[ https://issues.apache.org/jira/browse/LUCENE-4708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13562705#comment-13562705 ] Commit Tag Bot commented on LUCENE-4708: [branch_4x commit] Adrien Grand http://svn.apache.org/viewvc?view=revisionrevision=1438524 LUCENE-4708: Reuse LZ4 hash tables across calls (merged from r1438519). Make LZ4 hash tables reusable - Key: LUCENE-4708 URL: https://issues.apache.org/jira/browse/LUCENE-4708 Project: Lucene - Core Issue Type: Improvement Reporter: Adrien Grand Assignee: Adrien Grand Priority: Minor Attachments: LUCENE-4708.patch Currently LZ4 compressors instantiate their own hash table for every byte sequence they need to compress. These can be large (256KB for LZ4 HC) so we should try to reuse them across calls. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-4708) Make LZ4 hash tables reusable
[ https://issues.apache.org/jira/browse/LUCENE-4708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrien Grand resolved LUCENE-4708. -- Resolution: Fixed Make LZ4 hash tables reusable - Key: LUCENE-4708 URL: https://issues.apache.org/jira/browse/LUCENE-4708 Project: Lucene - Core Issue Type: Improvement Reporter: Adrien Grand Assignee: Adrien Grand Priority: Minor Attachments: LUCENE-4708.patch Currently LZ4 compressors instantiate their own hash table for every byte sequence they need to compress. These can be large (256KB for LZ4 HC) so we should try to reuse them across calls. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
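The change behind LUCENE-4708 follows a common scratch-buffer pattern: keep the table as a field, allocate it once, and reset it per call instead of reallocating. A generic sketch of the pattern (not the actual Lucene LZ4 code; the table size here is illustrative):

```java
import java.util.Arrays;

// Sketch of the scratch-buffer reuse pattern from LUCENE-4708:
// allocate the hash table once and reset it per call, instead of
// instantiating a fresh table for every compressed byte sequence.
// This is not the actual Lucene LZ4 implementation.
public class ReusableHashTable {

    private static final int HASH_TABLE_SIZE = 1 << 16; // illustrative size
    private int[] hashTable; // lazily allocated, then reused

    int[] acquire() {
        if (hashTable == null) {
            hashTable = new int[HASH_TABLE_SIZE]; // one-time allocation
        } else {
            Arrays.fill(hashTable, 0); // cheap reset instead of reallocation
        }
        return hashTable;
    }
}
```

Resetting with `Arrays.fill` trades a cheap linear pass for avoiding a large allocation (and the resulting GC pressure) on every compressed sequence, which matters when tables run to 256KB as in LZ4 HC.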
Re: Fixing query-time multi-word synonym issue
Here's an example query with q.op=AND: causes of heart attack And I have this synonym definition: heart attack, myocardial infarction So, what is the alleged query parser fix so that the query is treated as: causes of (heart attack OR myocardial infarction) The core problem with the synonym filter is that it mashes all the terms of a multi-term synonym to be at the same position so that the order (attack after heart and infarction after myocardial) is lost. What is needed is a synonym filter with a notion of path so the term sequences for each of the synonym alternatives are available for the query parser to generate the OR alternative queries. Granted, the query parser ALSO needs to present the full sequence of terms to the analyzer as one string, "causes of heart attack", but that alone doesn't address the synonym filter misbehavior. -- Jack Krupansky -Original Message- From: Robert Muir Sent: Friday, January 25, 2013 3:46 AM To: dev@lucene.apache.org Subject: Re: Fixing query-time multi-word synonym issue On Fri, Jan 25, 2013 at 12:48 AM, Jack Krupansky j...@basetechnology.com wrote: Otis, this is precisely why nothing will get done any time soon on the multi-term synonym issue - there isn't even common agreement that there is a problem, let alone common agreement on the specifics of the problem, let alone common agreement on a solution. I think you are the only one arguing the bug is a synonym filter problem. Even though technically the Solr Query Parser is now separate from the Lucene Query Parser, the synonym filter is still strictly Lucene. Addressing the multi-term synonym feature requires enhancement to the synonym filter, dude you have a bug in X, you fix the bug in X: you don't go hack around it in Y. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-4358) SolrJ, by preventing multi-part post, loses key information about file name that Tika needs
Karl Wright created SOLR-4358: - Summary: SolrJ, by preventing multi-part post, loses key information about file name that Tika needs Key: SOLR-4358 URL: https://issues.apache.org/jira/browse/SOLR-4358 Project: Solr Issue Type: Bug Components: clients - java Affects Versions: 4.0 Reporter: Karl Wright SolrJ accepts a ContentStream, which has a name field. Within HttpSolrServer.java, if SolrJ makes the decision to use multipart posts, this filename is transmitted as part of the form boundary information. However, if SolrJ chooses not to use multipart post, the filename information is lost. This information is used by SolrCell (Tika) to make decisions about content extraction, so it is very important that it makes it into Solr in one way or another. Either SolrJ should set appropriate equivalent headers to send the filename automatically, or it should force multipart posts when this information is present. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
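Karl's second option (force a multipart POST whenever a stream carries a name) reduces to a small predicate. A sketch with a stand-in stream type, since the real decision lives inside HttpSolrServer; `MiniStream` and `useMultipart` are hypothetical names, though `ContentStream`'s name field is real:

```java
import java.util.List;

// Sketch of the proposed fix in SOLR-4358: force a multipart POST whenever
// any content stream carries a name, so the filename hint reaches Tika.
// MiniStream is a stand-in for SolrJ's ContentStream; the helper is hypothetical.
public class MultipartDecision {

    record MiniStream(String name) {}

    static boolean useMultipart(List<MiniStream> streams, boolean alreadyMultipart) {
        // proposed rule: a named stream forces multipart so the name survives
        return alreadyMultipart || streams.stream().anyMatch(s -> s.name() != null);
    }

    public static void main(String[] args) {
        System.out.println(useMultipart(
                List.of(new MiniStream("report.pdf")), false)); // prints "true"
    }
}
```

The alternative Karl mentions, sending an equivalent filename header on the non-multipart path, would avoid the extra form-boundary overhead but requires the server side to read that header.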
[jira] [Commented] (SOLR-4356) SOLR 4.1 Out Of Memory error After commit of a few thousand Solr Docs
[ https://issues.apache.org/jira/browse/SOLR-4356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13562734#comment-13562734 ] Mark Miller commented on SOLR-4356: --- Please send questions like this to the user list - then open a JIRA issue if you determine the issue is a bug. SOLR 4.1 Out Of Memory error After commit of a few thousand Solr Docs - Key: SOLR-4356 URL: https://issues.apache.org/jira/browse/SOLR-4356 Project: Solr Issue Type: Improvement Components: clients - java, Schema and Analysis, Tests Affects Versions: 4.1 Environment: OS = Ubuntu 12.04 Sun JAVA 7 Max Java Heap Space = 2GB Apache Tomcat 7 Hardware = {Intel core i3, 2GB RAM} Average no of fields in a Solr Doc = 100 Reporter: Harish Verma Labels: performance, test Fix For: 4.1.1 Attachments: memorydump1.png, memorydump2.png Original Estimate: 168h Remaining Estimate: 168h We are testing solr 4.1 running inside tomcat 7 and java 7 with the following options JAVA_OPTS=-Xms256m -Xmx2048m -XX:MaxPermSize=1024m -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:+ParallelRefProcEnabled -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/home/ubuntu/OOM_HeapDump Our source code looks like the following: /* START */ int noOfSolrDocumentsInBatch = 0; for(int i=0 ; i < 5000 ; i++) { SolrInputDocument solrInputDocument = getNextSolrInputDocument(); server.add(solrInputDocument); noOfSolrDocumentsInBatch += 1; if(noOfSolrDocumentsInBatch == 10) { server.commit(); noOfSolrDocumentsInBatch = 0; } } /* END */ The method getNextSolrInputDocument() generates a solr document with 100 fields (average). Around 50 of the fields are of text_general type. Some of the text_general fields consist of approx 1000 words; the rest consist of a few words. Out of the total fields there are around 35-40 multivalued fields (not of type text_general). We are indexing all the fields but storing only 8 fields. Out of these 8 fields two are string type, five are long and one is boolean. So our index size is only 394 MB. 
But the RAM occupied at the time of OOM is around 2.5 GB. Why is the memory usage so high even though the index size is small? What is being stored in the memory? Our understanding is that after every commit documents are flushed to the disk. So nothing should remain in RAM after commit. We are using the following settings: server.commit() sets waitForSearcher=true and waitForFlush=true solrConfig.xml has the following properties set: directoryFactory = solr.MMapDirectoryFactory maxWarmingSearchers = 1 text_general data type is being used as supplied in the schema.xml with the solr setup. maxIndexingThreads = 8 (default) <autoCommit><maxTime>15000</maxTime><openSearcher>false</openSearcher></autoCommit> We get a Java heap Out Of Memory Error after committing around 3990 solr documents. Some of the snapshots of the memory dump from the profiler are attached. Can somebody please suggest what we should do to minimize/optimize the memory consumption in our case, with the reasons? Also please suggest optimal values, with reasons, for the following parameters of solrConfig.xml: useColdSearcher - true/false? maxWarmingSearchers - number? spellcheck - on/off? omitNorms = true/false? omitTermFreqAndPositions? mergeFactor? we are using the default value 10 java garbage collection tuning parameters? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
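Independent of the heap-size question, committing every 10 documents is a likely contributor to the pressure: each explicit commit exercises flush and searcher machinery. A common fix is to commit rarely from the client and let the server's `<autoCommit>` flush in the background. A sketch of the client-side policy (the batch size of 1000 is illustrative, not a recommendation from this thread):

```java
// Sketch of client-side batching: index in large batches and commit rarely,
// letting solrconfig's <autoCommit> handle intermediate flushes.
// The threshold of 1000 is illustrative only.
public class BatchCommitPolicy {

    private final int batchSize;
    private int pending;
    int commits; // exposed for the example below

    BatchCommitPolicy(int batchSize) { this.batchSize = batchSize; }

    /** Returns true when the caller should issue server.commit(). */
    boolean onDocumentAdded() {
        if (++pending >= batchSize) {
            pending = 0;
            commits++;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        BatchCommitPolicy policy = new BatchCommitPolicy(1000);
        for (int i = 0; i < 5000; i++) policy.onDocumentAdded();
        System.out.println(policy.commits); // 5 commits instead of the 500
                                            // the original loop would issue
    }
}
```

Whether this removes the OOM depends on what the profiler snapshots actually show, but reducing 500 commits per 5000 documents to a handful is a cheap first step.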
Re: Fixing query-time multi-word synonym issue
On Fri, Jan 25, 2013 at 9:19 AM, Jack Krupansky j...@basetechnology.com wrote: Here's an example query with q.op=AND: causes of heart attack And I have this synonym definition: heart attack, myocardial infarction So, what is the alleged query parser fix so that the query is treated as: causes of (heart attack OR myocardial infarction) That's actually inefficient and stupid to do. If you make a parser that doesn't split on whitespace, you can just tell it to fold at index and query time just like stemming. No OR necessary. But I think you are trying to get off topic, again: the real problem affecting 99%+ of users is that the Lucene query parser splits on whitespace. If this is fixed, then lots of things (not just synonyms, but other basic shit that is broken today) start working too: https://issues.apache.org/jira/browse/LUCENE-2605 - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
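The disagreement in this thread turns on what the analyzed token stream looks like. The flattening Jack describes can be illustrated without Lucene APIs: once synonym terms are stacked onto existing positions, nothing records which terms belong to the same multi-word path. A toy model (this is an illustration, not Lucene's SynonymFilter):

```java
import java.util.List;

// Toy model of a "flattened" token stream: synonym terms are stacked onto
// the positions of the original terms, so the path grouping is lost and
// cross products like "heart infarction" look as valid as "heart attack".
// This is an illustration, not Lucene's SynonymFilter.
public class FlattenedPositions {

    record Token(String term, int position) {}

    static List<String> termsAt(List<Token> stream, int position) {
        return stream.stream()
                .filter(t -> t.position() == position)
                .map(Token::term)
                .toList();
    }

    public static void main(String[] args) {
        List<Token> stream = List.of(
                new Token("heart", 0), new Token("myocardial", 0),
                new Token("attack", 1), new Token("infarction", 1));
        // Each position holds two stacked terms, but the stream cannot say
        // which pairs form a synonym path.
        System.out.println(termsAt(stream, 0)); // [heart, myocardial]
        System.out.println(termsAt(stream, 1)); // [attack, infarction]
    }
}
```

A path-aware (graph) representation, or a parser that hands the whole phrase to the analyzer as Robert suggests, are two different routes to recovering that lost grouping.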
[jira] [Commented] (LUCENE-4642) TokenizerFactory should provide a create method with a given AttributeSource
[ https://issues.apache.org/jira/browse/LUCENE-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13562740#comment-13562740 ] Renaud Delbru commented on LUCENE-4642: --- Hi, are there still some open questions on this issue that block the patch from being committed? TokenizerFactory should provide a create method with a given AttributeSource Key: LUCENE-4642 URL: https://issues.apache.org/jira/browse/LUCENE-4642 Project: Lucene - Core Issue Type: Improvement Components: modules/analysis Affects Versions: 4.1 Reporter: Renaud Delbru Assignee: Steve Rowe Labels: analysis, attribute, tokenizer Fix For: 4.2, 5.0 Attachments: LUCENE-4642.patch, LUCENE-4642.patch All tokenizer implementations have a constructor that takes a given AttributeSource as parameter (LUCENE-1826). However, the TokenizerFactory does not provide an API to create tokenizers with a given AttributeSource. Side note: There are still a lot of tokenizers that do not provide constructors that take AttributeSource and AttributeFactory. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
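The API shape the issue asks for can be sketched with stand-in types: add an overload that accepts the caller's AttributeSource and make the existing create method delegate to it. These classes are placeholders, not Lucene's actual TokenizerFactory/AttributeSource:

```java
import java.io.Reader;
import java.io.StringReader;

// Sketch of the overload requested in LUCENE-4642, using stand-in types.
// AttributeSource and Tokenizer here are placeholders for the Lucene classes.
public class TokenizerFactorySketch {

    static class AttributeSource {}

    static class Tokenizer {
        final AttributeSource source;
        Tokenizer(AttributeSource source, Reader input) { this.source = source; }
    }

    abstract static class TokenizerFactory {
        // existing entry point: factory picks its own AttributeSource
        Tokenizer create(Reader input) {
            return create(new AttributeSource(), input);
        }
        // proposed overload: caller supplies the AttributeSource,
        // e.g. to share attributes across parallel token streams
        abstract Tokenizer create(AttributeSource source, Reader input);
    }

    static class SimpleFactory extends TokenizerFactory {
        @Override
        Tokenizer create(AttributeSource source, Reader input) {
            return new Tokenizer(source, input);
        }
    }

    public static void main(String[] args) {
        AttributeSource shared = new AttributeSource();
        Tokenizer t = new SimpleFactory().create(shared, new StringReader("text"));
        System.out.println(t.source == shared); // the caller's source is used
    }
}
```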
[jira] [Commented] (SOLR-4043) Add ability to get success/failure responses from Collections API.
[ https://issues.apache.org/jira/browse/SOLR-4043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13562741#comment-13562741 ] Mark Miller commented on SOLR-4043: --- I'm going to commit this in a moment so it can start baking for 4.2. Add ability to get success/failure responses from Collections API. -- Key: SOLR-4043 URL: https://issues.apache.org/jira/browse/SOLR-4043 Project: Solr Issue Type: Improvement Components: SolrCloud Affects Versions: 4.0-ALPHA, 4.0-BETA, 4.0 Environment: Solr cloud cluster Reporter: Raintung Li Assignee: Mark Miller Fix For: 4.2, 5.0 Attachments: patch-4043.txt, SOLR-4043_brach4.x.txt, SOLR-4043.patch Collection create/delete/reload are asynchronous processes; the client can't get the real result, it can only confirm that the request has been saved into the OverseerCollectionQueue. The client gets a response immediately, without waiting for the outcome of the operation (create/delete/reload collection), whether it succeeds or not. The easy solution is for the client to wait until the asynchronous process finishes: the create/delete/reload collection thread saves the response into the OverseerCollectionQueue, then notifies the client to fetch it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4328) Simultaneous multiple connections to Solr example often fail with various IOExceptions
[ https://issues.apache.org/jira/browse/SOLR-4328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13562743#comment-13562743 ] Karl Wright commented on SOLR-4328: --- The Solr connector in the ManifoldCF project worked around this problem for the moment by doing two things: (1) Detecting the broken pipe error and interpreting it as meaning that a fixed number of retries are required; (2) Turning on stale connection checks in HttpClient. This is workable but not ideal. The fact that Solr forcibly closes connections means that connection pooling on the client side is essentially a futile effort, and thus significant performance losses are going to be associated with this behavior. It is therefore in everyone's interest, I believe, to get Solr to stop doing what it is doing. If I get any time this weekend I will try and propose a patch. Simultaneous multiple connections to Solr example often fail with various IOExceptions -- Key: SOLR-4328 URL: https://issues.apache.org/jira/browse/SOLR-4328 Project: Solr Issue Type: Bug Affects Versions: 4.0, 3.6.2 Environment: ManifoldCF, Solr connector, SolrJ, and Solr 4.0 or 3.6 on Mac OSX or Ubuntu, all localhost connections Reporter: Karl Wright In ManifoldCF, we've been seeing problems with SolrJ connections throwing java.net.SocketException's. See CONNECTORS-616 for details as to exactly what varieties of this exception are thrown, but broken pipe is the most common. This occurs on multiple Unix variants as stated. (We also occasionally see exceptions on Windows, but they are much less frequent and are different variants from those on Unix.) The exceptions seem to occur during the time an initial connection is getting established, and seem to occur randomly when multiple connections are getting established all at the same time. Wire logging shows that only the first few headers are sent before the connection is broken. Solr itself does not log any error. 
A retry is usually sufficient to have the transaction succeed. The Solr Connector in ManifoldCF has recently been upgraded to rely on SolrJ, which could be a complicating factor. However, I have repeatedly audited both the Solr Connection code and the SolrJ code for best practices, and while I found a couple of problems, nothing seems to be of the sort that could cause a broken pipe. For that to happen, the socket must be closed either on the client end or on the server end, and there appears to be no mechanism for that happening on the client end, since multiple threads would have to be working with the same socket for that to be a possibility. It is also true that in ManifoldCF we disable the automatic retries that are normally enabled for HttpComponents HttpClient. These automatic retries likely mask this problem should it be occurring in other situations. Places where there could potentially be a bug, in order of likelihood: (1) Jetty. Nobody I am aware of has seen this on Tomcat yet. But I also don't know if anyone has tried it. (2) Solr servlet. If it is possible for a servlet implementation to cause the connection to drop without any exception being generated, this would be something that should be researched. (3) HttpComponents/HttpClient. If there is a client-side issue, it would have to be because an httpclient instance was closing sockets from other instances. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
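The workaround described above (treat a broken pipe as retryable a fixed number of times) is independent of HttpClient specifics and can be sketched as a generic wrapper; real code would also enable HttpClient's stale-connection checking, as Karl's comment notes:

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Generic sketch of the ManifoldCF-style workaround: retry a request a
// fixed number of times when the connection dies mid-handshake.
// This is a pattern sketch, not the actual ManifoldCF connector code.
public class BrokenPipeRetry {

    static <T> T withRetries(Callable<T> request, int maxAttempts) throws Exception {
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return request.call();
            } catch (IOException e) {
                last = e; // e.g. "broken pipe" while sending headers
            }
        }
        throw last; // all attempts failed
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Fails twice with a simulated broken pipe, then succeeds.
        String result = withRetries(() -> {
            if (++calls[0] < 3) throw new IOException("Broken pipe");
            return "ok";
        }, 5);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

As the comment says, this masks the problem rather than fixing it: if the server keeps closing connections, client-side pooling stays ineffective regardless of how the retries are wrapped.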
[jira] [Commented] (LUCENE-4642) TokenizerFactory should provide a create method with a given AttributeSource
[ https://issues.apache.org/jira/browse/LUCENE-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13562745#comment-13562745 ] Robert Muir commented on LUCENE-4642: - I raised a lot of questions. I think they are valid concerns. TokenizerFactory should provide a create method with a given AttributeSource Key: LUCENE-4642 URL: https://issues.apache.org/jira/browse/LUCENE-4642 Project: Lucene - Core Issue Type: Improvement Components: modules/analysis Affects Versions: 4.1 Reporter: Renaud Delbru Assignee: Steve Rowe Labels: analysis, attribute, tokenizer Fix For: 4.2, 5.0 Attachments: LUCENE-4642.patch, LUCENE-4642.patch All tokenizer implementations have a constructor that takes a given AttributeSource as parameter (LUCENE-1826). However, the TokenizerFactory does not provide an API to create tokenizers with a given AttributeSource. Side note: There are still a lot of tokenizers that do not provide constructors that take AttributeSource and AttributeFactory. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4043) Add ability to get success/failure responses from Collections API.
[ https://issues.apache.org/jira/browse/SOLR-4043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13562759#comment-13562759 ] Commit Tag Bot commented on SOLR-4043: -- [trunk commit] Mark Robert Miller http://svn.apache.org/viewvc?view=revisionrevision=1438550 SOLR-4043: Add ability to get success/failure responses from Collections API. Add ability to get success/failure responses from Collections API. -- Key: SOLR-4043 URL: https://issues.apache.org/jira/browse/SOLR-4043 Project: Solr Issue Type: Improvement Components: SolrCloud Affects Versions: 4.0-ALPHA, 4.0-BETA, 4.0 Environment: Solr cloud cluster Reporter: Raintung Li Assignee: Mark Miller Fix For: 4.2, 5.0 Attachments: patch-4043.txt, SOLR-4043_brach4.x.txt, SOLR-4043.patch Collection create/delete/reload are asynchronous processes; the client can't get the real result, it can only confirm that the request has been saved into the OverseerCollectionQueue. The client gets a response immediately, without waiting for the outcome of the operation (create/delete/reload collection), whether it succeeds or not. The easy solution is for the client to wait until the asynchronous process finishes: the create/delete/reload collection thread saves the response into the OverseerCollectionQueue, then notifies the client to fetch it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4352) Velocity-base pagination should support/preserve sorting
[ https://issues.apache.org/jira/browse/SOLR-4352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13562760#comment-13562760 ] Eric Spiegelberg commented on SOLR-4352: This patch is specifically for maintaining the sort parameter(s) for pagination -- the Velocity template that generates the pagination links was modified. Very similar code to extract and maintain the sort parameter(s) could be applied to facet selections separately. Velocity-base pagination should support/preserve sorting Key: SOLR-4352 URL: https://issues.apache.org/jira/browse/SOLR-4352 Project: Solr Issue Type: Bug Affects Versions: 4.0 Reporter: Eric Spiegelberg Assignee: Erik Hatcher Attachments: SOLR-4352-erik.patch, SOLR-4352.patch When performing /browse, the Velocity generated UI does not support sorting in the generated pagination links. The link_to_previous_page and link_to_next_page macros found within [apache-solr-4.0.0]/example/solr/collection1/conf/velocity/VM_global_library.vm should be modified to maintain/preserve an existing sort parameter. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
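The template change amounts to carrying the incoming sort parameter into every generated link. A plain-Java stand-in for the macro logic (the real change lives in VM_global_library.vm; URL-encoding is omitted for brevity, and the parameter set is illustrative):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch of what the patched pagination macros do: rebuild the link's
// query string and carry the active sort parameter along.
// Plain-Java stand-in for the Velocity macro; URL-encoding omitted.
public class PaginationLink {

    static String nextPageLink(int start, int rows, String q, String sort) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", q);
        params.put("start", String.valueOf(start + rows));
        params.put("rows", String.valueOf(rows));
        if (sort != null && !sort.isEmpty()) {
            params.put("sort", sort); // preserve the active sort
        }
        return "/browse?" + params.entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        System.out.println(nextPageLink(0, 10, "ipod", "price asc"));
    }
}
```

The same "copy the incoming parameter" move applies to facet links, which is the scope question Erik raises below in the thread.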
[jira] [Created] (LUCENE-4719) Payloads per position broken
André created LUCENE-4719: -- Summary: Payloads per position broken Key: LUCENE-4719 URL: https://issues.apache.org/jira/browse/LUCENE-4719 Project: Lucene - Core Issue Type: Bug Components: core/index Affects Versions: 4.1 Reporter: André In 4.0 it worked. Since 4.1 getPayload() returns the same BytesRef instance for every position of the same term. Additionally, payloads stored on the term vector (correct) may differ from payloads stored in the postings (wrong). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
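The symptom André reports can be reproduced in miniature: when an enumeration hands back the same mutable buffer for every position, a consumer that keeps the reference instead of deep-copying sees every position's payload collapse to the last value written. A toy model, not Lucene's postings code; `MiniBytesRef` is a stand-in for BytesRef:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the bug pattern in LUCENE-4719: an enumeration that reuses
// one mutable buffer across positions. Consumers that keep the reference
// (instead of copying the bytes) see every stored payload become the last one.
// MiniBytesRef is a stand-in, not Lucene's BytesRef.
public class SharedPayloadBuffer {

    static class MiniBytesRef {
        byte[] bytes = new byte[1];
    }

    public static void main(String[] args) {
        MiniBytesRef shared = new MiniBytesRef(); // reused for each position
        List<MiniBytesRef> kept = new ArrayList<>();
        for (byte payload = 1; payload <= 3; payload++) {
            shared.bytes[0] = payload; // "next position's" payload
            kept.add(shared);          // hazard: stores the shared instance
        }
        // All three entries now report the last payload written.
        kept.forEach(r -> System.out.print(r.bytes[0] + " ")); // 3 3 3
    }
}
```

Whether this particular report is a consumer-contract issue or a genuine 4.1 regression is what the issue has to establish; the divergence between term-vector and postings payloads suggests the latter.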
[jira] [Commented] (LUCENE-4642) TokenizerFactory should provide a create method with a given AttributeSource
[ https://issues.apache.org/jira/browse/LUCENE-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13562765#comment-13562765 ] Steve Rowe commented on LUCENE-4642: Renaud, have you looked at [TeeSinkTokenFilter|http://lucene.apache.org/core/4_1_0/analyzers-common/org/apache/lucene/analysis/sinks/TeeSinkTokenFilter.html]? Sounds to me like a good fit for the use case you mentioned. TokenizerFactory should provide a create method with a given AttributeSource Key: LUCENE-4642 URL: https://issues.apache.org/jira/browse/LUCENE-4642 Project: Lucene - Core Issue Type: Improvement Components: modules/analysis Affects Versions: 4.1 Reporter: Renaud Delbru Assignee: Steve Rowe Labels: analysis, attribute, tokenizer Fix For: 4.2, 5.0 Attachments: LUCENE-4642.patch, LUCENE-4642.patch All tokenizer implementations have a constructor that takes a given AttributeSource as parameter (LUCENE-1826). However, the TokenizerFactory does not provide an API to create tokenizers with a given AttributeSource. Side note: There are still a lot of tokenizers that do not provide constructors that take AttributeSource and AttributeFactory. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4352) Velocity-base pagination should support/preserve sorting
[ https://issues.apache.org/jira/browse/SOLR-4352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13562769#comment-13562769 ] Erik Hatcher commented on SOLR-4352: Eric - my patch covers both facet and pagination links. Any reason not to keep sort on facet links too? Thoughts on my patch for your needs? Velocity-base pagination should support/preserve sorting Key: SOLR-4352 URL: https://issues.apache.org/jira/browse/SOLR-4352 Project: Solr Issue Type: Bug Affects Versions: 4.0 Reporter: Eric Spiegelberg Assignee: Erik Hatcher Attachments: SOLR-4352-erik.patch, SOLR-4352.patch When performing /browse, the Velocity generated UI does not support sorting in the generated pagination links. The link_to_previous_page and link_to_next_page macros found within [apache-solr-4.0.0]/example/solr/collection1/conf/velocity/VM_global_library.vm should be modified to maintain/preserve an existing sort parameter. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-4719) Payloads per position broken
[ https://issues.apache.org/jira/browse/LUCENE-4719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] André updated LUCENE-4719: --- Fix Version/s: 4.1.1 Payloads per position broken Key: LUCENE-4719 URL: https://issues.apache.org/jira/browse/LUCENE-4719 Project: Lucene - Core Issue Type: Bug Components: core/index Affects Versions: 4.1 Reporter: André Fix For: 4.1.1 In 4.0 it worked. Since 4.1 getPayload() returns the same BytesRef instance for every position of the same term. Additionally, payloads stored on the term vector (correct) may differ from payloads stored in the postings (wrong).
[jira] [Commented] (SOLR-4352) Velocity-base pagination should support/preserve sorting
[ https://issues.apache.org/jira/browse/SOLR-4352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13562771#comment-13562771 ] Erik Hatcher commented on SOLR-4352: In your original patch, Eric, it doesn't account for multiple sort parameters nor does it URL encode the sort values. Both multiple sort and url encoding are handled in my patch. Velocity-base pagination should support/preserve sorting Key: SOLR-4352 URL: https://issues.apache.org/jira/browse/SOLR-4352 Project: Solr Issue Type: Bug Affects Versions: 4.0 Reporter: Eric Spiegelberg Assignee: Erik Hatcher Attachments: SOLR-4352-erik.patch, SOLR-4352.patch When performing /browse, the Velocity generated UI does not support sorting in the generated pagination links. The link_to_previous_page and link_to_next_page macros found within [apache-solr-4.0.0]/example/solr/collection1/conf/velocity/VM_global_library.vm should be modified to maintain/preserve an existing sort parameter. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-4328) Simultaneous multiple connections to Solr example often fail with various IOExceptions
[ https://issues.apache.org/jira/browse/SOLR-4328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright updated SOLR-4328: -- Description: In ManifoldCF, we've been seeing problems with SolrJ connections throwing java.net.SocketException's. See CONNECTORS-616 for details as to exactly what varieties of this exception are thrown, but broken pipe is the most common. This occurs on multiple Unix variants as stated. (We also occasionally see exceptions on Windows, but they are much less frequent and are different variants than on Unix.) The exceptions seem to occur during the time an initial connection is getting established, and seems to occur randomly when multiple connections are getting established all at the same time. Wire logging shows that only the first few headers are sent before the connection is broken. Solr itself does not log any error. A retry is usually sufficient to have the transaction succeed. The Solr Connector in ManifoldCF has recently been upgraded to rely on SolrJ, which could be a complicating factor. However, I have repeatedly audited both the Solr Connection code and the SolrJ code for best practices, and while I found a couple of problems, nothing seems to be of the sort that could cause a broken pipe. For that to happen, the socket must be closed either on the client end or on the server end, and there appears to be no mechanism for that happening on the client end, since multiple threads would have to be working with the same socket for that to be a possibility. It is also true that in ManifoldCF we disable the automatic retries that are normally enabled for HttpComponents HttpClient. These automatic retries likely mask this problem should it be occurring in other situations. Places where there could potentially be a bug, in order of likelihood: (1) Jetty. Nobody I am aware of has seen this on Tomcat yet. But I also don't know if anyone has tried it. (2) Solr servlet. 
If it is possible for a servlet implementation to cause the connection to drop without any exception being generated, this would be something that should be researched. (3) HttpComponents/HttpClient. If there is a client-side issue, it would have to be because an httpclient instance was closing sockets from other instances. was: In ManifoldCF, we've been seeing problems with SolrJ connections throwing java.net.SocketException's. See CONNECTORS-616 for details as to exactly what varieties of this exception are thrown, but broken pipe is the most common. This occurs on multiple Unix variants as stated. (We also occasionally see exceptions on Windows, but they are much less frequent and are different variants than on Unix.) The exceptions seem to occur during the time an initial connection is getting established, and seems to occur randomly when multiple connections are getting established all at the same time. Wire logging shows that only the first few headers are sent before the connection is broken. Solr itself does log any error. A retry is usually sufficient to have the transaction succeed. The Solr Connector in ManifoldCF has recently been upgraded to rely on SolrJ, which could be a complicating factor. However, I have repeatedly audited both the Solr Connection code and the SolrJ code for best practices, and while I found a couple of problems, nothing seems to be of the sort that could cause a broken pipe. For that to happen, the socket must be closed either on the client end or on the server end, and there appears to be no mechanism for that happening on the client end, since multiple threads would have to be working with the same socket for that to be a possibility. It is also true that in ManifoldCF we disable the automatic retries that are normally enabled for HttpComponents HttpClient. These automatic retries likely mask this problem should it be occurring in other situations. 
Places where there could potentially be a bug, in order of likelihood: (1) Jetty. Nobody I am aware of has seen this on Tomcat yet. But I also don't know if anyone has tried it. (2) Solr servlet. If it is possible for a servlet implementation to cause the connection to drop without any exception being generated, this would be something that should be researched. (3) HttpComponents/HttpClient. If there is a client-side issue, it would have to be because an httpclient instance was closing sockets from other instances. Simultaneous multiple connections to Solr example often fail with various IOExceptions -- Key: SOLR-4328 URL: https://issues.apache.org/jira/browse/SOLR-4328 Project: Solr Issue Type: Bug Affects Versions: 4.0, 3.6.2 Environment: ManifoldCF, Solr connector, SolrJ, and Solr 4.0 or 3.6 on Mac OSX or Ubuntu,
[jira] [Commented] (SOLR-4043) Add ability to get success/failure responses from Collections API.
[ https://issues.apache.org/jira/browse/SOLR-4043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13562780#comment-13562780 ] Commit Tag Bot commented on SOLR-4043: -- [branch_4x commit] Mark Robert Miller http://svn.apache.org/viewvc?view=revisionrevision=1438555 SOLR-4043: Add ability to get success/failure responses from Collections API. Add ability to get success/failure responses from Collections API. -- Key: SOLR-4043 URL: https://issues.apache.org/jira/browse/SOLR-4043 Project: Solr Issue Type: Improvement Components: SolrCloud Affects Versions: 4.0-ALPHA, 4.0-BETA, 4.0 Environment: Solr cloud cluster Reporter: Raintung Li Assignee: Mark Miller Fix For: 4.2, 5.0 Attachments: patch-4043.txt, SOLR-4043_brach4.x.txt, SOLR-4043.patch The create/delete/reload collections are asynchronous process, the client can't get the right response, only make sure the information have been saved into the OverseerCollectionQueue. The client will get the response directly that don't wait the result of behavior(create/delete/reload collection) whatever successful. The easy solution is client wait until the asynchronous process success, the create/delete/reload collection thread will save the response into OverseerCollectionQueue, then notify client to get the response. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4352) Velocity-base pagination should support/preserve sorting
[ https://issues.apache.org/jira/browse/SOLR-4352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13562787#comment-13562787 ] Eric Spiegelberg commented on SOLR-4352: After comparing the two patches, my patch is for a more narrow slice of functionality and does not account for the additional use cases that yours does. Yours is the way to go. Velocity-base pagination should support/preserve sorting Key: SOLR-4352 URL: https://issues.apache.org/jira/browse/SOLR-4352 Project: Solr Issue Type: Bug Affects Versions: 4.0 Reporter: Eric Spiegelberg Assignee: Erik Hatcher Attachments: SOLR-4352-erik.patch, SOLR-4352.patch When performing /browse, the Velocity generated UI does not support sorting in the generated pagination links. The link_to_previous_page and link_to_next_page macros found within [apache-solr-4.0.0]/example/solr/collection1/conf/velocity/VM_global_library.vm should be modified to maintain/preserve an existing sort parameter. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-4719) Payloads per position broken
[ https://issues.apache.org/jira/browse/LUCENE-4719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-4719. - Resolution: Not A Problem Assignee: Robert Muir read the javadocs: it's ok that it returns the same instance. The instance is not *yours*, and it will refer to different bytes, or bytes with different content (all at the discretion of the implementation). Payloads per position broken Key: LUCENE-4719 URL: https://issues.apache.org/jira/browse/LUCENE-4719 Project: Lucene - Core Issue Type: Bug Components: core/index Affects Versions: 4.1 Reporter: André Assignee: Robert Muir Fix For: 4.1.1 In 4.0 it worked. Since 4.1 getPayload() returns the same BytesRef instance for every position of the same term. Additionally, payloads stored on the term vector (correct) may differ from payloads stored in the postings (wrong).
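Robert's point about the reused instance can be sketched without Lucene at all. The class below is a hypothetical, JDK-only stand-in (a plain byte array playing the role of BytesRef): an enumerator hands back the SAME buffer object on every call, so a caller that merely stores the reference sees only the last value, while a caller that deep-copies per position keeps each value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ReusedBufferDemo {
    // Hypothetical stand-in for a payload enumerator that reuses one buffer.
    static class PayloadEnum {
        private final byte[][] payloads;
        private final byte[] reused = new byte[1]; // one shared buffer, reused per call
        private int pos = -1;
        PayloadEnum(byte[][] payloads) { this.payloads = payloads; }
        boolean next() { return ++pos < payloads.length; }
        byte[] payload() {               // always returns the same array object
            reused[0] = payloads[pos][0];
            return reused;
        }
    }

    // First stored payload byte when the caller aliases the shared buffer (wrong).
    static byte firstAliased() { return collect(false).get(0)[0]; }
    // First stored payload byte when the caller copies per position (right).
    static byte firstCopied()  { return collect(true).get(0)[0]; }

    private static List<byte[]> collect(boolean copy) {
        PayloadEnum e = new PayloadEnum(new byte[][] {{1}, {2}, {3}});
        List<byte[]> out = new ArrayList<byte[]>();
        while (e.next()) {
            byte[] p = e.payload();
            out.add(copy ? Arrays.copyOf(p, p.length) : p);
        }
        return out;
    }
}
```

With aliasing, every list entry points at the one shared buffer, so the "first" payload appears to be the last value written; copying preserves each position's payload.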
[jira] [Commented] (SOLR-3177) Excluding tagged filter in StatsComponent
[ https://issues.apache.org/jira/browse/SOLR-3177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13562798#comment-13562798 ] Jan Høydahl commented on SOLR-3177: --- Thanks for the patch, Nikolai. You'll have a greater chance of having the patch committed if you bring it even closer to finalization. The patch should ideally be against TRUNK, alternatively against branch_4x. Also, please review this page http://wiki.apache.org/solr/HowToContribute#Generating_a_patch and try to follow the guidelines as closely as possible. Most important I think is that the patch only includes needed changes so that it is really easy to review what has changed. If you can write JUnit tests that's perfect, if not that's ok too :) Excluding tagged filter in StatsComponent - Key: SOLR-3177 URL: https://issues.apache.org/jira/browse/SOLR-3177 Project: Solr Issue Type: Improvement Components: SearchComponents - other Affects Versions: 3.5, 3.6, 4.0-ALPHA, 4.1 Reporter: Mathias H. Priority: Minor Labels: localparams, stats, statscomponent Attachments: statsfilterexclude.patch It would be useful to exclude the effects of some fq params from the set of documents used to compute stats -- similar to how you can exclude tagged filters when generating facet counts... https://wiki.apache.org/solr/SimpleFacetParameters#Tagging_and_excluding_Filters So that it's possible to do something like this... http://localhost:8983/solr/select?fq={!tag=priceFilter}price:[1 TO 20]q=*:*stats=truestats.field={!ex=priceFilter}price If you want to create a price slider this is very useful because then you can filter the price ([1 TO 20) and nevertheless get the lower and upper bound of the unfiltered price (min=0, max=100): {noformat} |-[---]--| $0 $1 $20$100 {noformat} -- This message is automatically generated by JIRA. 
[jira] [Created] (SOLR-4359) The RecentUpdates#update method should treat a problem reading the next record the same as a problem parsing the record - log the exception and break.
Mark Miller created SOLR-4359: - Summary: The RecentUpdates#update method should treat a problem reading the next record the same as a problem parsing the record - log the exception and break. Key: SOLR-4359 URL: https://issues.apache.org/jira/browse/SOLR-4359 Project: Solr Issue Type: Bug Reporter: Mark Miller Assignee: Mark Miller Fix For: 4.2, 5.0
[jira] [Resolved] (SOLR-4043) Add ability to get success/failure responses from Collections API.
[ https://issues.apache.org/jira/browse/SOLR-4043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller resolved SOLR-4043. --- Resolution: Fixed Thanks Raintung! Let's open new JIRA's for anything further needed on this. Add ability to get success/failure responses from Collections API. -- Key: SOLR-4043 URL: https://issues.apache.org/jira/browse/SOLR-4043 Project: Solr Issue Type: Improvement Components: SolrCloud Affects Versions: 4.0-ALPHA, 4.0-BETA, 4.0 Environment: Solr cloud cluster Reporter: Raintung Li Assignee: Mark Miller Fix For: 4.2, 5.0 Attachments: patch-4043.txt, SOLR-4043_brach4.x.txt, SOLR-4043.patch The create/delete/reload collections are asynchronous process, the client can't get the right response, only make sure the information have been saved into the OverseerCollectionQueue. The client will get the response directly that don't wait the result of behavior(create/delete/reload collection) whatever successful. The easy solution is client wait until the asynchronous process success, the create/delete/reload collection thread will save the response into OverseerCollectionQueue, then notify client to get the response. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Reopened] (LUCENE-4719) Payloads per position broken
[ https://issues.apache.org/jira/browse/LUCENE-4719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] André reopened LUCENE-4719: Yes, you are right, but the value is also the same. Payloads per position broken Key: LUCENE-4719 URL: https://issues.apache.org/jira/browse/LUCENE-4719 Project: Lucene - Core Issue Type: Bug Components: core/index Affects Versions: 4.1 Reporter: André Assignee: Robert Muir Fix For: 4.1.1 In 4.0 it worked. Since 4.1 getPayload() returns the same BytesRef instance for every position of the same term. Additionally, payloads stored on the term vector (correct) may differ from payloads stored in the postings (wrong).
[jira] [Comment Edited] (LUCENE-4719) Payloads per position broken
[ https://issues.apache.org/jira/browse/LUCENE-4719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562801#comment-13562801 ] André edited comment on LUCENE-4719 at 1/25/13 4:04 PM: - Yes, you are right, but the value is also the same. You can easily test it. was (Author: antiheld): Yes, you are right, but the value is also the same. Payloads per position broken Key: LUCENE-4719 URL: https://issues.apache.org/jira/browse/LUCENE-4719 Project: Lucene - Core Issue Type: Bug Components: core/index Affects Versions: 4.1 Reporter: André Assignee: Robert Muir Fix For: 4.1.1 In 4.0 it worked. Since 4.1 getPayload() returns the same BytesRef instance for every position of the same term. Additionally, payloads stored on the term vector (correct) may differ from payloads stored in the postings (wrong).
[jira] [Updated] (LUCENE-4719) Payloads per position broken
[ https://issues.apache.org/jira/browse/LUCENE-4719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] André updated LUCENE-4719: --- Description: In 4.0 it worked. Since 4.1 getPayload() returns the same value for every position of the same term. Additionally, payloads stored on the term vector (correct) may differ from payloads stored in the postings (wrong). (was: In 4.0 it worked. Since 4.1 getPayload() returns the same BytesRef instance for every position of the same term. Additionally, payloads stored on the term vector (correct) may differ from payloads stored in the postings (wrong).) Payloads per position broken Key: LUCENE-4719 URL: https://issues.apache.org/jira/browse/LUCENE-4719 Project: Lucene - Core Issue Type: Bug Components: core/index Affects Versions: 4.1 Reporter: André Assignee: Robert Muir Fix For: 4.1.1 In 4.0 it worked. Since 4.1 getPayload() returns the same value for every position of the same term. Additionally, payloads stored on the term vector (correct) may differ from payloads stored in the postings (wrong).
java.lang.NumberFormatException Using PhraseQuery with Lucene 4.0.0
Hi, The code below throws the exception: java.lang.NumberFormatException: For input string: 01.SZ at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48) when the line TopDocs docs = indexSearcher.search(phraseQuery, null, 10, sort); is called. This only happens when the searchPattern contains a space character. No other info is available in the exception. The 01.SZ value is the first value in my index... the index is a RAMDirectory... Anyone have any ideas? Many Thanks. Code:

    BooleanQuery.setMaxClauseCount(clauseCount);
    searchPattern = QueryParser.escape(searchPattern);
    Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_40);
    IndexReader reader = IndexReader.open(index);
    IndexSearcher indexSearcher = new IndexSearcher(reader);
    PhraseQuery phraseQuery = new PhraseQuery();
    Term term = new Term(fieldName, searchPattern);
    phraseQuery.add(term);
    phraseQuery.setSlop(0);
    Sort sort = new Sort(new SortField(fieldName, SortField.Type.SCORE));
    TopDocs docs = indexSearcher.search(phraseQuery, null, 10, sort);

-- View this message in context: http://lucene.472066.n3.nabble.com/java-lang-NumberFormatException-Using-PhraseQuery-with-Lucene-4-0-0-tp4036273.html Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
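For what it's worth, the error message shape ("For input string: ...") is what the JDK produces when something tries to parse a non-numeric string as a number, e.g. a numeric sort reading string field values; whether that is what happens inside this particular search is an assumption, not confirmed by the trace. A minimal JDK-only sketch of that failure mode:

```java
public class SortParseDemo {
    // Returns true if s parses as a double, false on NumberFormatException.
    // Parsing "01.SZ" fails with the same "For input string: ..." message
    // shape as in the report above.
    static boolean parses(String s) {
        try {
            Double.parseDouble(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }
}
```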
Re: Fixing query-time multi-word synonym issue
One clarification from my previous comment: One requirement is to prevent false matches for instances of heart infarction and myocardial attack - the current synonym filter does not preserve the path or term ordering within the multi-term phrases, even if the query parser does present the full term sequence as a single input string. Yes, the position information is preserved, but there is no path attribute to be able to tell that heart was before attack as opposed to before infarction. -- Jack Krupansky -Original Message- From: Robert Muir Sent: Friday, January 25, 2013 9:47 AM To: dev@lucene.apache.org Subject: Re: Fixing query-time multi-word synonym issue On Fri, Jan 25, 2013 at 9:19 AM, Jack Krupansky j...@basetechnology.com wrote: Here's an example query with q.op=AND: causes of heart attack And I have this synonym definition: heart attack, myocardial infarction So, what is the alleged query parser fix so that the query is treated as: causes of (heart attack OR myocardial infarction) Thats actually inefficient and stupid to do. if you make a parser that doesnt split on whitespace, you can just tell it to fold at index and query time just like stemming. no OR necessary. But I think you are trying to get off topic, again the real problem affecting 99%+ users is that the lucene queryparser splits on whitespace. If this is fixed, then lots of things (not just synonyms, but other basic shit that is broken today) starts working too: https://issues.apache.org/jira/browse/LUCENE-2605
[JENKINS] Lucene-Solr-trunk-Linux (32bit/jrockit-jdk1.6.0_33-R28.2.4-4.1.0) - Build # 3970 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/3970/ Java: 32bit/jrockit-jdk1.6.0_33-R28.2.4-4.1.0 -XnoOpt All tests passed Build Log: [...truncated 29961 lines...] BUILD FAILED /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/build.xml:305: The following error occurred while executing this line: /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/extra-targets.xml:120: The following files are missing svn:eol-style (or binary svn:mime-type): * solr/core/src/java/org/apache/solr/cloud/OverseerSolrResponse.java Total time: 56 minutes 6 seconds Build step 'Invoke Ant' marked build as failure Archiving artifacts Recording test results Description set: Java: 32bit/jrockit-jdk1.6.0_33-R28.2.4-4.1.0 -XnoOpt Email was triggered for: Failure Sending email for trigger: Failure
[jira] [Created] (SOLR-4360) TermVectorAccessor return terms that do not match with current document
Francois-Xavier Bonnet created SOLR-4360: Summary: TermVectorAccessor return terms that do not match with current document Key: SOLR-4360 URL: https://issues.apache.org/jira/browse/SOLR-4360 Project: Solr Issue Type: Bug Affects Versions: 3.6.2 Reporter: Francois-Xavier Bonnet For each term, TermVectorAccessor looks in the indexReader and calls termPositions.skipTo(documentNumber), but this method returns the first document with an id greater than or equal to documentNumber. As a result you get some extra terms that do not really match documentNumber.
[jira] [Updated] (SOLR-4360) TermVectorAccessor return terms that do not match with current document
[ https://issues.apache.org/jira/browse/SOLR-4360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francois-Xavier Bonnet updated SOLR-4360: - Attachment: SOLR-4360.txt Here is a patch. TermVectorAccessor return terms that do not match with current document --- Key: SOLR-4360 URL: https://issues.apache.org/jira/browse/SOLR-4360 Project: Solr Issue Type: Bug Affects Versions: 3.6.2 Reporter: Francois-Xavier Bonnet Attachments: SOLR-4360.txt For each term, TermVectorAccessor looks in the indexReader and calls termPositions.skipTo(documentNumber), but this method returns the first document with an id greater than or equal to documentNumber. As a result you get some extra terms that do not really match documentNumber.
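The skipTo contract described in SOLR-4360 is easy to demonstrate with a plain sorted array standing in for a postings list (a hypothetical stand-in, not Lucene's actual API): skipTo returns the first document with id greater than or equal to the target, so a correct membership test must also compare the returned id against the target rather than trusting skipTo alone.

```java
public class SkipToDemo {
    // skipTo semantics: first doc id >= target, or -1 if exhausted.
    static int skipTo(int[] sortedDocs, int target) {
        for (int d : sortedDocs) {
            if (d >= target) return d;
        }
        return -1;
    }

    // Correct check: the term matches the document only if skipTo landed
    // exactly on it, not merely on some later document.
    static boolean termMatchesDoc(int[] sortedDocs, int docNumber) {
        return skipTo(sortedDocs, docNumber) == docNumber;
    }
}
```

Without the equality check, doc 4 in the example below would wrongly be reported as containing the term because skipTo lands on doc 5.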
[jira] [Commented] (SOLR-4325) DIH DateFormatEvaluator seems to have problems with DST changes - test disabled
[ https://issues.apache.org/jira/browse/SOLR-4325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13562848#comment-13562848 ] Commit Tag Bot commented on SOLR-4325: -- [trunk commit] James Dyer http://svn.apache.org/viewvc?view=revisionrevision=1438597 SOLR-4325: fix TestBuiltInEvaluators DIH DateFormatEvaluator seems to have problems with DST changes - test disabled Key: SOLR-4325 URL: https://issues.apache.org/jira/browse/SOLR-4325 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Affects Versions: 4.0, 4.1 Reporter: Uwe Schindler Assignee: James Dyer Fix For: 4.2, 5.0 Attachments: SOLR-4325.patch Yesterday was DST change in Fidji (clock went one hour backwards, as summer time ended and winter time started). This caused org.apache.solr.handler.dataimport.TestBuiltInEvaluators.testDateFormatEvaluator to fail. The reason is simple: NOW-2DAYS is evaluated without taking time zone into account (its substracting 48 hours), but to be correct and go 2 DAYS back in local wall clock time, it must subtract only 47 hours. If this is not intended (we want to go 48 hours back, not 47), the test needs a fix. Otherwise the date evaluator must take the timezone into account when substracting days (e.g., use correctly localized Calendar instance and use the add() method ([http://docs.oracle.com/javase/6/docs/api/java/util/Calendar.html#add(int, int)]). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-4325) DIH DateFormatEvaluator seems to have problems with DST changes - test disabled
[ https://issues.apache.org/jira/browse/SOLR-4325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Dyer resolved SOLR-4325. -- Resolution: Fixed DIH DateFormatEvaluator seems to have problems with DST changes - test disabled Key: SOLR-4325 URL: https://issues.apache.org/jira/browse/SOLR-4325 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Affects Versions: 4.0, 4.1 Reporter: Uwe Schindler Assignee: James Dyer Fix For: 4.2, 5.0 Attachments: SOLR-4325.patch Yesterday was DST change in Fidji (clock went one hour backwards, as summer time ended and winter time started). This caused org.apache.solr.handler.dataimport.TestBuiltInEvaluators.testDateFormatEvaluator to fail. The reason is simple: NOW-2DAYS is evaluated without taking time zone into account (its substracting 48 hours), but to be correct and go 2 DAYS back in local wall clock time, it must subtract only 47 hours. If this is not intended (we want to go 48 hours back, not 47), the test needs a fix. Otherwise the date evaluator must take the timezone into account when substracting days (e.g., use correctly localized Calendar instance and use the add() method ([http://docs.oracle.com/javase/6/docs/api/java/util/Calendar.html#add(int, int)]). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
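Uwe's point about wall-clock versus elapsed-time arithmetic can be reproduced with plain JDK Calendar math. The sketch below uses America/New_York's 2013-03-10 spring-forward as a stand-in for the Fiji transition (an assumption for illustration only): subtracting 48 hours of raw milliseconds lands at a different wall-clock hour once the subtraction crosses the DST boundary, while Calendar.add(Calendar.DAY_OF_MONTH, -2) works in local time and keeps the hour.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.TimeZone;

public class DstMathDemo {
    static final TimeZone TZ = TimeZone.getTimeZone("America/New_York");

    // Fixed "NOW": 2013-03-11 12:00 local, the day after the spring-forward.
    static Calendar base() {
        Calendar c = new GregorianCalendar(TZ);
        c.clear();
        c.set(2013, Calendar.MARCH, 11, 12, 0, 0);
        return c;
    }

    // "NOW-2DAYS" as raw millis: subtracts exactly 48 elapsed hours,
    // ignoring the hour lost to the DST transition.
    static int naiveHour() {
        Calendar c = new GregorianCalendar(TZ);
        c.setTimeInMillis(base().getTimeInMillis() - 48L * 60 * 60 * 1000);
        return c.get(Calendar.HOUR_OF_DAY); // lands at 11:00, not 12:00
    }

    // Calendar.add adjusts in local wall-clock time, so the hour is kept.
    static int calendarHour() {
        Calendar c = base();
        c.add(Calendar.DAY_OF_MONTH, -2);
        return c.get(Calendar.HOUR_OF_DAY); // stays at 12:00
    }
}
```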
[JENKINS] Lucene-Solr-4.x-Linux (32bit/jdk1.6.0_38) - Build # 3948 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/3948/ Java: 32bit/jdk1.6.0_38 -server -XX:+UseSerialGC All tests passed Build Log: [...truncated 29807 lines...] BUILD FAILED /mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/build.xml:305: The following error occurred while executing this line: /mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/extra-targets.xml:120: The following files are missing svn:eol-style (or binary svn:mime-type): * solr/core/src/java/org/apache/solr/cloud/OverseerSolrResponse.java Total time: 40 minutes 5 seconds Build step 'Invoke Ant' marked build as failure Archiving artifacts Recording test results Description set: Java: 32bit/jdk1.6.0_38 -server -XX:+UseSerialGC Email was triggered for: Failure Sending email for trigger: Failure
[jira] [Commented] (LUCENE-4642) TokenizerFactory should provide a create method with a given AttributeSource
[ https://issues.apache.org/jira/browse/LUCENE-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562853#comment-13562853 ] Renaud Delbru commented on LUCENE-4642: --- @steve: {quote} have you looked at TeeSinkTokenFilter {quote} Yes, and from my current understanding, it is similar to our current implementation. The problem with this approach is that the exchange of attributes is performed using the AttributeSource.State API with AttributeSource#captureState and AttributeSource#restoreState, which copies the values of all attribute implementations that the state contains. This is very inefficient, as it has to copy arrays and other objects (e.g., char term arrays, etc.) for every single token. @robert: Concerning the problem of UOEs, Steve's new patch reduces the number of UOEs to just one, which is much more reasonable than my first approach. I have looked at the current state of the Lucene trunk, and there are already a lot of UOEs in many places. So I would suggest that this problem may not be a blocking one (but I might be wrong). Concerning the problem of constructor explosion, maybe we can find a consensus. Your proposal of removing Tokenizer(AttributeSource) cannot work for us, as we need it to share the same AttributeSource across multiple streams. However, as I proposed, removing Tokenizer(AttributeFactory) could work, as it could be emulated by using Tokenizer(AttributeSource). TokenizerFactory should provide a create method with a given AttributeSource Key: LUCENE-4642 URL: https://issues.apache.org/jira/browse/LUCENE-4642 Project: Lucene - Core Issue Type: Improvement Components: modules/analysis Affects Versions: 4.1 Reporter: Renaud Delbru Assignee: Steve Rowe Labels: analysis, attribute, tokenizer Fix For: 4.2, 5.0 Attachments: LUCENE-4642.patch, LUCENE-4642.patch All tokenizer implementations have a constructor that takes a given AttributeSource as parameter (LUCENE-1826).
However, the TokenizerFactory does not provide an API to create tokenizers with a given AttributeSource. Side note: There are still a lot of tokenizers that do not provide constructors that take AttributeSource and AttributeFactory.
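The cost Renaud describes can be illustrated without Lucene. A captureState-style snapshot must deep-copy per-token buffers (char term arrays and so on), while sharing one AttributeSource lets a consumer read the producer's buffers in place. This toy sketch (plain arrays; the names are illustrative, not the Lucene API) shows the difference between the two styles:

```java
public class StateCopyDemo {
    public static void main(String[] args) {
        char[] termBuffer = {'f', 'o', 'o'};     // the tokenizer's live term buffer
        char[] copied = termBuffer.clone();      // captureState-style: O(len) copy per token
        char[] shared = termBuffer;              // shared-AttributeSource style: no copy
        termBuffer[0] = 'b';                     // tokenizer advances to the next token
        System.out.println(new String(copied));  // snapshot still holds the old token
        System.out.println(new String(shared));  // shared view sees the current token
    }
}
```

The snapshot keeps "foo" while the shared reference sees "boo": the snapshot buys isolation at the price of an allocation and copy on every single token, which is the overhead being avoided by sharing the AttributeSource.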
[jira] [Commented] (SOLR-4325) DIH DateFormatEvaluator seems to have problems with DST changes - test disabled
[ https://issues.apache.org/jira/browse/SOLR-4325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562854#comment-13562854 ] Commit Tag Bot commented on SOLR-4325: -- [branch_4x commit] James Dyer http://svn.apache.org/viewvc?view=revision&revision=1438598 SOLR-4325: fix TestBuiltInEvaluators
[jira] [Closed] (SOLR-4354) Replication should perform full copy if slave's generation higher than master's
[ https://issues.apache.org/jira/browse/SOLR-4354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amit Nithian closed SOLR-4354. -- Resolution: Invalid My apologies, that was embarrassing. I was looking at the 4.0 code that we use and not the 4.1 code, which has this fixed. I blindly copied my code to trunk without doing a proper code refresh (so much for late-night working). Again, please accept my apologies. Replication should perform full copy if slave's generation higher than master's --- Key: SOLR-4354 URL: https://issues.apache.org/jira/browse/SOLR-4354 Project: Solr Issue Type: Bug Components: replication (java) Affects Versions: 4.1 Reporter: Amit Nithian Fix For: 4.2 Attachments: SOLR-4354.patch Original Estimate: 1h Remaining Estimate: 1h We have dual masters, each incrementally indexing from our MySQL database, sitting behind a virtual hostname in our load balancer. As such, it's possible that the generation numbers between the masters for a given index are not in sync. Slaves are configured to replicate from this virtual host (and pin based on source/dest IP hash) so we can add and remove masters as necessary (great for maintenance). For the most part this works, but we've seen the following happen: * Slave has been pulling from master A * Master A goes down for maintenance and the slave now pulls from master B (which for some reason has a lower generation number than master A). * Slave now tries to pull from master B (higher index version than the slave, but lower generation). * Slave downloads index files and moves them to the index/ directory, but these files are deleted during the doCommit() phase (it looks like older-generation data is deleted). * Index remains as-is; no change.
[jira] [Reopened] (SOLR-4354) Replication should perform full copy if slave's generation higher than master's
[ https://issues.apache.org/jira/browse/SOLR-4354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller reopened SOLR-4354: --- Assignee: Mark Miller
[jira] [Resolved] (SOLR-4354) Replication should perform full copy if slave's generation higher than master's
[ https://issues.apache.org/jira/browse/SOLR-4354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller resolved SOLR-4354. --- Resolution: Duplicate
[jira] [Commented] (SOLR-4354) Replication should perform full copy if slave's generation higher than master's
[ https://issues.apache.org/jira/browse/SOLR-4354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562868#comment-13562868 ] Mark Miller commented on SOLR-4354: --- No worries Amit - looks like this was a dupe of SOLR-4303.
[jira] [Moved] (LUCENE-4720) TermVectorAccessor returns terms that do not match the current document
[ https://issues.apache.org/jira/browse/LUCENE-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man moved SOLR-4360 to LUCENE-4720: Lucene Fields: New Affects Version/s: (was: 3.6.2) 3.6.2 Key: LUCENE-4720 (was: SOLR-4360) Project: Lucene - Core (was: Solr) TermVectorAccessor returns terms that do not match the current document --- Key: LUCENE-4720 URL: https://issues.apache.org/jira/browse/LUCENE-4720 Project: Lucene - Core Issue Type: Bug Affects Versions: 3.6.2 Reporter: Francois-Xavier Bonnet Attachments: SOLR-4360.txt For each term, TermVectorAccessor looks in the indexReader and calls termPositions.skipTo(documentNumber), but this method returns the first document with an id greater than or equal to documentNumber. As a result you get some extra terms that do not actually match documentNumber.
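The contract at issue: skipTo(target)-style iterators position on the first document whose id is greater than or equal to the target, so the caller must still verify an exact match before using the result. A self-contained sketch (a sorted int array standing in for a postings list; the names are illustrative, not the Lucene API):

```java
import java.util.Arrays;

public class SkipToDemo {
    // Mimics TermPositions#skipTo semantics: returns the first doc >= target,
    // or -1 if the iterator is exhausted.
    static int skipTo(int[] postings, int target) {
        int i = Arrays.binarySearch(postings, target);
        if (i >= 0) return postings[i];          // exact hit
        int insertion = -i - 1;                  // first element greater than target
        return insertion < postings.length ? postings[insertion] : -1;
    }

    public static void main(String[] args) {
        int[] postings = {2, 5, 9};              // docs that actually contain the term
        int hit = skipTo(postings, 4);           // positions on doc 5, not doc 4
        System.out.println(hit);
        System.out.println(hit == 4);            // the missing check in the bug report
    }
}
```

skipTo(4) lands on doc 5 and the equality check fails, which is exactly the guard TermVectorAccessor omits: without it, terms from a later document leak into the results for documentNumber.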
[JENKINS] Lucene-Solr-trunk-MacOSX (64bit/jdk1.6.0) - Build # 134 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-MacOSX/134/ Java: 64bit/jdk1.6.0 -XX:+UseSerialGC All tests passed Build Log: [...truncated 29905 lines...] BUILD FAILED /Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/build.xml:305: The following error occurred while executing this line: /Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/extra-targets.xml:120: The following files are missing svn:eol-style (or binary svn:mime-type): * solr/core/src/java/org/apache/solr/cloud/OverseerSolrResponse.java Total time: 84 minutes 3 seconds Build step 'Invoke Ant' marked build as failure Archiving artifacts Recording test results Description set: Java: 64bit/jdk1.6.0 -XX:+UseSerialGC Email was triggered for: Failure Sending email for trigger: Failure
[JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.6.0_38) - Build # 3971 - Still Failing!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/3971/ Java: 32bit/jdk1.6.0_38 -client -XX:+UseParallelGC All tests passed Build Log: [...truncated 29914 lines...] BUILD FAILED /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/build.xml:305: The following error occurred while executing this line: /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/extra-targets.xml:120: The following files are missing svn:eol-style (or binary svn:mime-type): * solr/core/src/java/org/apache/solr/cloud/OverseerSolrResponse.java Total time: 36 minutes 32 seconds Build step 'Invoke Ant' marked build as failure Archiving artifacts Recording test results Description set: Java: 32bit/jdk1.6.0_38 -client -XX:+UseParallelGC Email was triggered for: Failure Sending email for trigger: Failure
[JENKINS-MAVEN] Lucene-Solr-Maven-4.x #226: POMs out of sync
Build: https://builds.apache.org/job/Lucene-Solr-Maven-4.x/226/ 2 tests failed. FAILED: org.apache.solr.cloud.RecoveryZkTest.org.apache.solr.cloud.RecoveryZkTest Error Message: Resource in scope SUITE failed to close. Resource was registered from thread Thread[id=3494, name=coreLoadExecutor-1996-thread-1, state=RUNNABLE, group=TGRP-RecoveryZkTest], registration stack trace below. Stack Trace: com.carrotsearch.randomizedtesting.ResourceDisposalError: Resource in scope SUITE failed to close. Resource was registered from thread Thread[id=3494, name=coreLoadExecutor-1996-thread-1, state=RUNNABLE, group=TGRP-RecoveryZkTest], registration stack trace below. at java.lang.Thread.getStackTrace(Thread.java:1495) at com.carrotsearch.randomizedtesting.RandomizedContext.closeAtEnd(RandomizedContext.java:150) at org.apache.lucene.util.LuceneTestCase.closeAfterSuite(LuceneTestCase.java:517) at org.apache.lucene.util.LuceneTestCase.wrapDirectory(LuceneTestCase.java:977) at org.apache.lucene.util.LuceneTestCase.newDirectory(LuceneTestCase.java:875) at org.apache.lucene.util.LuceneTestCase.newDirectory(LuceneTestCase.java:867) at org.apache.solr.core.MockDirectoryFactory.create(MockDirectoryFactory.java:33) at org.apache.solr.core.CachingDirectoryFactory.get(CachingDirectoryFactory.java:267) at org.apache.solr.core.CachingDirectoryFactory.get(CachingDirectoryFactory.java:223) at org.apache.solr.core.SolrCore.getNewIndexDir(SolrCore.java:241) at org.apache.solr.core.SolrCore.initIndex(SolrCore.java:446) at org.apache.solr.core.SolrCore.init(SolrCore.java:718) at org.apache.solr.core.SolrCore.init(SolrCore.java:607) at org.apache.solr.core.CoreContainer.createFromZk(CoreContainer.java:949) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1031) at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629) at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:624) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at 
java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:679) Caused by: java.lang.AssertionError: Directory not closed: BaseDirectoryWrapper(org.apache.lucene.store.RAMDirectory@302dd679 lockFactory=org.apache.lucene.store.NativeFSLockFactory@1d3aaf8a) at org.junit.Assert.fail(Assert.java:93) at org.apache.lucene.util.CloseableDirectory.close(CloseableDirectory.java:47) at com.carrotsearch.randomizedtesting.RandomizedRunner$2$1.apply(RandomizedRunner.java:602) at com.carrotsearch.randomizedtesting.RandomizedRunner$2$1.apply(RandomizedRunner.java:599) at com.carrotsearch.randomizedtesting.RandomizedContext.closeResources(RandomizedContext.java:167) at com.carrotsearch.randomizedtesting.RandomizedRunner$2.afterAlways(RandomizedRunner.java:615) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:43) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358) ... 1 more FAILED: org.apache.solr.cloud.RecoveryZkTest.org.apache.solr.cloud.RecoveryZkTest Error Message: Resource in scope SUITE failed to close. Resource was registered from thread Thread[id=3524, name=RecoveryThread, state=RUNNABLE, group=TGRP-RecoveryZkTest], registration stack trace below. Stack Trace: com.carrotsearch.randomizedtesting.ResourceDisposalError: Resource in scope SUITE failed to close. Resource was registered from thread Thread[id=3524, name=RecoveryThread, state=RUNNABLE, group=TGRP-RecoveryZkTest], registration stack trace below. 
at java.lang.Thread.getStackTrace(Thread.java:1495) at com.carrotsearch.randomizedtesting.RandomizedContext.closeAtEnd(RandomizedContext.java:150) at org.apache.lucene.util.LuceneTestCase.closeAfterSuite(LuceneTestCase.java:517) at org.apache.lucene.util.LuceneTestCase.wrapDirectory(LuceneTestCase.java:983) at org.apache.lucene.util.LuceneTestCase.newDirectory(LuceneTestCase.java:875) at org.apache.lucene.util.LuceneTestCase.newDirectory(LuceneTestCase.java:867) at org.apache.solr.core.MockDirectoryFactory.create(MockDirectoryFactory.java:33) at org.apache.solr.core.CachingDirectoryFactory.get(CachingDirectoryFactory.java:267) at
[jira] [Created] (SOLR-4361) DIH request parameters with dots throws UnsupportedOperationException
James Dyer created SOLR-4361: Summary: DIH request parameters with dots throws UnsupportedOperationException Key: SOLR-4361 URL: https://issues.apache.org/jira/browse/SOLR-4361 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Affects Versions: 4.1 Reporter: James Dyer Assignee: James Dyer Priority: Minor Fix For: 4.2, 5.0 If the user puts placeholders for request parameters and these contain dots, DIH fails. Current workaround is to either use no dots or use the 4.0 DIH jar.
[jira] [Commented] (SOLR-4361) DIH request parameters with dots throws UnsupportedOperationException
[ https://issues.apache.org/jira/browse/SOLR-4361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562900#comment-13562900 ] James Dyer commented on SOLR-4361: -- Example from user list: I've just tried to upgrade from 4.0 to 4.1 and I have the following exception when reindexing my data:

Caused by: java.lang.UnsupportedOperationException
    at java.util.Collections$UnmodifiableMap.put(Collections.java:1283)
    at org.apache.solr.handler.dataimport.VariableResolver.currentLevelMap(VariableResolver.java:204)
    at org.apache.solr.handler.dataimport.VariableResolver.resolve(VariableResolver.java:94)
    at org.apache.solr.handler.dataimport.VariableResolver.replaceTokens(VariableResolver.java:144)
    at org.apache.solr.handler.dataimport.ContextImpl.replaceTokens(ContextImpl.java:254)
    at org.apache.solr.handler.dataimport.JdbcDataSource.resolveVariables(JdbcDataSource.java:203)
    at org.apache.solr.handler.dataimport.JdbcDataSource.createConnectionFactory(JdbcDataSource.java:101)
    at org.apache.solr.handler.dataimport.JdbcDataSource.init(JdbcDataSource.java:62)
    at org.apache.solr.handler.dataimport.DataImporter.getDataSourceInstance(DataImporter.java:394)

It seems to be related to the use of placeholders in data-config.xml:

<dataConfig>
  <dataSource type="JdbcDataSource" name="bceDS"
              driver="${dataimporter.request.solr.bceDS.driver}"
              url="${dataimporter.request.solr.bceDS.url}"
              user="${dataimporter.request.solr.bceDS.user}"
              password="${dataimporter.request.solr.bceDS.password}"
              batchSize="-1"/>
</dataConfig>

solrconfig.xml:

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
    <!-- dataSource parameters for data-config.xml -->
    <str name="solr.bceDS.driver">...</str>
    <str name="solr.bceDS.url">...</str>
    <str name="solr.bceDS.user">...</str>
    <str name="solr.bceDS.password">...</str>
  </lst>
</requestHandler>
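The stack trace bottoms out in Collections$UnmodifiableMap.put, i.e. VariableResolver is writing into a read-only view of the request parameters. This minimal sketch reproduces just that java.util.Collections behavior (the parameter names are borrowed from the report; this is not DIH code):

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class UnmodifiableDemo {
    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put("solr.bceDS.driver", "com.mysql.jdbc.Driver");
        // Request parameters exposed as a read-only view: any write through
        // this view fails the way the DIH stack trace shows.
        Map<String, String> readOnly = Collections.unmodifiableMap(params);
        try {
            readOnly.put("solr.bceDS.url", "jdbc:mysql://localhost/");
        } catch (UnsupportedOperationException e) {
            System.out.println("UnsupportedOperationException");
        }
    }
}
```

The fix therefore has to either copy the parameters into a mutable map before resolving dotted names, or avoid the write entirely; writing through the unmodifiable view can never work.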
[jira] [Updated] (SOLR-4225) Term info page under schema browser shows incorrect count of terms
[ https://issues.apache.org/jira/browse/SOLR-4225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Matheis (steffkes) updated SOLR-4225: Attachment: schema-browser_histogram.png SOLR-4225.patch The attached screenshot shows how the new histograms will look, using data from Shawn (the two on top) as well as exampledocs (the two at the bottom). Thoughts on this? Term info page under schema browser shows incorrect count of terms --- Key: SOLR-4225 URL: https://issues.apache.org/jira/browse/SOLR-4225 Project: Solr Issue Type: Bug Components: web gui Affects Versions: 4.0 Environment: chrome (version: Version 22.0.1229.94 m) on a windows 2003 machine Reporter: Shreejay Assignee: Stefan Matheis (steffkes) Priority: Minor Attachments: luke-terms-elyograg.txt, schema-browser_histogram.png, schemabrowser-termcount-problem.png, SOLR-4225.patch, TermInfo.png The box sizes on the term info page (under Schema Browser) overlap, due to which the number of terms shown looks incorrect. Screenshot attached (TermInfo.png).
[jira] [Comment Edited] (SOLR-4225) Term info page under schema browser shows incorrect count of terms
[ https://issues.apache.org/jira/browse/SOLR-4225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562902#comment-13562902 ] Stefan Matheis (steffkes) edited comment on SOLR-4225 at 1/25/13 6:47 PM: -- The attached screenshot (schema-browser_histogram.png) shows how the new histograms will look, using data from Shawn (the two on top) as well as exampledocs (the two at the bottom). Thoughts on this?
[jira] [Commented] (SOLR-4359) The RecentUpdates#update method should treat a problem reading the next record the same as a problem parsing the record - log the exception and break.
[ https://issues.apache.org/jira/browse/SOLR-4359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562927#comment-13562927 ] Commit Tag Bot commented on SOLR-4359: -- [trunk commit] Mark Robert Miller http://svn.apache.org/viewvc?view=revision&revision=1438655 SOLR-4359: The RecentUpdates#update method should treat a problem reading the next record the same as a problem parsing the record - log the exception and break. -- Key: SOLR-4359 URL: https://issues.apache.org/jira/browse/SOLR-4359 Project: Solr Issue Type: Bug Reporter: Mark Miller Assignee: Mark Miller Fix For: 4.2, 5.0
[jira] [Commented] (SOLR-4359) The RecentUpdates#update method should treat a problem reading the next record the same as a problem parsing the record - log the exception and break.
[ https://issues.apache.org/jira/browse/SOLR-4359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562934#comment-13562934 ] Commit Tag Bot commented on SOLR-4359: -- [branch_4x commit] Mark Robert Miller http://svn.apache.org/viewvc?view=revision&revision=1438656 SOLR-4359: The RecentUpdates#update method should treat a problem reading the next record the same as a problem parsing the record - log the exception and break.
[jira] [Resolved] (SOLR-4359) The RecentUpdates#update method should treat a problem reading the next record the same as a problem parsing the record - log the exception and break.
[ https://issues.apache.org/jira/browse/SOLR-4359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller resolved SOLR-4359. --- Resolution: Fixed
[jira] [Commented] (SOLR-4361) DIH request parameters with dots throws UnsupportedOperationException
[ https://issues.apache.org/jira/browse/SOLR-4361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562947#comment-13562947 ] James Dyer commented on SOLR-4361: -- Also, this workaround was mentioned. This should be protected with a unit test so it doesn't get broken, and also added to the wiki if not currently documented: I do something similar, but without the placeholders in db-data-config.xml. You can define the entire datasource in solrconfig.xml, then leave out that element entirely in db-data-config.xml. It seems really odd, but that is how the code works. This is working for me in 4.1, so it might be a workaround for you. It looks like this:

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">db-data-config.xml</str>
    <lst name="datasource">
      <str name="defType">JdbcDataSource</str>
      <str name="driver">com.mysql.jdbc.Driver</str>
      <str name="url">jdbc:mysql://${textbooks.dbhost:nohost}/</str>
      <str name="user">${textbooks.dbuser:y}</str>
      <str name="password">${textbooks.dbpass:zz}</str>
      <str name="batchSize">-1</str>
      <str name="readOnly">true</str>
      <str name="onError">skip</str>
      <str name="netTimeoutForStreamingResults">600</str>
      <str name="zeroDateTimeBehavior">convertToNull</str>
    </lst>
  </lst>
</requestHandler>
[jira] [Commented] (SOLR-4225) Term info page under schema browser shows incorrect count of terms
[ https://issues.apache.org/jira/browse/SOLR-4225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562954#comment-13562954 ] Hoss Man commented on SOLR-4225: +1 ... nice. Why are the numbers formatted as {{8'388'608}} instead of {{8,388,608}} or the more SI-recommended {{8 388 608}}? is {{\'}} a locale-based convention i'm not aware of? Term info page under schema browser shows incorrect count of terms --- Key: SOLR-4225 URL: https://issues.apache.org/jira/browse/SOLR-4225 Project: Solr Issue Type: Bug Components: web gui Affects Versions: 4.0 Environment: Chrome (Version 22.0.1229.94 m) on a Windows 2003 machine Reporter: Shreejay Assignee: Stefan Matheis (steffkes) Priority: Minor Attachments: luke-terms-elyograg.txt, schema-browser_histogram.png, schemabrowser-termcount-problem.png, SOLR-4225.patch, TermInfo.png The box sizes on the term info page (under Schema Browser) overlap, due to which the number of terms shown looks incorrect. Screenshot attached (TermInfo.png).
[jira] [Commented] (SOLR-4225) Term info page under schema browser shows incorrect count of terms
[ https://issues.apache.org/jira/browse/SOLR-4225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562963#comment-13562963 ] Stefan Matheis (steffkes) commented on SOLR-4225: - bq. Why are the numbers formatted as {{8'388'608}} instead of {{8,388,608}} or the more SI-recommended {{8 388 608}}? is {{\'}} a locale-based convention i'm not aware of? Uhm, that's a good question oO That's the same formatting rule i used for the DIH interface; i grabbed a short Javascript snippet from StackOverflow which included this apostrophe. If we change the formatting character, i'd like to use the comma instead of the whitespace - because the whitespace only works well if you use mono-space formatting (as you did in your comment), otherwise the space between the digits is so small that it does not really help while scanning the whole number.
[jira] [Comment Edited] (SOLR-4225) Term info page under schema browser shows incorrect count of terms
[ https://issues.apache.org/jira/browse/SOLR-4225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562963#comment-13562963 ] Stefan Matheis (steffkes) edited comment on SOLR-4225 at 1/25/13 8:04 PM: -- bq. Why are the numbers formatted as {{8'388'608}} instead of {{8,388,608}} or the more SI-recommended {{8 388 608}}? is {{\'}} a locale-based convention i'm not aware of? Uhm, that's a good question oO That's the same formatting rule i used for the DIH interface; i grabbed a short Javascript snippet from StackOverflow which included this apostrophe. If we change the formatting character, i'd like to use the comma instead of the whitespace - because the whitespace only works well if you use mono-space formatting (as you did in your comment), otherwise the space between the digits is so small that it does not really help while scanning the whole number. # edit hmm, maybe .. looks like it's the Swiss formatting rule, but i didn't realize that :D
[jira] [Updated] (SOLR-4225) Term info page under schema browser shows incorrect count of terms
[ https://issues.apache.org/jira/browse/SOLR-4225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Matheis (steffkes) updated SOLR-4225: Attachment: schema-browser_histogram.png Instead of maybe .. have a look at your own .. the two samples on top are updated .. the first using whitespace and the second using comma as separator - let me know which one :)
[jira] [Commented] (SOLR-4325) DIH DateFormatEvaluator seems to have problems with DST changes - test disabled
[ https://issues.apache.org/jira/browse/SOLR-4325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563012#comment-13563012 ] Uwe Schindler commented on SOLR-4325: - Thanks James! DIH DateFormatEvaluator seems to have problems with DST changes - test disabled Key: SOLR-4325 URL: https://issues.apache.org/jira/browse/SOLR-4325 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Affects Versions: 4.0, 4.1 Reporter: Uwe Schindler Assignee: James Dyer Fix For: 4.2, 5.0 Attachments: SOLR-4325.patch Yesterday was the DST change in Fiji (the clock went one hour backwards, as summer time ended and winter time started). This caused org.apache.solr.handler.dataimport.TestBuiltInEvaluators.testDateFormatEvaluator to fail. The reason is simple: NOW-2DAYS is evaluated without taking the time zone into account (it subtracts 48 hours), but to be correct and go 2 DAYS back in local wall-clock time, it must subtract only 47 hours. If this is not intended (we want to go 48 hours back, not 47), the test needs a fix. Otherwise the date evaluator must take the time zone into account when subtracting days (e.g., use a correctly localized Calendar instance and use the add() method ([http://docs.oracle.com/javase/6/docs/api/java/util/Calendar.html#add(int, int)])).
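The effect Uwe describes can be shown with plain java.util.Calendar. The sketch below is ours, not DIH code; it uses the America/New_York fall-back transition of 2012-11-04 purely as an illustration, but the point is the same as in Fiji: Calendar.add() keeps the local wall-clock time across the DST change, while subtracting a fixed number of milliseconds (what NOW-2DAYS effectively does) drifts by the DST offset.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.TimeZone;

// Sketch of the suggested fix: use a Calendar localized to the right time
// zone and Calendar.add(), instead of subtracting a fixed millisecond count.
// The zone and date below are our illustration, not anything from DIH.
class DstDemo {
    static Calendar base() {
        Calendar cal = new GregorianCalendar(TimeZone.getTimeZone("America/New_York"));
        cal.clear();
        cal.set(2012, Calendar.NOVEMBER, 5, 12, 0, 0); // DST ended Nov 4, 02:00
        return cal;
    }

    // Wall-clock arithmetic: two calendar days back is still 12:00 local time.
    static int wallClockHour() {
        Calendar cal = base();
        cal.add(Calendar.DAY_OF_MONTH, -2);
        return cal.get(Calendar.HOUR_OF_DAY);
    }

    // Raw arithmetic: 48 hours of millis back lands on 13:00 local time,
    // because the clocks gained an hour at the fall-back transition.
    static int rawMillisHour() {
        Calendar cal = base();
        cal.setTimeInMillis(cal.getTimeInMillis() - 48L * 3600 * 1000);
        return cal.get(Calendar.HOUR_OF_DAY);
    }
}
```

So over this transition, going "2 DAYS" back means subtracting 49 real hours, not 48 - the mirror image of the 47-hour case in the issue description.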
[jira] [Commented] (LUCENE-4695) Add utility class for getting live values for a given field during NRT indexing
[ https://issues.apache.org/jira/browse/LUCENE-4695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563061#comment-13563061 ] Commit Tag Bot commented on LUCENE-4695: [trunk commit] Michael McCandless http://svn.apache.org/viewvc?view=revision&revision=1438721 LUCENE-4695: add LiveFieldValues, to get current (live/real-time) values for fields indexed after the last NRT reopen Add utility class for getting live values for a given field during NRT indexing --- Key: LUCENE-4695 URL: https://issues.apache.org/jira/browse/LUCENE-4695 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 4.2, 5.0 Attachments: LUCENE-4695.patch, LUCENE-4695.patch This is a simple utility/wrapper class, that holds the field values for recently indexed documents until the NRT reader has refreshed, and exposes a get API to get the last indexed value per id. For example one could use this to look up the version field for a given id, even when that id was just indexed and not yet visible in the NRT reader. The implementation is fairly simple: it just watches the gen coming out of NRTManager and updates/prunes accordingly. The class is abstract: you must subclass it and impl the lookupFromSearcher method...
[jira] [Created] (SOLR-4362) edismax, phrase query with slop, pf parameter
Ahmet Arslan created SOLR-4362: -- Summary: edismax, phrase query with slop, pf parameter Key: SOLR-4362 URL: https://issues.apache.org/jira/browse/SOLR-4362 Project: Solr Issue Type: Bug Components: query parsers Affects Versions: 4.1 Reporter: Ahmet Arslan When a sloppy phrase query (plus an additional term) is used with edismax, the slop value is searched against the fields that are supplied with the pf parameter. Example: with the url q="phrase query"~10 term&qf=text&pf=text, a document having 10 term in its text field is boosted.
[jira] [Resolved] (LUCENE-4695) Add utility class for getting live values for a given field during NRT indexing
[ https://issues.apache.org/jira/browse/LUCENE-4695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-4695. Resolution: Fixed
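The idea behind LiveFieldValues can be sketched library-free: buffer the values of recently indexed documents, serve get() from that buffer, and drop the buffered entries once the reopened searcher can answer for them. The names below and the simplified clear-on-refresh pruning are ours - the real Lucene class prunes by NRT generation and is abstract over lookupFromSearcher.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Sketch of the LiveFieldValues idea: hold recently indexed values until the
// searcher has refreshed, then fall back to the (now current) searcher.
class LiveValues<V> {
    private final Map<String, V> pending = new ConcurrentHashMap<>();
    private final Function<String, V> searcherLookup; // stand-in for lookupFromSearcher

    LiveValues(Function<String, V> searcherLookup) {
        this.searcherLookup = searcherLookup;
    }

    void add(String id, V value) {   // called right after indexing a doc
        pending.put(id, value);
    }

    void refreshed() {               // called once the NRT reader has reopened;
        pending.clear();             // the real impl prunes by generation instead
    }

    V get(String id) {
        V v = pending.get(id);
        return v != null ? v : searcherLookup.apply(id);
    }
}
```

Usage mirrors the version-field example in the issue: right after indexing, get() answers from the buffer; after the refresh, it answers from the searcher.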
Re: Fixing query-time multi-word synonym issue
PositionLengthAttribute is sufficient to express the true graph, but SynonymFilter has not been fully fixed to properly set it. Specifically, it cannot create new positions, which is what's necessary if you expand when applying synonyms (e.g., dns -> domain name service). It is better to do the reverse: map the multi-word phrase down to a single token, at indexing time (domain name service -> dns): you get accurate scoring (exact docFreq for how many docs have either dns or domain name service) and faster search performance, and PosLenAtt is properly set, and you work around the fact that the index cannot store the position length att (since you never create alternate paths in the token graph). The downside is you must re-index if you change your synonyms. However: once we fix QueryParser to stop splitting on whitespace (it's really ridiculous that it does so: it causes so many problems), and fix SynFilter to create positions, it is in theory possible to take the resulting graph (if you expand when applying synonyms) and enumerate the correct query (something like MultiPhraseQuery, or an OR of them, or something; maybe we'll need WordGraphQuery), and get the correct results. Mike McCandless http://blog.mikemccandless.com On Fri, Jan 25, 2013 at 11:27 AM, Jack Krupansky j...@basetechnology.com wrote: One clarification from my previous comment: One requirement is to prevent false matches for instances of heart infarction and myocardial attack - the current synonym filter does not preserve the path or term ordering within the multi-term phrases. Even if the query parser does present the full term sequence as a single input string. Yes, the position information is preserved, but there is no path attribute to be able to tell that heart was before attack as opposed to before infarction.
-- Jack Krupansky -----Original Message----- From: Robert Muir Sent: Friday, January 25, 2013 9:47 AM To: dev@lucene.apache.org Subject: Re: Fixing query-time multi-word synonym issue On Fri, Jan 25, 2013 at 9:19 AM, Jack Krupansky j...@basetechnology.com wrote: Here's an example query with q.op=AND: causes of heart attack And I have this synonym definition: heart attack, myocardial infarction So, what is the alleged query parser fix so that the query is treated as: causes of (heart attack OR myocardial infarction) Thats actually inefficient and stupid to do. if you make a parser that doesnt split on whitespace, you can just tell it to fold at index and query time just like stemming. no OR necessary. But I think you are trying to get off topic, again the real problem affecting 99%+ users is that the lucene queryparser splits on whitespace. If this is fixed, then lots of things (not just synonyms, but other basic shit that is broken today) starts working too: https://issues.apache.org/jira/browse/LUCENE-2605
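The index-time approach Mike describes - collapse a multi-word phrase to its single-token synonym before indexing, so no alternate paths (and no position-length information) ever need to be stored - can be sketched without Lucene. The phrase table and greedy matching below are ours, not SynonymFilter's implementation:

```java
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

// Toy index-time synonym folding: replace a multi-word phrase with its
// single-token synonym (domain name service -> dns) before indexing, so the
// token stream stays a straight line with no alternate graph paths.
class PhraseCollapser {
    static List<String> collapse(List<String> tokens, Map<List<String>, String> phrases) {
        List<String> out = new LinkedList<>();
        for (int i = 0; i < tokens.size(); ) {
            String replacement = null;
            int matched = 0;
            for (Map.Entry<List<String>, String> e : phrases.entrySet()) {
                List<String> phrase = e.getKey();
                if (i + phrase.size() <= tokens.size()
                        && tokens.subList(i, i + phrase.size()).equals(phrase)) {
                    replacement = e.getValue();
                    matched = phrase.size();
                    break;
                }
            }
            if (replacement != null) { out.add(replacement); i += matched; }
            else { out.add(tokens.get(i)); i++; }
        }
        return out;
    }
}
```

Applied at both index and query time (once the query parser stops splitting on whitespace), this folds just like stemming does - which is Robert's point about not needing an OR at all.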
[jira] [Commented] (LUCENE-4695) Add utility class for getting live values for a given field during NRT indexing
[ https://issues.apache.org/jira/browse/LUCENE-4695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563071#comment-13563071 ] Commit Tag Bot commented on LUCENE-4695: [branch_4x commit] Michael McCandless http://svn.apache.org/viewvc?view=revision&revision=1438731 LUCENE-4695: add LiveFieldValues, to get current (live/real-time) values for fields indexed after the last NRT reopen
Re: java.lang.NumberFormatException Using PhraseQuery with Lucene 4.0.0
Can you provide the full stack trace? Mike McCandless http://blog.mikemccandless.com On Fri, Jan 25, 2013 at 11:13 AM, JimAld jim.alder...@db.com wrote: Hi, The below code is throwing the exception: java.lang.NumberFormatException: For input string: "01.SZ" at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48) when the TopDocs docs = indexSearcher.search(phraseQuery, null, 10, sort); line is called. This only happens when the searchPattern contains a space character. No other info is available in the exception. The "01.SZ" value is the first value in my index... the index is a RAMDirectory... Anyone have any ideas? Many Thanks. Code:

BooleanQuery.setMaxClauseCount(clauseCount);
searchPattern = QueryParser.escape(searchPattern);
Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_40);
IndexReader reader = IndexReader.open(index);
IndexSearcher indexSearcher = new IndexSearcher(reader);
PhraseQuery phraseQuery = new PhraseQuery();
Term term = new Term(fieldName, searchPattern);
phraseQuery.add(term);
phraseQuery.setSlop(0);
Sort sort = new Sort(new SortField(fieldName, SortField.Type.SCORE));
TopDocs docs = indexSearcher.search(phraseQuery, null, 10, sort);

-- View this message in context: http://lucene.472066.n3.nabble.com/java-lang-NumberFormatException-Using-PhraseQuery-with-Lucene-4-0-0-tp4036273.html Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
[jira] [Commented] (SOLR-4362) edismax, phrase query with slop, pf parameter
[ https://issues.apache.org/jira/browse/SOLR-4362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563075#comment-13563075 ] Ahmet Arslan commented on SOLR-4362: http://search-lucene.com/m/RwfwXkbfc
[jira] [Updated] (SOLR-4362) edismax, phrase query with slop, pf parameter
[ https://issues.apache.org/jira/browse/SOLR-4362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmet Arslan updated SOLR-4362: --- Attachment: SOLR-4362.patch A failing test case that demonstrates the problem.
[jira] [Created] (LUCENE-4721) WordDelimiterFilter ignores payloads
Scott Smerchek created LUCENE-4721: -- Summary: WordDelimiterFilter ignores payloads Key: LUCENE-4721 URL: https://issues.apache.org/jira/browse/LUCENE-4721 Project: Lucene - Core Issue Type: Bug Components: modules/analysis Affects Versions: 4.1 Reporter: Scott Smerchek Attachments: LUCENE-4721.patch When generating new tokens, the WordDelimiterFilter does not carry forward payloads. It appears that this issue was fixed long ago in 1.4 (https://issues.apache.org/jira/browse/SOLR-532); however, it is yet again an issue.
[jira] [Updated] (LUCENE-4721) WordDelimiterFilter ignores payloads
[ https://issues.apache.org/jira/browse/LUCENE-4721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Smerchek updated LUCENE-4721: --- Attachment: LUCENE-4721.patch
[jira] [Created] (LUCENE-4722) Can we move SortField.Type.SCORE/DOC to singleton SortField instances instead...?
Michael McCandless created LUCENE-4722: -- Summary: Can we move SortField.Type.SCORE/DOC to singleton SortField instances instead...? Key: LUCENE-4722 URL: https://issues.apache.org/jira/browse/LUCENE-4722 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Fix For: 4.2, 5.0 It's ... weird that you can do eg new SortField(myfield, SortField.Type.SCORE). We already have dedicated SortField.FIELD_SCORE and FIELD_DOC ... so I think apps should use those and never make a new SortField for them?
[jira] [Commented] (LUCENE-4642) TokenizerFactory should provide a create method with a given AttributeSource
[ https://issues.apache.org/jira/browse/LUCENE-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563108#comment-13563108 ] Robert Muir commented on LUCENE-4642: - My problem i guess with AttributeSource/AttributeFactory is that they invade on every single custom tokenizer: the API is not good. I realize its useful for expert users to be able to plug in their own, but why in the world must *every* tokenizer have ctor explosion (minimum 3) to support this? And I guess I was secretly hoping we could remove Tokenizer(AttributeSource) if we fixed the solr hack. :) Again my main problem is not about what you want to do, its instead related to the existing APIs (Tokenizer.java) and where we are heading if we perpetuate this to the analysis factories (TokenizerFactory) too. TokenizerFactory should provide a create method with a given AttributeSource Key: LUCENE-4642 URL: https://issues.apache.org/jira/browse/LUCENE-4642 Project: Lucene - Core Issue Type: Improvement Components: modules/analysis Affects Versions: 4.1 Reporter: Renaud Delbru Assignee: Steve Rowe Labels: analysis, attribute, tokenizer Fix For: 4.2, 5.0 Attachments: LUCENE-4642.patch, LUCENE-4642.patch All tokenizer implementations have a constructor that takes a given AttributeSource as parameter (LUCENE-1826). However, the TokenizerFactory does not provide an API to create tokenizers with a given AttributeSource. Side note: There are still a lot of tokenizers that do not provide constructors that take AttributeSource and AttributeFactory.
[jira] [Commented] (SOLR-4225) Term info page under schema browser shows incorrect count of terms
[ https://issues.apache.org/jira/browse/SOLR-4225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563129#comment-13563129 ] Hoss Man commented on SOLR-4225: Well, 16 years of education at US schools has biased me in favor of using the comma as the thousands separator -- but i appreciate that people smarter than me claim whitespace separation is less confusing when communicating with people from other cultures that have diff conventions. (although i appreciate your point about it mainly being useful in fixed-width fonts) truthfully i don't really care what we use: i was just surprised by the apostrophes since that's not a convention i'd ever seen before in any locale.
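For reference, the three separator styles under discussion can be produced deterministically in Java by fixing the grouping separator explicitly, so the output does not depend on the JVM's locale data (the class and method names here are just for illustration; the actual UI does this in Javascript):

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

// Format a long with an explicit grouping separator: '\'' gives the
// Swiss-style apostrophe the UI currently uses, ',' the US comma, and
// ' ' the SI-style thin gap.
class GroupDemo {
    static String group(long n, char sep) {
        DecimalFormatSymbols sym = new DecimalFormatSymbols(Locale.ROOT);
        sym.setGroupingSeparator(sep);
        return new DecimalFormat("#,###", sym).format(n);
    }
}
```

For the number from the comment: group(8388608, '\'') yields 8'388'608, group(8388608, ',') yields 8,388,608, and group(8388608, ' ') yields 8 388 608.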
Re: java.lang.NumberFormatException Using PhraseQuery with Lucene 4.0.0
Sure, here it is:

java.lang.NumberFormatException: For input string: "01.SZ"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
    at java.lang.Integer.parseInt(Integer.java:458)
    at java.lang.Byte.parseByte(Byte.java:151)
    at java.lang.Byte.parseByte(Byte.java:108)
    at org.apache.lucene.search.FieldCache$1.parseByte(FieldCache.java:130)
    at org.apache.lucene.search.FieldCacheImpl$ByteCache.createValue(FieldCacheImpl.java:366)
    at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:248)
    at org.apache.lucene.search.FieldCacheImpl.getBytes(FieldCacheImpl.java:329)
    at org.apache.lucene.search.FieldComparator$ByteComparator.setNextReader(FieldComparator.java:271)
    at org.apache.lucene.search.TopFieldCollector$OneComparatorNonScoringCollector.setNextReader(TopFieldCollector.java:97)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:585)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:555)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:507)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:484)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:309)
    at com.db.gef.locates.index.impl.LuceneLocatesSearchIndex.getMatchingIndexedObjectPhrases(LuceneLocatesSearchIndex.java:361)
    at com.db.gef.locates.cache.impl.services.CasheServiceImpl.lookupSecurity(CasheServiceImpl.java:304)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:304)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:182)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:149)
    at org.springframework.remoting.support.RemoteInvocationTraceInterceptor.invoke(RemoteInvocationTraceInterceptor.java:70)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
    at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
    at $Proxy62.lookupSecurity(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at com.caucho.hessian.server.HessianSkeleton.invoke(HessianSkeleton.java:157)
    at org.springframework.remoting.caucho.Hessian2SkeletonInvoker.invoke(Hessian2SkeletonInvoker.java:67)
    at org.springframework.remoting.caucho.HessianServiceExporter.handleRequest(HessianServiceExporter.java:147)
    at org.springframework.web.servlet.mvc.HttpRequestHandlerAdapter.handle(HttpRequestHandlerAdapter.java:49)
    at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:859)
    at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:793)
    at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:476)
    at org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:441)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
    at weblogic.servlet.internal.StubSecurityHelper$ServletServiceAction.run(StubSecurityHelper.java:227)
    at weblogic.servlet.internal.StubSecurityHelper.invokeServlet(StubSecurityHelper.java:125)
    at weblogic.servlet.internal.ServletStubImpl.execute(ServletStubImpl.java:292)
    at weblogic.servlet.internal.ServletStubImpl.execute(ServletStubImpl.java:175)
    at weblogic.servlet.internal.WebAppServletContext$ServletInvocationAction.run(WebAppServletContext.java:3498)
    at weblogic.security.acl.internal.AuthenticatedSubject.doAs(AuthenticatedSubject.java:321)
    at weblogic.security.service.SecurityManager.runAs(Unknown Source)
    at weblogic.servlet.internal.WebAppServletContext.securedExecute(WebAppServletContext.java:2180)
    at weblogic.servlet.internal.WebAppServletContext.execute(WebAppServletContext.java:2086)
    at weblogic.servlet.internal.ServletRequestImpl.run(ServletRequestImpl.java:1406)
    at weblogic.work.ExecuteThread.execute(ExecuteThread.java:201)
    at
Re: java.lang.NumberFormatException Using PhraseQuery with Lucene 4.0.0
Also, I made a mistake in my original post: the SortField used is actually of type STRING, as follows:

    Sort sort = new Sort(new SortField(fieldName, SortField.Type.STRING));

All the rest of the code is correct. Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/java-lang-NumberFormatException-Using-PhraseQuery-with-Lucene-4-0-0-tp4036273p4036383.html Sent from the Lucene - Java Developer mailing list archive at Nabble.com. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
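For readers following the thread: the STRING choice matters because values like "01.SZ" (seen in the stack trace quoted later in this thread) are not parseable as numbers and are only sortable lexicographically. A minimal plain-Java sketch of that ordering (no Lucene required; the values here are hypothetical ticker-like strings):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class StringSortSketch {
    public static void main(String[] args) {
        // A STRING sort compares the raw terms lexicographically, so values
        // that would blow up a numeric parser sort without error.
        List<String> values = new ArrayList<>(Arrays.asList("01.SZ", "AAPL", "0005.HK"));
        values.sort(String::compareTo);
        System.out.println(values); // [0005.HK, 01.SZ, AAPL]
    }
}
```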
[jira] [Commented] (SOLR-4362) edismax, phrase query with slop, pf parameter
[ https://issues.apache.org/jira/browse/SOLR-4362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563152#comment-13563152 ]

Ahmet Arslan commented on SOLR-4362:

org.apache.solr.search.ExtendedDismaxQParser#splitIntoClauses("phrase query"~10 term) returns 3 Clauses:

{noformat}
field = null rawField = null isPhrase = true  val = phrase query raw = "phrase query"
field = null rawField = null isPhrase = false val = ~10          raw = ~10
field = null rawField = null isPhrase = false val = term         raw = term
{noformat}

And mainUserQuery becomes: phrase query ~10 term

edismax, phrase query with slop, pf parameter
Key: SOLR-4362
URL: https://issues.apache.org/jira/browse/SOLR-4362
Project: Solr
Issue Type: Bug
Components: query parsers
Affects Versions: 4.1
Reporter: Ahmet Arslan
Labels: edismax, pf
Attachments: SOLR-4362.patch

When a sloppy phrase query (plus an additional term) is used with edismax, the slop value is searched against the fields that are supplied with the pf parameter. Example: with the url q="phrase query"~10 term&qf=text&pf=text, a document having "10 term" in its text field is boosted.

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
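The clause splitting described above can be mimicked outside Solr. The sketch below is a loose, hypothetical approximation (not Solr's actual splitIntoClauses code): a quote-aware split of the raw query string shows how the slop marker ~10 ends up as a clause of its own, which edismax then matches against the pf fields:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ClauseSplitSketch {
    // Either a quoted phrase, or a run of non-whitespace characters.
    private static final Pattern CLAUSE = Pattern.compile("\"([^\"]*)\"|(\\S+)");

    static List<String> splitIntoClauses(String q) {
        List<String> clauses = new ArrayList<>();
        Matcher m = CLAUSE.matcher(q);
        while (m.find()) {
            // group(1): phrase content (quotes stripped); group(2): bare token.
            clauses.add(m.group(1) != null ? m.group(1) : m.group(2));
        }
        return clauses;
    }

    public static void main(String[] args) {
        // The ~10 after the closing quote is cut off from the phrase and
        // survives as its own clause, mirroring the JIRA comment above.
        System.out.println(splitIntoClauses("\"phrase query\"~10 term"));
        // [phrase query, ~10, term]
    }
}
```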
Re: Fixing query-time multi-word synonym issue
Thanks for that insight, Mike! I don't think anybody is in disagreement with the need for the query parser to present the full, white-space delimited pseudo-term sequence to analysis in one step. My proposal from last September recognized that. Is there a decent writeup on PositionLengthAttribute? I mean, the Javadoc says "The positionLength determines how many positions this token spans", which doesn't sound very relevant to multi-term synonyms that span multiple positions.

-- Jack Krupansky

-Original Message- From: Michael McCandless Sent: Friday, January 25, 2013 4:41 PM To: dev@lucene.apache.org Subject: Re: Fixing query-time multi-word synonym issue

PositionLengthAttribute is sufficient to express the true graph, but SynonymFilter has not been fully fixed to properly set it. Specifically, it cannot create new positions, which is what's necessary if you expand when applying synonyms (e.g., dns -> domain name service). It is better to do the reverse: map the multi-word phrase down to a single token at indexing time (domain name service -> dns): you get accurate scoring (exact docFreq for how many docs have either "dns" or "domain name service") and faster search performance, PosLenAtt is properly set, and you work around the fact that the index cannot store the position length attribute (since you never create alternate paths in the token graph). The downside is that you must re-index if you change your synonyms. However: once we fix QueryParser to stop splitting on whitespace (it's really ridiculous that it does so: it causes so many problems), and fix SynFilter to create positions, it is in theory possible to take the resulting graph (if you expand when applying synonyms) and enumerate the correct query (something like MultiPhraseQuery, or an OR of them, or something; maybe we'll need WordGraphQuery), and get the correct results.
Mike McCandless http://blog.mikemccandless.com

On Fri, Jan 25, 2013 at 11:27 AM, Jack Krupansky j...@basetechnology.com wrote:

One clarification from my previous comment: one requirement is to prevent false matches for instances of "heart infarction" and "myocardial attack" - the current synonym filter does not preserve the path or term ordering within the multi-term phrases, even if the query parser does present the full term sequence as a single input string. Yes, the position information is preserved, but there is no path attribute to be able to tell that "heart" was before "attack" as opposed to before "infarction".

-- Jack Krupansky

-Original Message- From: Robert Muir Sent: Friday, January 25, 2013 9:47 AM To: dev@lucene.apache.org Subject: Re: Fixing query-time multi-word synonym issue

On Fri, Jan 25, 2013 at 9:19 AM, Jack Krupansky j...@basetechnology.com wrote:

Here's an example query with q.op=AND: causes of heart attack. And I have this synonym definition: heart attack, myocardial infarction. So, what is the alleged query parser fix so that the query is treated as: causes of (heart attack OR myocardial infarction)?

That's actually inefficient and stupid to do. If you make a parser that doesn't split on whitespace, you can just tell it to fold at index and query time, just like stemming. No OR necessary. But I think you are trying to get off topic; again, the real problem affecting 99%+ of users is that the Lucene queryparser splits on whitespace.
If this is fixed, then lots of things (not just synonyms, but other basic shit that is broken today) start working too: https://issues.apache.org/jira/browse/LUCENE-2605

- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
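Mike's suggested direction above - contract the multi-word phrase to a single token at index time - can be sketched on plain token lists. This is an illustrative toy, not Lucene's SynonymFilter; the synonym table and token handling are simplified assumptions:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

public class IndexTimeContraction {
    // Hypothetical synonym table: multi-word phrase -> single contracted token.
    private static final Map<List<String>, String> SYNONYMS = Map.of(
        Arrays.asList("domain", "name", "service"), "dns",
        Arrays.asList("myocardial", "infarction"), "heart_attack");

    static List<String> contract(List<String> tokens) {
        int maxLen = SYNONYMS.keySet().stream().mapToInt(List::size).max().orElse(1);
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            boolean matched = false;
            // Greedily try the longest phrase starting at position i.
            for (int len = Math.min(maxLen, tokens.size() - i); len >= 2 && !matched; len--) {
                String repl = SYNONYMS.get(tokens.subList(i, i + len));
                if (repl != null) {
                    out.add(repl);   // one token replaces the whole phrase:
                    i += len;        // no alternate paths in the token stream
                    matched = true;
                }
            }
            if (!matched) out.add(tokens.get(i++));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(contract(Arrays.asList("my", "domain", "name", "service", "host")));
        // [my, dns, host]
    }
}
```

Because the output never branches, positions stay simple and docFreq for the contracted token is exact - the properties Mike describes - at the cost of re-indexing when synonyms change.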
[jira] [Commented] (LUCENE-1822) FastVectorHighlighter: SimpleFragListBuilder hard-coded 6 char margin is too naive
[ https://issues.apache.org/jira/browse/LUCENE-1822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563305#comment-13563305 ]

Commit Tag Bot commented on LUCENE-1822:

[trunk commit] Koji Sekiguchi http://svn.apache.org/viewvc?view=revision&revision=1438822 LUCENE-1822: add a note in "Changes in runtime behavior"

FastVectorHighlighter: SimpleFragListBuilder hard-coded 6 char margin is too naive
Key: LUCENE-1822
URL: https://issues.apache.org/jira/browse/LUCENE-1822
Project: Lucene - Core
Issue Type: Improvement
Components: modules/highlighter
Affects Versions: 2.9
Environment: any
Reporter: Alex Vigdor
Assignee: Koji Sekiguchi
Priority: Minor
Fix For: 4.1, 5.0
Attachments: LUCENE-1822.patch, LUCENE-1822.patch, LUCENE-1822.patch, LUCENE-1822-tests.patch

The new FastVectorHighlighter performs extremely well; however, I've found in testing that the window of text chosen per fragment is often very poor, as it is hard-coded in SimpleFragListBuilder to always start 6 characters to the left of the first phrase match in a fragment. When selecting long fragments, this often means that there is barely any context before the highlighted word, and lots after; even worse, when highlighting a phrase at the end of a short text, the beginning is cut off, even though the entire phrase would fit in the specified fragCharSize. For example, highlighting "Punishment" in "Crime and Punishment" returns "e and <b>Punishment</b>" no matter what fragCharSize is specified. I am going to attach a patch that improves the text window selection by recalculating the starting margin once all phrases in the fragment have been identified - this way, if a single word is matched in a fragment, it will appear in the middle of the highlight, instead of 6 characters from the beginning. This way one can also guarantee that the entirety of a short text is represented in a fragment by specifying a large enough fragCharSize.
[jira] [Commented] (LUCENE-1822) FastVectorHighlighter: SimpleFragListBuilder hard-coded 6 char margin is too naive
[ https://issues.apache.org/jira/browse/LUCENE-1822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563308#comment-13563308 ]

Koji Sekiguchi commented on LUCENE-1822:

I committed the above note to trunk, branch_4x and lucene_solr_4_1.
[jira] [Commented] (LUCENE-1822) FastVectorHighlighter: SimpleFragListBuilder hard-coded 6 char margin is too naive
[ https://issues.apache.org/jira/browse/LUCENE-1822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563312#comment-13563312 ]

Commit Tag Bot commented on LUCENE-1822:

[branch_4x commit] Koji Sekiguchi http://svn.apache.org/viewvc?view=revision&revision=1438824 LUCENE-1822: add a note in "Changes in runtime behavior"
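The recentering idea from the LUCENE-1822 description - recalculate the starting margin so the match sits mid-fragment instead of a fixed 6 characters in - can be sketched as follows. This is a simplified model of the patch's intent, not the actual SimpleFragListBuilder code; the method and its parameters are hypothetical:

```java
public class FragmentWindowSketch {
    // Choose a fragment of up to fragCharSize chars centered on the match
    // [matchStart, matchEnd), clamped to the text bounds. Returns {start, end}.
    static int[] window(int textLen, int matchStart, int matchEnd, int fragCharSize) {
        int matchLen = matchEnd - matchStart;
        int margin = Math.max(0, (fragCharSize - matchLen) / 2); // recalculated, not hard-coded
        int start = Math.max(0, matchStart - margin);
        int end = Math.min(textLen, start + fragCharSize);
        start = Math.max(0, end - fragCharSize); // pull back if we ran into the end of the text
        return new int[] {start, end};
    }

    public static void main(String[] args) {
        String text = "Crime and Punishment";
        int matchStart = text.indexOf("Punishment");
        int[] w = window(text.length(), matchStart, matchStart + "Punishment".length(), 30);
        // The whole short text fits in fragCharSize, so nothing is cut off.
        System.out.println(text.substring(w[0], w[1])); // Crime and Punishment
    }
}
```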
RE: java.lang.NumberFormatException Using PhraseQuery with Lucene 4.0.0
Hi, this has nothing to do with PhraseQuery. The stack trace shows that your code seems to have passed SortField.BYTE, so maybe you have a logic error? PhraseQuery by itself does not use the FieldCache; only the result collector uses the cache, and it is independent of the query. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de

-Original Message- From: JimAld [mailto:jim.alder...@db.com] Sent: Saturday, January 26, 2013 12:01 AM To: dev@lucene.apache.org Subject: Re: java.lang.NumberFormatException Using PhraseQuery with Lucene 4.0.0

Sure, here it is:

java.lang.NumberFormatException: For input string: "01.SZ"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
    at java.lang.Integer.parseInt(Integer.java:458)
    at java.lang.Byte.parseByte(Byte.java:151)
    at java.lang.Byte.parseByte(Byte.java:108)
    at org.apache.lucene.search.FieldCache$1.parseByte(FieldCache.java:130)
    at org.apache.lucene.search.FieldCacheImpl$ByteCache.createValue(FieldCacheImpl.java:366)
    at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:248)
    at org.apache.lucene.search.FieldCacheImpl.getBytes(FieldCacheImpl.java:329)
    at org.apache.lucene.search.FieldComparator$ByteComparator.setNextReader(FieldComparator.java:271)
    at org.apache.lucene.search.TopFieldCollector$OneComparatorNonScoringCollector.setNextReader(TopFieldCollector.java:97)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:585)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:555)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:507)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:484)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:309)
    at com.db.gef.locates.index.impl.LuceneLocatesSearchIndex.getMatchingIndexedObjectPhrases(LuceneLocatesSearchIndex.java:361)
    at com.db.gef.locates.cache.impl.services.CasheServiceImpl.lookupSecurity(CasheServiceImpl.java:304)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:304)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:182)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:149)
    at org.springframework.remoting.support.RemoteInvocationTraceInterceptor.invoke(RemoteInvocationTraceInterceptor.java:70)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
    at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
    at $Proxy62.lookupSecurity(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at com.caucho.hessian.server.HessianSkeleton.invoke(HessianSkeleton.java:157)
    at org.springframework.remoting.caucho.Hessian2SkeletonInvoker.invoke(Hessian2SkeletonInvoker.java:67)
    at org.springframework.remoting.caucho.HessianServiceExporter.handleRequest(HessianServiceExporter.java:147)
    at org.springframework.web.servlet.mvc.HttpRequestHandlerAdapter.handle(HttpRequestHandlerAdapter.java:49)
    at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:859)
    at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:793)
    at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:476)
    at org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:441)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
    at weblogic.servlet.internal.StubSecurityHelper$ServletServiceAction.run(StubSecurityHelper.java:227)
    at weblogic.servlet.internal.StubSecurityHelper.invokeServlet(StubSecurityHelper.java:125)
    at weblogic.servlet.internal.ServletStubImpl.execute(ServletStubImpl.java:292)
    at
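The root cause Uwe points at is easy to reproduce in isolation: the trace's bottom frames show FieldCache's byte parser ending up in Byte.parseByte, which rejects a non-numeric value like "01.SZ". A minimal sketch:

```java
public class ByteSortFailureSketch {
    public static void main(String[] args) {
        // A BYTE sort field forces every term through Byte.parseByte;
        // a ticker-like value such as "01.SZ" cannot be parsed.
        try {
            Byte.parseByte("01.SZ");
            System.out.println("parsed");
        } catch (NumberFormatException e) {
            System.out.println(e.getMessage()); // For input string: "01.SZ"
        }
    }
}
```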