[jira] [Commented] (SOLR-4123) ICUTokenizerFactory - per-script RBBI customization
[ https://issues.apache.org/jira/browse/SOLR-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508553#comment-13508553 ]

Uwe Schindler commented on SOLR-4123:

Can you please remove the workaround inside the factory (prepending "/")? This will break non-test usage (e.g. when those classes are loaded from the file system). Just load resources *only* from the local package of the class that was passed into the resource loader.

bq. Yes, we should remove this /-stuff!!!

I misunderstood that; I thought you wanted to fix something else. YES, PLEASE REMOVE, it may break non-tests!

> ICUTokenizerFactory - per-script RBBI customization
> ---------------------------------------------------
>
> Key: SOLR-4123
> URL: https://issues.apache.org/jira/browse/SOLR-4123
> Project: Solr
> Issue Type: Improvement
> Components: Schema and Analysis
> Affects Versions: 4.0
> Reporter: Shawn Heisey
> Fix For: 4.1, 5.0
>
> Attachments: SOLR-4123.patch, SOLR-4123.patch, SOLR-4123.patch
>
> Initially this started out as an idea for a configuration knob on
> ICUTokenizer that would allow me to tell it not to tokenize on punctuation.
> Through IRC discussion on #lucene, it sorta ballooned. The committers had a
> long discussion about it that I don't really understand, so I'll be including
> it in the comments.
> I am a Solr user, so I would also need the ability to access the
> configuration from there, likely either in schema.xml or solrconfig.xml.

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [JENKINS] Lucene-Solr-NightlyTests-4.x - Build # 112 - Failure
Good to see those suite timeouts are working :)

Dawid

On Sun, Dec 2, 2012 at 11:50 PM, Michael McCandless wrote:
> You're welcome!
>
> I committed a fix to cut back on the index size when MockRandomMP is
> used, because this MP is O(N^2) cost!
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Sun, Dec 2, 2012 at 5:00 PM, Uwe Schindler wrote:
>> Thanks!
>>
>> Uwe
>>
>> Michael McCandless schrieb:
>>>
>>> I'll dig ...
>>>
>>> Mike McCandless
>>>
>>> http://blog.mikemccandless.com
>>>
>>> On Sun, Dec 2, 2012 at 3:10 PM, Apache Jenkins Server wrote:
>>>> Build: https://builds.apache.org/job/Lucene-Solr-NightlyTests-4.x/112/
>>>>
>>>> 2 tests failed.
>>>>
>>>> REGRESSION: org.apache.lucene.index.TestBagOfPositions.test
>>>>
>>>> Error Message:
>>>> Test abandoned because suite timeout was reached.
>>>>
>>>> Stack Trace:
>>>> java.lang.Exception: Test abandoned because suite timeout was reached.
>>>> at __randomizedtesting.SeedInfo.seed([E4E0F4496BBBE86F]:0)
>>>>
>>>> FAILED: junit.framework.TestSuite.org.apache.lucene.index.TestBagOfPositions
>>>>
>>>> Error Message:
>>>> Suite timeout exceeded (>= 720 msec).
>>>>
>>>> Stack Trace:
>>>> java.lang.Exception: Suite timeout exceeded (>= 720 msec).
>>>> at __randomizedtesting.SeedInfo.seed([E4E0F4496BBBE86F]:0)
>>>>
>>>> Build Log:
>>>> [...truncated 1360 lines...]
[junit4:junit4] Suite: org.apache.lucene.index.TestBagOfPositions
[junit4:junit4] 2> 2012-12-3 2:05:06 com.carrotsearch.randomizedtesting.ThreadLeakControl$2 evaluate
[junit4:junit4] 2> WARNING: Suite execution timed out: org.apache.lucene.index.TestBagOfPositions
[junit4:junit4] 2> jstack at approximately timeout time
[junit4:junit4] 2> "Thread-319" ID=415 RUNNABLE
[junit4:junit4] 2>     at org.apache.lucene.store.MockIndexOutputWrapper.writeBytes(MockIndexOutputWrapper.java:118)
[junit4:junit4] 2>     at org.apache.lucene.store.MockIndexOutputWrapper.writeByte(MockIndexOutputWrapper.java:73)
[junit4:junit4] 2>     at org.apache.lucene.store.DataOutput.writeInt(DataOutput.java:70)
[junit4:junit4] 2>     at org.apache.lucene.store.DataOutput.writeLong(DataOutput.java:205)
[junit4:junit4] 2>     at org.apache.lucene.codecs.lucene40.Lucene40TermVectorsWriter.addRawDocuments(Lucene40TermVectorsWriter.java:298)
[junit4:junit4] 2>     at org.apache.lucene.codecs.lucene40.Lucene40TermVectorsWriter.copyVectorsNoDeletions(Lucene40TermVectorsWriter.java:407)
[junit4:junit4] 2>     at org.apache.lucene.codecs.lucene40.Lucene40TermVectorsWriter.merge(Lucene40TermVectorsWriter.java:330)
[junit4:junit4] 2>     at org.apache.lucene.index.SegmentMerger.mergeVectors(SegmentMerger.java:261)
[junit4:junit4] 2>     at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:115)
[junit4:junit4] 2>     at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3682)
[junit4:junit4] 2>     at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3288)
[junit4:junit4] 2>     at org.apache.lucene.index.SerialMergeScheduler.merge(SerialMergeScheduler.java:40)
[junit4:junit4] 2>     - locked org.apache.lucene.index.SerialMergeScheduler@4f20a4cb
[junit4:junit4] 2>     at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1825)
[junit4:junit4] 2>     at org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1236)
[junit4:junit4] 2>     at org.apache.lucene.index.IndexWriter.addDocuments(IndexWriter.java:1186)
[junit4:junit4] 2>     at org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:172)
[junit4:junit4] 2>     at org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:160)
[junit4:junit4] 2>     at org.apache.lucene.index.TestBagOfPositions$1.run(TestBagOfPositions.java:117)
[junit4:junit4] 2>
[junit4:junit4] 2> "TEST-TestBagOfPositions.test-seed#[E4E0F4496BBBE86F]" ID=414 WAITING on org.apache.lucene.index.TestBagOfPositions$1@36e5c19f
[junit4:junit4] 2>     at java.lang.Object.wait(Native Method)
[junit4:junit4] 2>     - waiting on org.apache.lucene.index.TestBagOfPositions$1@36e5c19f
[junit4:junit4] 2>     at java.lang.Thread.join(Thread.java:1203)
[junit4:junit4] 2>     at java.lang.Thread.join(Thread.java:1256)
[junit4:junit4] 2>     at org.apache.lucene.index.TestBagOfPositions.test(TestBagOfPositions.java:128)
[junit4:junit4] 2>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[junit4:junit4] 2>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
[junit4:junit4] 2>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:
[jira] [Comment Edited] (SOLR-4123) ICUTokenizerFactory - per-script RBBI customization
[ https://issues.apache.org/jira/browse/SOLR-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508546#comment-13508546 ]

Uwe Schindler edited comment on SOLR-4123 at 12/3/12 7:46 AM:

The rule is simple: never prepend "/". For tests we added a special case, and that may confuse here: ClasspathResourceLoader takes a class as a ctor param, and you can pass in the base class from whose package resources are loaded. This was added to make writing tests easy: you can pass in a plain file name and it is loaded from the package of the corresponding class. This mimics what we always had in our tests, loading class-local resources:

{code}
// this will load file.txt from the same package as getClass():
new ClasspathResourceLoader(getClass()).openResource("file.txt");
{code}

Code like Solr uses FileSystemResourceLoader, which wants a path relative to the local working directory or falls back to the classloader, but that's for Solr and other applications like ElasticSearch. Tests should use ClasspathResourceLoader(getClass()) and only pass a file name from their own package.

bq. Yes, we should remove this /-stuff!!!

We can do nothing here; the confusion is created by Java's API itself: if you call Class.getResource() without a path (only a file name), it loads from the same package as the class; if you prepend "/", it uses the given path as the full package name. In contrast, if you directly use the ClassLoader (not the Class), you must give a full path, but without a leading "/".

was (Author: thetaphi):

The rule is simple: never prepend "/". For tests we added a special case, and that may confuse here: ClasspathResourceLoader with a class as ctor param; there you can pass in the base class from whose package resources are loaded.

{code}
// this will load file.txt from the same package as getClass():
new ClasspathResourceLoader(getClass()).openResource("file.txt");
{code}

bq. Yes, we should remove this /-stuff!!!

We can do nothing here; the confusion is created by Java's API itself: if you call Class.getResource() without a path (only a file name), it loads from the same package as the class; if you prepend "/", it uses the given path as the full package name. In contrast, if you directly use the ClassLoader (not the Class), you must give a full path, but without a leading "/".
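Uwe's description of the leading-slash rules can be sketched in plain Java. The helper below is a simplified illustration of the name resolution Class.getResourceAsStream() performs before delegating to the ClassLoader (which always takes a full path with no leading "/"); it is an illustration only, not Lucene or JDK code:

```java
// Simplified sketch of how Class.getResourceAsStream(name) resolves a
// resource name before delegating to the ClassLoader: a name without a
// leading "/" is made relative to the class's package, while a leading
// "/" is stripped and the rest treated as an absolute classpath path.
public class ResourceNameDemo {
    static String resolve(Class<?> c, String name) {
        if (name.startsWith("/")) {
            // absolute name: drop the "/" and use the path as-is
            return name.substring(1);
        }
        // relative name: prepend the class's package as a path
        String cn = c.getName();
        int dot = cn.lastIndexOf('.');
        if (dot < 0) return name; // class in the default package
        return cn.substring(0, dot).replace('.', '/') + '/' + name;
    }

    public static void main(String[] args) {
        // loaded from the same package as the class (java.lang here)
        System.out.println(resolve(String.class, "file.txt"));
        // leading "/": full path given explicitly
        System.out.println(resolve(String.class, "/org/example/file.txt"));
    }
}
```

This is why prepending "/" inside the factory is wrong: a resource name that already went through ClassLoader-style resolution must not be re-rooted.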
[jira] [Commented] (LUCENE-4575) Allow IndexWriter to commit, even just commitData
[ https://issues.apache.org/jira/browse/LUCENE-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508551#comment-13508551 ]

Commit Tag Bot commented on LUCENE-4575:

[branch_4x commit] Shai Erera
http://svn.apache.org/viewvc?view=revision&revision=1416367
LUCENE-4575: add IndexWriter.setCommitData

> Allow IndexWriter to commit, even just commitData
> -------------------------------------------------
>
> Key: LUCENE-4575
> URL: https://issues.apache.org/jira/browse/LUCENE-4575
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/index
> Reporter: Shai Erera
> Assignee: Shai Erera
> Priority: Minor
> Fix For: 4.1, 5.0
>
> Attachments: LUCENE-4575.patch, LUCENE-4575.patch, LUCENE-4575.patch,
> LUCENE-4575-testcase.patch
>
> Spinoff from here:
> http://lucene.472066.n3.nabble.com/commit-with-only-commitData-td4022155.html
> In some cases, it is valuable to be able to commit changes to the index, even
> if the changes are just commitData. Such data is sometimes used by
> applications to register in the index some global application
> information/state.
> The proposal is:
> * Add a setCommitData() API and separate it from commit() and prepareCommit()
> (simplify their API)
> * When that API is called, flip on the dirty/changes bit, so that this gets
> committed even if no other changes were made to the index.
> I will work on a patch and post.
[jira] [Resolved] (LUCENE-4575) Allow IndexWriter to commit, even just commitData
[ https://issues.apache.org/jira/browse/LUCENE-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shai Erera resolved LUCENE-4575.

Resolution: Fixed
Fix Version/s: 5.0, 4.1
Assignee: Shai Erera
Lucene Fields: New, Patch Available (was: New)

Committed to trunk and 4.x. Thanks Mike!
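The dirty-bit semantics proposed in this issue (calling setCommitData() marks the writer as changed, so a subsequent commit() writes a new commit point even when no documents were added) can be illustrated with a toy model. All names here are made up for illustration; this is a sketch of the idea, not Lucene's actual IndexWriter code:

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the LUCENE-4575 idea: setCommitData() flips a dirty flag,
// so commit() is no longer a no-op even without document changes.
public class ToyCommitWriter {
    private final Map<String, String> commitData = new HashMap<>();
    private boolean dirty = false;
    private int commits = 0;

    public void setCommitData(Map<String, String> data) {
        commitData.clear();
        commitData.putAll(data);
        dirty = true; // flip the dirty/changes bit
    }

    /** Returns true if a new commit point was actually written. */
    public boolean commit() {
        if (!dirty) return false; // nothing changed: skip the commit
        commits++;                // pretend to write a commit point
        dirty = false;
        return true;
    }

    public int commitCount() { return commits; }
}
```

Used this way, an application can persist global state (e.g. a "last rebuild" marker) without touching any documents: set the data, then commit once.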
[jira] [Commented] (SOLR-4123) ICUTokenizerFactory - per-script RBBI customization
[ https://issues.apache.org/jira/browse/SOLR-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508546#comment-13508546 ]

Uwe Schindler commented on SOLR-4123:

The rule is simple: never prepend "/". For tests we added a special case, and that may confuse here: ClasspathResourceLoader with a class as ctor param; there you can pass in the base class from whose package resources are loaded.

{code}
// this will load file.txt from the same package as getClass():
new ClasspathResourceLoader(getClass()).openResource("file.txt");
{code}

bq. Yes, we should remove this /-stuff!!!

We can do nothing here; the confusion is created by Java's API itself: if you call Class.getResource() without a path (only a file name), it loads from the same package as the class; if you prepend "/", it uses the given path as the full package name. In contrast, if you directly use the ClassLoader (not the Class), you must give a full path, but without a leading "/".
[jira] [Commented] (LUCENE-4575) Allow IndexWriter to commit, even just commitData
[ https://issues.apache.org/jira/browse/LUCENE-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508524#comment-13508524 ]

Commit Tag Bot commented on LUCENE-4575:

[trunk commit] Shai Erera
http://svn.apache.org/viewvc?view=revision&revision=1416361
LUCENE-4575: add IndexWriter.setCommitData
[jira] [Commented] (SOLR-4085) Commit-free ExternalFileField
[ https://issues.apache.org/jira/browse/SOLR-4085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508499#comment-13508499 ]

Mikhail Khludnev commented on SOLR-4085:

[~jpountz], your feedback is appreciated. To make this ticket even more valuable for the community, can we go through the particular points of confusing behavior that you mention? Can you list them? I also want to wait until [~romseygeek] leaves his feedback, as the person who showed interest in the subject. Thank you, guys.

> Commit-free ExternalFileField
> -----------------------------
>
> Key: SOLR-4085
> URL: https://issues.apache.org/jira/browse/SOLR-4085
> Project: Solr
> Issue Type: Improvement
> Components: Schema and Analysis
> Affects Versions: 4.1
> Reporter: Mikhail Khludnev
> Labels: externalfilefield
> Attachments: SOLR-4085.patch
>
> Let's reload ExternalFileFields without commit!
[jira] [Commented] (SOLR-4123) ICUTokenizerFactory - per-script RBBI customization
[ https://issues.apache.org/jira/browse/SOLR-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508449#comment-13508449 ]

Robert Muir commented on SOLR-4123:

Hi, thanks for tackling this! You beat me to getting to the tests. Yes, we should remove this /-stuff!!!
Re: Pivot facets enhancements
Hi Steve - don't be discouraged by the lack of response here. A better place to ask is actually the user list. I suspect the number of people using pivot faceting is low, so again, don't be discouraged by a possibly weak response.

> We first need to distribute that request, for which I've seen and locally applied the existing patch and it seems to work ok for our needs.

You may want to give that JIRA issue a vote and a comment saying it worked for you, plus any feedback you may have.

> First, for any field in the facet pivot, I've added the possibility to add a query (through f.field.facet.pivot.query) that would compute
> the count for the intersection between the query and the document set matching a particular value.

Maybe you can provide an example in your email to the user ML to help people understand whether this would be useful to them or not.

Otis
--
SOLR Performance Monitoring - http://sematext.com/spm/index.html
Search Analytics - http://sematext.com/search-analytics/index.html

On Fri, Nov 23, 2012 at 6:07 PM, Steve Molloy wrote:
> Hi, I'm currently working on a project based on Solr 4.0 which relies on
> facet pivot to populate a treemap visualization. We've been able to get
> something in place, but will now need to move further. We first need to
> distribute that request, for which I've seen and locally applied the
> existing patch and it seems to work ok for our needs. But while in the
> code, I also added 2 features that will be helpful for us and which we
> would be willing to contribute back if it makes sense.
>
> So before sending any code (which I need to clean up anyhow), I'll describe
> the changes.
>
> First, for any field in the facet pivot, I've added the possibility to add
> a query (through f.field.facet.pivot.query) that would compute the
> count for the intersection between the query and the document set matching
> a particular value. We're planning on using this to produce the count for
> the overlay coloring in the treemap. It seems to work fine and is actually
> more efficient than having a third level of pivot.
>
> The second thing is a deduplication flag. This is mostly for our case
> where the second level is a document path, which is stored using the
> PathHierarchyTokenizer. The flag avoids documents being counted once for
> every folder in their path (the hierarchy tokens are something we do want
> at query time) without having to store a separate field (to reduce index
> size).
>
> So, if these features are of interest, I will send more details and code
> once I've cleaned it up.
>
> Thanks,
>
> Steve
[jira] [Updated] (SOLR-4123) ICUTokenizerFactory - per-script RBBI customization
[ https://issues.apache.org/jira/browse/SOLR-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steven Rowe updated SOLR-4123:

Attachment: SOLR-4123.patch

bq. I don't understand {{Object.getClass().getResourceAsStream()}}, which is delegated to by {{ResourceLoader.loadResource()}} - even resources in the same package as the Object can't be found??? By contrast, {{Object.getClass().getClassLoader().getResourceAsStream()}} succeeds in finding resources without first prepending a {{"/"}}. The {{ClasspathResourceLoader}} ctor that allows direct specification of the {{ClassLoader}} separately from the {{Class}} has private access, though.

Hmm, I just retried removing the package from the path for a resource that is in the same package as the test class, and it now works (why did I think it didn't? I thought I tried that...). Modified patch attached. So I guess {{getClass().getResourceAsStream()}} makes sense: it only searches the same package as the class unless you prepend a {{"/"}}. Should I leave in the {{"/"}}-prepending fallback?
[jira] [Updated] (SOLR-4123) ICUTokenizerFactory - per-script RBBI customization
[ https://issues.apache.org/jira/browse/SOLR-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steven Rowe updated SOLR-4123:

Attachment: SOLR-4123.patch

Patch with more tests and example {{.rbbi}} files. I changed the {{rulefiles="..."}} arg format to relax allowable resource names & locations, e.g. {{rulefiles="Latn:, ..."}}. I added some logic to {{ICUTokenizerFactory.parseRules()}} to retry when {{ResourceLoader.loadResource()}} fails, after first prepending a {{"/"}} to the resource path, because none of the test resources under {{lucene/analysis/icu/src/test-files/}}, which is on the {{test.classpath}}, were found.

I don't understand {{Object.getClass().getResourceAsStream()}}, which is delegated to by {{ResourceLoader.loadResource()}} - even resources in the same package as the Object can't be found??? By contrast, {{Object.getClass().getClassLoader().getResourceAsStream()}} succeeds in finding resources without first prepending a {{"/"}}. The {{ClasspathResourceLoader}} ctor that allows direct specification of the {{ClassLoader}} separately from the {{Class}} has private access, though.
[jira] [Resolved] (SOLR-4085) Commit-free ExternalFileField
[ https://issues.apache.org/jira/browse/SOLR-4085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adrien Grand resolved SOLR-4085.

Resolution: Won't Fix
[jira] [Resolved] (LUCENE-4584) Compare the LZ4 implementation in Lucene against the original impl
[ https://issues.apache.org/jira/browse/LUCENE-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adrien Grand resolved LUCENE-4584.

Resolution: Won't Fix

Comparing the compressed output against the original impl seemed to be a good means of detecting bugs, but since we want to be able to use a different algorithm, as Uwe suggests, I'll try to add softer tests instead (like checking that the algorithm manages to detect a match which is 65535 bytes backwards, gives a reasonable compression ratio on inputs that are known to be easily compressible, etc.).

> Compare the LZ4 implementation in Lucene against the original impl
> ------------------------------------------------------------------
>
> Key: LUCENE-4584
> URL: https://issues.apache.org/jira/browse/LUCENE-4584
> Project: Lucene - Core
> Issue Type: Task
> Reporter: Adrien Grand
> Assignee: Adrien Grand
> Fix For: 4.1
>
> We should add tests to make sure that the LZ4 impl in Lucene compresses data
> the exact same way as the original impl.
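The "softer test" idea Adrien describes (assert a healthy compression ratio on a known-compressible input, rather than byte-for-byte equality with a reference implementation) can be sketched as follows. Here java.util.zip.Deflater stands in for the LZ4 codec purely for illustration, and the ratio threshold is an assumption, not a value from the issue:

```java
import java.util.Arrays;
import java.util.zip.Deflater;

// Sketch of a "soft" compression test: instead of comparing compressed
// bytes against a reference impl, assert that an obviously compressible
// input shrinks by a healthy factor. Deflater is a stand-in codec here.
public class SoftCompressionTest {
    static int compressedSize(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        byte[] out = new byte[input.length + 64];
        int len = 0;
        while (!deflater.finished()) {
            len += deflater.deflate(out, len, out.length - len);
        }
        deflater.end();
        return len;
    }

    public static void main(String[] args) {
        byte[] repetitive = new byte[65536];
        Arrays.fill(repetitive, (byte) 'a'); // trivially compressible
        int size = compressedSize(repetitive);
        // a real soft test would pick a bound appropriate to the codec;
        // 10x is an illustrative threshold for this degenerate input
        System.out.println(size < repetitive.length / 10);
    }
}
```

A test written this way keeps passing if the codec's output format changes, as long as it still compresses reasonably well.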
[JENKINS] Lucene-Solr-4.x-Linux (64bit/jdk1.6.0_37) - Build # 3016 - Failure!
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-4.x-Linux/3016/
Java: 64bit/jdk1.6.0_37 -XX:+UseSerialGC

All tests passed

Build Log:
[...truncated 24159 lines...]
BUILD FAILED
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/build.xml:60: The following error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/lucene/build.xml:284: The following error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/lucene/common-build.xml:1526: The following error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/lucene/common-build.xml:1560: Compile failed; see the compiler error output for details.

Total time: 31 minutes 16 seconds
Build step 'Invoke Ant' marked build as failure
Archiving artifacts
Recording test results
Description set: Java: 64bit/jdk1.6.0_37 -XX:+UseSerialGC
Email was triggered for: Failure
Sending email for trigger: Failure
[JENKINS] Lucene-Solr-4.x-Windows (32bit/jdk1.6.0_37) - Build # 2023 - Failure!
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-4.x-Windows/2023/
Java: 32bit/jdk1.6.0_37 -client -XX:+UseParallelGC

All tests passed

Build Log:
[...truncated 24165 lines...]
BUILD FAILED
C:\Users\JenkinsSlave\workspace\Lucene-Solr-4.x-Windows\build.xml:60: The following error occurred while executing this line:
C:\Users\JenkinsSlave\workspace\Lucene-Solr-4.x-Windows\lucene\build.xml:284: The following error occurred while executing this line:
C:\Users\JenkinsSlave\workspace\Lucene-Solr-4.x-Windows\lucene\common-build.xml:1526: The following error occurred while executing this line:
C:\Users\JenkinsSlave\workspace\Lucene-Solr-4.x-Windows\lucene\common-build.xml:1560: Compile failed; see the compiler error output for details.

Total time: 57 minutes 9 seconds
Build step 'Invoke Ant' marked build as failure
Archiving artifacts
Recording test results
Description set: Java: 32bit/jdk1.6.0_37 -client -XX:+UseParallelGC
Email was triggered for: Failure
Sending email for trigger: Failure
[jira] [Updated] (SOLR-4136) SolrCloud bugs when servlet context contains "/" or "_"
[ https://issues.apache.org/jira/browse/SOLR-4136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man updated SOLR-4136:

Attachment: SOLR-4136.patch

Context...
* http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201211.mbox/%3Calpine.DEB.2.02.1211292004430.2543@frisbee%3E
* http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201211.mbox/%3c551c5e62-0520-42a2-bf71-165fda360...@gmail.com%3E

Mark's suggestion in that email regarding my original question (about prohibiting "/" in nodeNames) was that ZkController should replace "/" with "_" -- but that would cause potential collisions between contexts like "/foo/solr" and "/foo_solr", so I think using something like URL encoding makes more sense (and shouldn't impact existing ZK cluster state data for most existing users).

The attached patch enhances the test base classes to allow for randomized hostContext values, and then uses this URL-encoding logic in ZkController to build nodeNames -- and in most cases it seems to work. But thinking about "_" in paths got me paranoid about explicitly testing that, which is how I discovered the crufty logic in OverseerCollectionProcessor. (NOTE: you can see the obvious OverseerCollectionProcessor errors trying to talk to the wrong URL in the test logs, and they seem to explain the subsequent test failure message, but it's also possible there is a subsequent problem I haven't noticed yet.)

I haven't dug into this part of the code/problem very much yet, but I *think* the right fix here is to clean up this code so that, instead of making assumptions about the node name, it uses the clusterstate to look up the base_url from the nodeName.

Logged error (repeated for multiple shards)...
{noformat} [junit4:junit4] 2> 204647 T33 oasc.SolrException.log SEVERE Collection createcollection of awholynewcollection_1 failed [junit4:junit4] 2> 204686 T31 oasc.DistributedQueue$LatchChildWatcher.process Watcher fired on path: /overseer/collection-queue-work state: SyncConnected type NodeChildrenChanged [junit4:junit4] 2> 204688 T33 oasc.OverseerCollectionProcessor.createCollection Create collection awholynewcollection_2 on [127.0.0.1:57855_randctxmqvf%2F_ay, 127.0.0.1:37463_randctxmqvf%2F_ay] [junit4:junit4] 2> 204691 T33 oasc.OverseerCollectionProcessor.createCollection SEVERE Error talking to shard: 127.0.0.1:37463/randctxmqvf%2F/ay org.apache.solr.common.SolrException: Server at http://127.0.0.1:37463/randctxmqvf%2F/ay returned non ok status:404, message:Not Found [junit4:junit4] 2>at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:372) [junit4:junit4] 2>at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181) [junit4:junit4] 2>at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:166) [junit4:junit4] 2>at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:133) [junit4:junit4] 2>at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) {noformat} Final test failure message... 
{noformat} java.lang.AssertionError: Could not find new 2 slice collection called awholynewcollection_0 at __randomizedtesting.SeedInfo.seed([1BD856523B97C07C:9A3ED84A4CC8A040]:0) at org.junit.Assert.fail(Assert.java:93) at org.apache.solr.cloud.BasicDistributedZkTest.checkForCollection(BasicDistributedZkTest.java:1053) at org.apache.solr.cloud.BasicDistributedZkTest.testCollectionsAPI(BasicDistributedZkTest.java:768) at org.apache.solr.cloud.BasicDistributedZkTest.doTest(BasicDistributedZkTest.java:361) at org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:712) {noformat} > SolrCloud bugs when servlet context contains "/" or "_" > --- > > Key: SOLR-4136 > URL: https://issues.apache.org/jira/browse/SOLR-4136 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Affects Versions: 4.0 >Reporter: Hoss Man >Assignee: Hoss Man > Attachments: SOLR-4136.patch > > > SolrCloud does not work properly with non-trivial values for "hostContext" > (ie: the servlet context path). In particular... > * Using a hostContext containing a "/" (ie: a servlet context with a subdir > path, semi-common among people who organize webapps hierarchically for load > balancer rules) is explicitly forbidden in ZkController because of how the > hostContext is used to build a ZK nodeName > * Using a hostContext containing a "_" causes problems in > OverseerCollectionProcessor where it assumes all "_" characters should be > converted to "/" to reconstitute a URL from nodeName (NOTE: this code > specifically has a TODO to fix this,
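The collision Hoss describes can be sketched in a few lines. This is only an illustration of the two naming schemes discussed in the comment (the class and method names below are made up, not the actual ZkController code): a naive "/"-to-"_" replacement collapses distinct contexts, while URL-encoding keeps them apart because "_" survives encoding and "/" becomes %2F.

```java
import java.net.URLEncoder;

public class NodeNameCollision {
    // Naive scheme from the email thread: replace "/" with "_".
    static String naive(String hostContext) {
        return hostContext.replace('/', '_');
    }

    // URL-encoding keeps distinct contexts distinct:
    // "_" is left unchanged, "/" is encoded as %2F.
    static String encoded(String hostContext) throws Exception {
        return URLEncoder.encode(hostContext, "UTF-8");
    }

    public static void main(String[] args) throws Exception {
        // Both contexts collapse to the same nodeName under the naive scheme...
        System.out.println(naive("/foo/solr"));   // _foo_solr
        System.out.println(naive("/foo_solr"));   // _foo_solr -- collision!
        // ...but stay distinct once URL-encoded.
        System.out.println(encoded("/foo/solr")); // %2Ffoo%2Fsolr
        System.out.println(encoded("/foo_solr")); // %2Ffoo_solr
    }
}
```

This also shows why the patch "shouldn't impact existing ZK cluster state data for most existing users": a trivial single-segment context contains no characters that URL-encoding would change.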
[jira] [Updated] (SOLR-4136) SolrCloud bugs when servlet context contains "/" or "_"
[ https://issues.apache.org/jira/browse/SOLR-4136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated SOLR-4136: --- Description: SolrCloud does not work properly with non-trivial values for "hostContext" (ie: the servlet context path). In particular... * Using a hostContext containing a "/" (ie: a servlet context with a subdir path, semi-common among people who organize webapps hierarchically for load balancer rules) is explicitly forbidden in ZkController because of how the hostContext is used to build a ZK nodeName * Using a hostContext containing a "\_" causes problems in OverseerCollectionProcessor where it assumes all "\_" characters should be converted to "/" to reconstitute a URL from nodeName (NOTE: this code specifically has a TODO to fix this, and then has a subsequent TODO about assuming "http://" labeled "this sucks") was: SolrCloud does not work properly with non-trivial values for "hostContext" (ie: the servlet context path). In particular... * Using a hostContext containing a "/" (ie: a servlet context with a subdir path, semi-common among people who organize webapps hierarchically for load balancer rules) is explicitly forbidden in ZkController because of how the hostContext is used to build a ZK nodeName * Using a hostContext containing a "_" causes problems in OverseerCollectionProcessor where it assumes all "_" characters should be converted to "/" to reconstitute a URL from nodeName (NOTE: this code specifically has a TODO to fix this, and then has a subsequent TODO about assuming "http://" labeled "this sucks") > SolrCloud bugs when servlet context contains "/" or "_" > --- > > Key: SOLR-4136 > URL: https://issues.apache.org/jira/browse/SOLR-4136 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Affects Versions: 4.0 >Reporter: Hoss Man >Assignee: Hoss Man > Attachments: SOLR-4136.patch > > > SolrCloud does not work properly with non-trivial values for "hostContext" > (ie: the servlet context path). 
In particular... > * Using a hostContext containing a "/" (ie: a servlet context with a subdir > path, semi-common among people who organize webapps hierarchically for load > balancer rules) is explicitly forbidden in ZkController because of how the > hostContext is used to build a ZK nodeName > * Using a hostContext containing a "\_" causes problems in > OverseerCollectionProcessor where it assumes all "\_" characters should be > converted to "/" to reconstitute a URL from nodeName (NOTE: this code > specifically has a TODO to fix this, and then has a subsequent TODO about > assuming "http://" labeled "this sucks") -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-4136) SolrCloud bugs when servlet context contains "/" or "_"
Hoss Man created SOLR-4136: -- Summary: SolrCloud bugs when servlet context contains "/" or "_" Key: SOLR-4136 URL: https://issues.apache.org/jira/browse/SOLR-4136 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.0 Reporter: Hoss Man Assignee: Hoss Man Attachments: SOLR-4136.patch SolrCloud does not work properly with non-trivial values for "hostContext" (ie: the servlet context path). In particular... * Using a hostContext containing a "/" (ie: a servlet context with a subdir path, semi-common among people who organize webapps hierarchically for load balancer rules) is explicitly forbidden in ZkController because of how the hostContext is used to build a ZK nodeName * Using a hostContext containing a "_" causes problems in OverseerCollectionProcessor where it assumes all "_" characters should be converted to "/" to reconstitute a URL from nodeName (NOTE: this code specifically has a TODO to fix this, and then has a subsequent TODO about assuming "http://" labeled "this sucks") -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
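The second bug in the description can be illustrated with a toy example (the nodeName format and method name below are simplified assumptions, not the real OverseerCollectionProcessor code): once the servlet context itself contains a literal "_", blindly mapping every "_" back to "/" reconstructs a URL path the server never registered.

```java
public class NodeNameToUrl {
    // Simplified version of the assumption described above: every "_"
    // in a nodeName is treated as a path separator.
    static String naiveUrl(String nodeName) {
        return "http://" + nodeName.replace('_', '/');
    }

    public static void main(String[] args) {
        // A node registered with context "/app_solr"; the "_" here is literal.
        String nodeName = "127.0.0.1:8983_app_solr";
        // The reconstruction turns the literal "_" into "/" as well,
        // yielding http://127.0.0.1:8983/app/solr instead of .../app_solr.
        System.out.println(naiveUrl(nodeName));
    }
}
```

This is exactly the wrong-URL 404 visible in the test logs above, and why the suggested fix is to look the base_url up from the clusterstate instead of parsing it out of the nodeName.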
Re: [JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.7.0_09) - Build # 3026 - Failure!
Woops, I committed a fix ... Mike McCandless http://blog.mikemccandless.com On Sun, Dec 2, 2012 at 5:53 PM, Policeman Jenkins Server wrote: > Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Linux/3026/ > Java: 32bit/jdk1.7.0_09 -server -XX:+UseParallelGC > > All tests passed > > Build Log: > [...truncated 25027 lines...] > BUILD FAILED > /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/build.xml:60: The > following error occurred while executing this line: > /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/build.xml:285: The > following error occurred while executing this line: > /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/common-build.xml:1526: > The following error occurred while executing this line: > /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/common-build.xml:1560: > Compile failed; see the compiler error output for details. > > Total time: 28 minutes 52 seconds > Build step 'Invoke Ant' marked build as failure > Archiving artifacts > Recording test results > Description set: Java: 32bit/jdk1.7.0_09 -server -XX:+UseParallelGC > Email was triggered for: Failure > Sending email for trigger: Failure > > > > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.7.0_09) - Build # 3026 - Failure!
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Linux/3026/ Java: 32bit/jdk1.7.0_09 -server -XX:+UseParallelGC All tests passed Build Log: [...truncated 25027 lines...] BUILD FAILED /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/build.xml:60: The following error occurred while executing this line: /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/build.xml:285: The following error occurred while executing this line: /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/common-build.xml:1526: The following error occurred while executing this line: /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/common-build.xml:1560: Compile failed; see the compiler error output for details. Total time: 28 minutes 52 seconds Build step 'Invoke Ant' marked build as failure Archiving artifacts Recording test results Description set: Java: 32bit/jdk1.7.0_09 -server -XX:+UseParallelGC Email was triggered for: Failure Sending email for trigger: Failure - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [JENKINS] Lucene-Solr-NightlyTests-4.x - Build # 112 - Failure
You're welcome! I committed a fix to cut back on the index size when MockRandomMP is used, because this MP is O(N^2) cost! Mike McCandless http://blog.mikemccandless.com On Sun, Dec 2, 2012 at 5:00 PM, Uwe Schindler wrote: > Thanks! > > Uwe > > > > Michael McCandless schrieb: >> >> I'll dig ... >> >> Mike McCandless >> >> http://blog.mikemccandless.com >> >> On Sun, Dec 2, 2012 at 3:10 PM, Apache Jenkins Server >> wrote: >>> >>> Build: https://builds.apache.org/job/Lucene-Solr-NightlyTests-4.x/112/ >>> >>> 2 tests failed. >>> REGRESSION: org.apache.lucene.index.TestBagOfPositions.test >>> >>> Error Message: >>> Test abandoned because suite timeout was reached. >>> >>> Stack Trace: >>> java.lang.Exception: Test abandoned because suite timeout was reached. >>> at __randomizedtesting.SeedInfo.seed([E4E0F4496BBBE86F]:0) >>> >>> >>> FAILED: >>> junit.framework.TestSuite.org.apache.lucene.index.TestBagOfPositions >>> >>> Error Message: >>> Suite timeout exceeded (>= 720 msec). >>> >>> Stack Trace: >>> java.lang.Exception: Suite timeout exceeded (>= 720 msec). >>> at __randomizedtesting.SeedInfo.seed([E4E0F4496BBBE86F]:0) >>> >>> >>> >>> >>> Build Log: >>> [...truncated 1360 lines...] 
>>> [junit4:junit4] Suite: org.apache.lucene.index.TestBagOfPositions >>> [junit4:junit4] 2> 2012-12-3 2:05:06 >>> com.carrotsearch.randomizedtesting.ThreadLeakControl$2 evaluate >>> [junit4:junit4] 2> WARNING: Suite execution timed out: >>> org.apache.lucene.index.TestBagOfPositions >>> [junit4:junit4] 2> jstack at approximately timeout time >>> [junit4:junit4] 2> "Thread-319" ID=415 RUNNABLE >>> [junit4:junit4] 2>at >>> org.apache.lucene.store.MockIndexOutputWrapper.writeBytes(MockIndexOutputWrapper.java:118) >>> [junit4:junit4] >>> 2>at >>> org.apache.lucene.store.MockIndexOutputWrapper.writeByte(MockIndexOutputWrapper.java:73) >>> [junit4:junit4] 2>at >>> org.apache.lucene.store.DataOutput.writeInt(DataOutput.java:70) >>> [junit4:junit4] 2>at >>> org.apache.lucene.store.DataOutput.writeLong(DataOutput.java:205) >>> [junit4:junit4] 2>at >>> org.apache.lucene.codecs.lucene40.Lucene40TermVectorsWriter.addRawDocuments(Lucene40TermVectorsWriter.java:298) >>> [junit4:junit4] 2>at >>> org.apache.lucene.codecs.lucene40.Lucene40TermVectorsWriter.copyVectorsNoDeletions(Lucene40TermVectorsWriter.java:407) >>> [junit4:junit4] 2>at >>> org.apache.lucene.codecs.lucene40.Lucene40TermVectorsWriter.merge(Lucene40TermVectorsWriter.java:330) >>> [junit4:junit4] 2>at >>> org.apache.lucene.index.SegmentMerger.mergeVectors(SegmentMerger.java:261) >>> [junit4:junit4] 2>at >>> org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:115) >>> [junit4:junit4] 2>at >>> org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3682) >>> [junit4:junit4] 2>at >>> org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3288) >>> [junit4:junit4] 2>at >>> org.apache.lucene.index.SerialMergeScheduler.merge(SerialMergeScheduler.java:40) >>> [junit4:junit4] 2>- locked >>> org.apache.lucene.index.SerialMergeScheduler@4f20a4cb >>> [junit4:junit4] 2>at >>> org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1825) >>> [junit4:junit4] 2>at >>> 
org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1236) >>> [junit4:junit4] 2>at >>> org.apache.lucene.index.IndexWriter.addDocuments(IndexWriter.java:1186) >>> [junit4:junit4] 2>at >>> org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:172) >>> [junit4:junit4] 2>at >>> org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:160) >>> [junit4:junit4] 2>at >>> >>> org.apache.lucene.index.TestBagOfPositions$1.run(TestBagOfPositions.java:117) >>> [junit4:junit4] 2> >>> [junit4:junit4] 2> >>> "TEST-TestBagOfPositions.test-seed#[E4E0F4496BBBE86F]" ID=414 WAITING on >>> org.apache.lucene.index.TestBagOfPositions$1@36e5c19f >>> [junit4:junit4] 2>at java.lang.Object.wait(Native Method) >>> [junit4:junit4] 2>- waiting on >>> org.apache.lucene.index.TestBagOfPositions$1@36e5c19f >>> [junit4:junit4] 2>at java.lang.Thread.join(Thread.java:1203) >>> [junit4:junit4] 2>at java.lang.Thread.join(Thread.java:1256) >>> [junit4:junit4] 2>at >>> org.apache.lucene.index.TestBagOfPositions.test(TestBagOfPositions.java:128) >>> [junit4:junit4] 2>at >>> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) >>> [junit4:junit4] 2>at >>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) >>> [junit4:junit4] 2>at >>> >>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) >>> [junit4:junit4] 2>at >>> java.lang.reflect.Method.invoke(Method.java:616) >>> [junit4:junit4] 2>at >>> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559) >>> [junit4:junit4] 2>at >
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508382#comment-13508382 ] Per Steffensen commented on SOLR-4114: -- Hope you will commit, and consider backporting to 4.x, since we expect to upgrade to 4.1 when it is released, and we would really like this feature to be included. > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: collection-api, multicore, shard, shard-allocation > Attachments: SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, > SOLR-4114_trunk.patch > > > We should support running multiple shards from one collection on the same > Solr server - e.g. run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests on our side have shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shard from one Solr server to another one that just joined > the cluster than it is to split an existing shard among the Solr server that used to > run it and the new one. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508379#comment-13508379 ] Per Steffensen edited comment on SOLR-4114 at 12/2/12 10:04 PM: Here is the patch for trunk (5.x). The main mistake was that you didn't use the calculated shardName as the shardName - instead you used collectionName. This caused different shards on the same node to share name and data-dir - not so cool :-) was (Author: steff1193): Here is the patch for trunk (5.x). The main mistake was the you didnt used the calculated shardName as the shardName - instead you used collectionName. This caused different shards on the same node to shard name and data-dir - not so cool :-) > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: collection-api, multicore, shard, shard-allocation > Attachments: SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, > SOLR-4114_trunk.patch > > > We should support running multiple shards from one collection on the same > Solr server - e.g. run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests on our side have shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shard from one Solr server to another one that just joined > the cluster than it is to split an existing shard among the Solr server that used to > run it and the new one. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Per Steffensen updated SOLR-4114: - Attachment: SOLR-4114_trunk.patch Here is the patch for trunk (5.x). The main mistake was that you didn't use the calculated shardName as the shardName - instead you used collectionName. This caused different shards on the same node to share name and data-dir - not so cool :-) > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: collection-api, multicore, shard, shard-allocation > Attachments: SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, > SOLR-4114_trunk.patch > > > We should support running multiple shards from one collection on the same > Solr server - e.g. run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests on our side have shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shard from one Solr server to another one that just joined > the cluster than it is to split an existing shard among the Solr server that used to > run it and the new one. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
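The naming bug described in the comment above is easy to model. The directory layout and helper names here are invented for illustration; the point is only that deriving a per-shard path from the collection name alone makes co-located shards collide, while including the computed shard name keeps them apart.

```java
public class ShardDirs {
    // Buggy: the data dir is derived from the collection name only, so
    // two shards of the same collection on one node map to the same dir.
    static String buggyDir(String collection, String shard) {
        return "data/" + collection;
    }

    // Fixed: the computed shard name is part of the dir, keeping shards apart.
    static String fixedDir(String collection, String shard) {
        return "data/" + collection + "_" + shard;
    }

    public static void main(String[] args) {
        // Two shards of "coll1" hosted on the same node:
        System.out.println(buggyDir("coll1", "shard1")); // data/coll1
        System.out.println(buggyDir("coll1", "shard2")); // data/coll1 -- collision
        System.out.println(fixedDir("coll1", "shard1")); // data/coll1_shard1
        System.out.println(fixedDir("coll1", "shard2")); // data/coll1_shard2
    }
}
```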
Re: [JENKINS] Lucene-Solr-NightlyTests-4.x - Build # 112 - Failure
Thanks! Uwe Michael McCandless schrieb: >I'll dig ... > >Mike McCandless > >http://blog.mikemccandless.com > >On Sun, Dec 2, 2012 at 3:10 PM, Apache Jenkins Server > wrote: >> Build: >https://builds.apache.org/job/Lucene-Solr-NightlyTests-4.x/112/ >> >> 2 tests failed. >> REGRESSION: org.apache.lucene.index.TestBagOfPositions.test >> >> Error Message: >> Test abandoned because suite timeout was reached. >> >> Stack Trace: >> java.lang.Exception: Test abandoned because suite timeout was >reached. >> at __randomizedtesting.SeedInfo.seed([E4E0F4496BBBE86F]:0) >> >> >> FAILED: >junit.framework.TestSuite.org.apache.lucene.index.TestBagOfPositions >> >> Error Message: >> Suite timeout exceeded (>= 720 msec). >> >> Stack Trace: >> java.lang.Exception: Suite timeout exceeded (>= 720 msec). >> at __randomizedtesting.SeedInfo.seed([E4E0F4496BBBE86F]:0) >> >> >> >> >> Build Log: >> [...truncated 1360 lines...] >> [junit4:junit4] Suite: org.apache.lucene.index.TestBagOfPositions >> [junit4:junit4] 2> 2012-12-3 2:05:06 >com.carrotsearch.randomizedtesting.ThreadLeakControl$2 evaluate >> [junit4:junit4] 2> WARNING: Suite execution timed out: >org.apache.lucene.index.TestBagOfPositions >> [junit4:junit4] 2> jstack at approximately timeout time >> [junit4:junit4] 2> "Thread-319" ID=415 RUNNABLE >> [junit4:junit4] 2>at >org.apache.lucene.store.MockIndexOutputWrapper.writeBytes(MockIndexOutputWrapper.java:118) >> [junit4:junit4] 2>at >org.apache.lucene.store.MockIndexOutputWrapper.writeByte(MockIndexOutputWrapper.java:73) >> [junit4:junit4] 2>at >org.apache.lucene.store.DataOutput.writeInt(DataOutput.java:70) >> [junit4:junit4] 2>at >org.apache.lucene.store.DataOutput.writeLong(DataOutput.java:205) >> [junit4:junit4] 2>at >org.apache.lucene.codecs.lucene40.Lucene40TermVectorsWriter.addRawDocuments(Lucene40TermVectorsWriter.java:298) >> [junit4:junit4] 2>at >org.apache.lucene.codecs.lucene40.Lucene40TermVectorsWriter.copyVectorsNoDeletions(Lucene40TermVectorsWriter.java:407) >> 
[junit4:junit4] 2>at >org.apache.lucene.codecs.lucene40.Lucene40TermVectorsWriter.merge(Lucene40TermVectorsWriter.java:330) >> [junit4:junit4] 2>at >org.apache.lucene.index.SegmentMerger.mergeVectors(SegmentMerger.java:261) >> [junit4:junit4] 2>at >org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:115) >> [junit4:junit4] 2>at >org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3682) >> [junit4:junit4] 2>at >org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3288) >> [junit4:junit4] 2>at >org.apache.lucene.index.SerialMergeScheduler.merge(SerialMergeScheduler.java:40) >> [junit4:junit4] 2>- locked >org.apache.lucene.index.SerialMergeScheduler@4f20a4cb >> [junit4:junit4] 2>at >org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1825) >> [junit4:junit4] 2>at >org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1236) >> [junit4:junit4] 2>at >org.apache.lucene.index.IndexWriter.addDocuments(IndexWriter.java:1186) >> [junit4:junit4] 2>at >org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:172) >> [junit4:junit4] 2>at >org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:160) >> [junit4:junit4] 2>at >org.apache.lucene.index.TestBagOfPositions$1.run(TestBagOfPositions.java:117) >> [junit4:junit4] 2> >> [junit4:junit4] 2> >"TEST-TestBagOfPositions.test-seed#[E4E0F4496BBBE86F]" ID=414 WAITING >on org.apache.lucene.index.TestBagOfPositions$1@36e5c19f >> [junit4:junit4] 2>at java.lang.Object.wait(Native Method) >> [junit4:junit4] 2>- waiting on >org.apache.lucene.index.TestBagOfPositions$1@36e5c19f >> [junit4:junit4] 2>at java.lang.Thread.join(Thread.java:1203) >> [junit4:junit4] 2>at java.lang.Thread.join(Thread.java:1256) >> [junit4:junit4] 2>at >org.apache.lucene.index.TestBagOfPositions.test(TestBagOfPositions.java:128) >> [junit4:junit4] 2>at >sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) >> [junit4:junit4] 2>at 
>sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) >> [junit4:junit4] 2>at >sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) >> [junit4:junit4] 2>at >java.lang.reflect.Method.invoke(Method.java:616) >> [junit4:junit4] 2>at >com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559) >> [junit4:junit4] 2>at >com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79) >> [junit4:junit4] 2>at >com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737) >> [junit4:junit4] 2>at >com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773) >> [junit4:junit4] 2>at >com.carrotsearch.randomizedtesting.Randomiz
[jira] [Commented] (LUCENE-4575) Allow IndexWriter to commit, even just commitData
[ https://issues.apache.org/jira/browse/LUCENE-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508375#comment-13508375 ] Michael McCandless commented on LUCENE-4575: +1, looks great. Thanks Shai! > Allow IndexWriter to commit, even just commitData > - > > Key: LUCENE-4575 > URL: https://issues.apache.org/jira/browse/LUCENE-4575 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Reporter: Shai Erera >Priority: Minor > Attachments: LUCENE-4575.patch, LUCENE-4575.patch, LUCENE-4575.patch, > LUCENE-4575-testcase.patch > > > Spinoff from here > http://lucene.472066.n3.nabble.com/commit-with-only-commitData-td4022155.html. > In some cases, it is valuable to be able to commit changes to the index, even > if the changes are just commitData. Such data is sometimes used by > applications to register in the index some global application > information/state. > The proposal is: > * Add a setCommitData() API and separate it from commit() and prepareCommit() > (simplify their API) > * When that API is called, flip on the dirty/changes bit, so that this gets > committed even if no other changes were made to the index. > I will work on a patch a post. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-4575) Allow IndexWriter to commit, even just commitData
[ https://issues.apache.org/jira/browse/LUCENE-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-4575: --- Attachment: LUCENE-4575.patch Patch addresses the bug that Mike reported and adds a test for it. Also adds IW.getCommitData(). > Allow IndexWriter to commit, even just commitData > - > > Key: LUCENE-4575 > URL: https://issues.apache.org/jira/browse/LUCENE-4575 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Reporter: Shai Erera >Priority: Minor > Attachments: LUCENE-4575.patch, LUCENE-4575.patch, LUCENE-4575.patch, > LUCENE-4575-testcase.patch > > > Spinoff from here > http://lucene.472066.n3.nabble.com/commit-with-only-commitData-td4022155.html. > In some cases, it is valuable to be able to commit changes to the index, even > if the changes are just commitData. Such data is sometimes used by > applications to register in the index some global application > information/state. > The proposal is: > * Add a setCommitData() API and separate it from commit() and prepareCommit() > (simplify their API) > * When that API is called, flip on the dirty/changes bit, so that this gets > committed even if no other changes were made to the index. > I will work on a patch a post. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
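The proposed semantics can be modeled with plain Java (class and field names here are invented; this is a toy, not the IndexWriter implementation): setCommitData() flips a changes-pending flag, so a later commit() writes a new commit point even when no documents were added, while a commit() with nothing pending stays a no-op.

```java
import java.util.HashMap;
import java.util.Map;

public class ToyCommitModel {
    static class ToyWriter {
        private Map<String, String> commitData = new HashMap<>();
        private boolean changesPending = false;
        private int commitGen = 0;

        // Mirrors the proposal: setCommitData() is separate from commit()
        // and marks the index dirty when called.
        void setCommitData(Map<String, String> data) {
            commitData = new HashMap<>(data);
            changesPending = true;
        }

        Map<String, String> getCommitData() {
            return commitData;
        }

        // Writes a new commit point (bumps the generation) only when
        // something - documents or commitData - actually changed.
        int commit() {
            if (changesPending) {
                commitGen++;
                changesPending = false;
            }
            return commitGen;
        }
    }

    public static void main(String[] args) {
        ToyWriter w = new ToyWriter();
        System.out.println(w.commit()); // nothing changed: generation stays 0
        w.setCommitData(Map.of("app.state", "checkpoint-1"));
        System.out.println(w.commit()); // commitData alone triggers a commit: 1
        System.out.println(w.commit()); // no-op again: still 1
    }
}
```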
Re: [JENKINS] Lucene-Solr-NightlyTests-4.x - Build # 112 - Failure
I'll dig ... Mike McCandless http://blog.mikemccandless.com On Sun, Dec 2, 2012 at 3:10 PM, Apache Jenkins Server wrote: > Build: https://builds.apache.org/job/Lucene-Solr-NightlyTests-4.x/112/ > > 2 tests failed. > REGRESSION: org.apache.lucene.index.TestBagOfPositions.test > > Error Message: > Test abandoned because suite timeout was reached. > > Stack Trace: > java.lang.Exception: Test abandoned because suite timeout was reached. > at __randomizedtesting.SeedInfo.seed([E4E0F4496BBBE86F]:0) > > > FAILED: junit.framework.TestSuite.org.apache.lucene.index.TestBagOfPositions > > Error Message: > Suite timeout exceeded (>= 720 msec). > > Stack Trace: > java.lang.Exception: Suite timeout exceeded (>= 720 msec). > at __randomizedtesting.SeedInfo.seed([E4E0F4496BBBE86F]:0) > > > > > Build Log: > [...truncated 1360 lines...] > [junit4:junit4] Suite: org.apache.lucene.index.TestBagOfPositions > [junit4:junit4] 2> 2012-12-3 2:05:06 > com.carrotsearch.randomizedtesting.ThreadLeakControl$2 evaluate > [junit4:junit4] 2> WARNING: Suite execution timed out: > org.apache.lucene.index.TestBagOfPositions > [junit4:junit4] 2> jstack at approximately timeout time > [junit4:junit4] 2> "Thread-319" ID=415 RUNNABLE > [junit4:junit4] 2>at > org.apache.lucene.store.MockIndexOutputWrapper.writeBytes(MockIndexOutputWrapper.java:118) > [junit4:junit4] 2>at > org.apache.lucene.store.MockIndexOutputWrapper.writeByte(MockIndexOutputWrapper.java:73) > [junit4:junit4] 2>at > org.apache.lucene.store.DataOutput.writeInt(DataOutput.java:70) > [junit4:junit4] 2>at > org.apache.lucene.store.DataOutput.writeLong(DataOutput.java:205) > [junit4:junit4] 2>at > org.apache.lucene.codecs.lucene40.Lucene40TermVectorsWriter.addRawDocuments(Lucene40TermVectorsWriter.java:298) > [junit4:junit4] 2>at > org.apache.lucene.codecs.lucene40.Lucene40TermVectorsWriter.copyVectorsNoDeletions(Lucene40TermVectorsWriter.java:407) > [junit4:junit4] 2>at > 
org.apache.lucene.codecs.lucene40.Lucene40TermVectorsWriter.merge(Lucene40TermVectorsWriter.java:330) > [junit4:junit4] 2>at > org.apache.lucene.index.SegmentMerger.mergeVectors(SegmentMerger.java:261) > [junit4:junit4] 2>at > org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:115) > [junit4:junit4] 2>at > org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3682) > [junit4:junit4] 2>at > org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3288) > [junit4:junit4] 2>at > org.apache.lucene.index.SerialMergeScheduler.merge(SerialMergeScheduler.java:40) > [junit4:junit4] 2>- locked > org.apache.lucene.index.SerialMergeScheduler@4f20a4cb > [junit4:junit4] 2>at > org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1825) > [junit4:junit4] 2>at > org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1236) > [junit4:junit4] 2>at > org.apache.lucene.index.IndexWriter.addDocuments(IndexWriter.java:1186) > [junit4:junit4] 2>at > org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:172) > [junit4:junit4] 2>at > org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:160) > [junit4:junit4] 2>at > org.apache.lucene.index.TestBagOfPositions$1.run(TestBagOfPositions.java:117) > [junit4:junit4] 2> > [junit4:junit4] 2> "TEST-TestBagOfPositions.test-seed#[E4E0F4496BBBE86F]" > ID=414 WAITING on org.apache.lucene.index.TestBagOfPositions$1@36e5c19f > [junit4:junit4] 2>at java.lang.Object.wait(Native Method) > [junit4:junit4] 2>- waiting on > org.apache.lucene.index.TestBagOfPositions$1@36e5c19f > [junit4:junit4] 2>at java.lang.Thread.join(Thread.java:1203) > [junit4:junit4] 2>at java.lang.Thread.join(Thread.java:1256) > [junit4:junit4] 2>at > org.apache.lucene.index.TestBagOfPositions.test(TestBagOfPositions.java:128) > [junit4:junit4] 2>at > sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > [junit4:junit4] 2>at > 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > [junit4:junit4] 2>at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > [junit4:junit4] 2>at java.lang.reflect.Method.invoke(Method.java:616) > [junit4:junit4] 2>at > com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559) > [junit4:junit4] 2>at > com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79) > [junit4:junit4] 2>at > com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737) > [junit4:junit4] 2>at > com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773) > [junit4:junit4] 2>at > com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787) > [junit4:junit4
[jira] [Commented] (LUCENE-4575) Allow IndexWriter to commit, even just commitData
[ https://issues.apache.org/jira/browse/LUCENE-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508363#comment-13508363 ] Shai Erera commented on LUCENE-4575: I don't think that we should add more work to finishCommit() either. Being able to setCommitData after prep() is just a bonus. It didn't work before, and it will continue to not work now. And I can't think of a good usecase for why an app would not be able to set commitData prior to prep(). If it comes up, we can discuss a solution again. At least we know that moving commitData write to finishCommit will solve it. I'll make sure the test exposes the bug you reported in IW.finishCommit(). > Allow IndexWriter to commit, even just commitData > - > > Key: LUCENE-4575 > URL: https://issues.apache.org/jira/browse/LUCENE-4575 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Reporter: Shai Erera >Priority: Minor > Attachments: LUCENE-4575.patch, LUCENE-4575.patch, > LUCENE-4575-testcase.patch > > > Spinoff from here > http://lucene.472066.n3.nabble.com/commit-with-only-commitData-td4022155.html. > In some cases, it is valuable to be able to commit changes to the index, even > if the changes are just commitData. Such data is sometimes used by > applications to register in the index some global application > information/state. > The proposal is: > * Add a setCommitData() API and separate it from commit() and prepareCommit() > (simplify their API) > * When that API is called, flip on the dirty/changes bit, so that this gets > committed even if no other changes were made to the index. > I will work on a patch a post. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4575) Allow IndexWriter to commit, even just commitData
[ https://issues.apache.org/jira/browse/LUCENE-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508356#comment-13508356 ] Michael McCandless commented on LUCENE-4575: bq. Hmmm ... setting the commitData on pendingCommit cannot work, b/c the commitData is written to segnOutput on prepareCommit(). Oh yeah ... I forgot about that :) Hmm ... I don't think we should move writing the commit data to finishCommit? Is it really so hard for the app to provide the commit data before calling prepareCommit? > Allow IndexWriter to commit, even just commitData > - > > Key: LUCENE-4575 > URL: https://issues.apache.org/jira/browse/LUCENE-4575 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Reporter: Shai Erera >Priority: Minor > Attachments: LUCENE-4575.patch, LUCENE-4575.patch, > LUCENE-4575-testcase.patch > > > Spinoff from here > http://lucene.472066.n3.nabble.com/commit-with-only-commitData-td4022155.html. > In some cases, it is valuable to be able to commit changes to the index, even > if the changes are just commitData. Such data is sometimes used by > applications to register in the index some global application > information/state. > The proposal is: > * Add a setCommitData() API and separate it from commit() and prepareCommit() > (simplify their API) > * When that API is called, flip on the dirty/changes bit, so that this gets > committed even if no other changes were made to the index. > I will work on a patch a post. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-NightlyTests-4.x - Build # 112 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-NightlyTests-4.x/112/ 2 tests failed. REGRESSION: org.apache.lucene.index.TestBagOfPositions.test Error Message: Test abandoned because suite timeout was reached. Stack Trace: java.lang.Exception: Test abandoned because suite timeout was reached. at __randomizedtesting.SeedInfo.seed([E4E0F4496BBBE86F]:0) FAILED: junit.framework.TestSuite.org.apache.lucene.index.TestBagOfPositions Error Message: Suite timeout exceeded (>= 720 msec). Stack Trace: java.lang.Exception: Suite timeout exceeded (>= 720 msec). at __randomizedtesting.SeedInfo.seed([E4E0F4496BBBE86F]:0) Build Log: [...truncated 1360 lines...] [junit4:junit4] Suite: org.apache.lucene.index.TestBagOfPositions [junit4:junit4] 2> 2012-12-3 2:05:06 com.carrotsearch.randomizedtesting.ThreadLeakControl$2 evaluate [junit4:junit4] 2> WARNING: Suite execution timed out: org.apache.lucene.index.TestBagOfPositions [junit4:junit4] 2> jstack at approximately timeout time [junit4:junit4] 2> "Thread-319" ID=415 RUNNABLE [junit4:junit4] 2>at org.apache.lucene.store.MockIndexOutputWrapper.writeBytes(MockIndexOutputWrapper.java:118) [junit4:junit4] 2>at org.apache.lucene.store.MockIndexOutputWrapper.writeByte(MockIndexOutputWrapper.java:73) [junit4:junit4] 2>at org.apache.lucene.store.DataOutput.writeInt(DataOutput.java:70) [junit4:junit4] 2>at org.apache.lucene.store.DataOutput.writeLong(DataOutput.java:205) [junit4:junit4] 2>at org.apache.lucene.codecs.lucene40.Lucene40TermVectorsWriter.addRawDocuments(Lucene40TermVectorsWriter.java:298) [junit4:junit4] 2>at org.apache.lucene.codecs.lucene40.Lucene40TermVectorsWriter.copyVectorsNoDeletions(Lucene40TermVectorsWriter.java:407) [junit4:junit4] 2>at org.apache.lucene.codecs.lucene40.Lucene40TermVectorsWriter.merge(Lucene40TermVectorsWriter.java:330) [junit4:junit4] 2>at org.apache.lucene.index.SegmentMerger.mergeVectors(SegmentMerger.java:261) [junit4:junit4] 2>at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:115) 
[junit4:junit4] 2>at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3682) [junit4:junit4] 2>at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3288) [junit4:junit4] 2>at org.apache.lucene.index.SerialMergeScheduler.merge(SerialMergeScheduler.java:40) [junit4:junit4] 2>- locked org.apache.lucene.index.SerialMergeScheduler@4f20a4cb [junit4:junit4] 2>at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1825) [junit4:junit4] 2>at org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1236) [junit4:junit4] 2>at org.apache.lucene.index.IndexWriter.addDocuments(IndexWriter.java:1186) [junit4:junit4] 2>at org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:172) [junit4:junit4] 2>at org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:160) [junit4:junit4] 2>at org.apache.lucene.index.TestBagOfPositions$1.run(TestBagOfPositions.java:117) [junit4:junit4] 2> [junit4:junit4] 2> "TEST-TestBagOfPositions.test-seed#[E4E0F4496BBBE86F]" ID=414 WAITING on org.apache.lucene.index.TestBagOfPositions$1@36e5c19f [junit4:junit4] 2>at java.lang.Object.wait(Native Method) [junit4:junit4] 2>- waiting on org.apache.lucene.index.TestBagOfPositions$1@36e5c19f [junit4:junit4] 2>at java.lang.Thread.join(Thread.java:1203) [junit4:junit4] 2>at java.lang.Thread.join(Thread.java:1256) [junit4:junit4] 2>at org.apache.lucene.index.TestBagOfPositions.test(TestBagOfPositions.java:128) [junit4:junit4] 2>at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [junit4:junit4] 2>at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) [junit4:junit4] 2>at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [junit4:junit4] 2>at java.lang.reflect.Method.invoke(Method.java:616) [junit4:junit4] 2>at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559) [junit4:junit4] 2>at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79) [junit4:junit4] 2>at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737) [junit4:junit4] 2>at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773) [junit4:junit4] 2>at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787) [junit4:junit4] 2>at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) [junit4:junit4] 2>at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51) [junit4:junit4] 2>at org.apache.lucene.util.AbstractBeforeAfterRule$1.
[jira] [Commented] (LUCENE-4575) Allow IndexWriter to commit, even just commitData
[ https://issues.apache.org/jira/browse/LUCENE-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508327#comment-13508327 ] Shai Erera commented on LUCENE-4575: Hmmm ... setting the commitData on pendingCommit cannot work, b/c the commitData is written to segnOutput on prepareCommit(). Following commit() merely calls infos.finishCommit() which writes the checksum and closes the output. Can we modify segmentInfos.write() to not write the commitData, but move it to finishCommit()? Not sure that I like this approach, because it means that finishCommit() will do slightly more work, which increases the chance of getting an IOException during commit() after prepareCommit() successfully returned, but on the other hand the gains might be worth it? Being able to write commitData after you know all your document additions/deletions/updates are 'safe' might prove valuable. And finishCommit() already does I/O, writing checksum ... What do you think? > Allow IndexWriter to commit, even just commitData > - > > Key: LUCENE-4575 > URL: https://issues.apache.org/jira/browse/LUCENE-4575 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Reporter: Shai Erera >Priority: Minor > Attachments: LUCENE-4575.patch, LUCENE-4575.patch, > LUCENE-4575-testcase.patch > > > Spinoff from here > http://lucene.472066.n3.nabble.com/commit-with-only-commitData-td4022155.html. > In some cases, it is valuable to be able to commit changes to the index, even > if the changes are just commitData. Such data is sometimes used by > applications to register in the index some global application > information/state. > The proposal is: > * Add a setCommitData() API and separate it from commit() and prepareCommit() > (simplify their API) > * When that API is called, flip on the dirty/changes bit, so that this gets > committed even if no other changes were made to the index. > I will work on a patch a post.
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4575) Allow IndexWriter to commit, even just commitData
[ https://issues.apache.org/jira/browse/LUCENE-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508319#comment-13508319 ] Shai Erera commented on LUCENE-4575: The test isn't exactly accurate, because it tests a scenario that is currently not supported. I.e., after calling prepareCommit(), nothing that you do on IW will be committed. Rather, to expose the bug it should be modified as follows: {code} iw.setCommitData(data1); iw.prepareCommit(); iw.setCommitData(data2); // that will be ignored by follow-on commit iw.commit(); checkCommitData(); // will see data1 iw.commit(); // that 'should' commit data2 checkCommitData(); // that will see data1 again, because of the copy that happens in finishCommit() {code} I'll modify the test like so and include it in my next patch. > Allow IndexWriter to commit, even just commitData > - > > Key: LUCENE-4575 > URL: https://issues.apache.org/jira/browse/LUCENE-4575 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Reporter: Shai Erera >Priority: Minor > Attachments: LUCENE-4575.patch, LUCENE-4575.patch, > LUCENE-4575-testcase.patch > > > Spinoff from here > http://lucene.472066.n3.nabble.com/commit-with-only-commitData-td4022155.html. > In some cases, it is valuable to be able to commit changes to the index, even > if the changes are just commitData. Such data is sometimes used by > applications to register in the index some global application > information/state. > The proposal is: > * Add a setCommitData() API and separate it from commit() and prepareCommit() > (simplify their API) > * When that API is called, flip on the dirty/changes bit, so that this gets > committed even if no other changes were made to the index. > I will work on a patch a post. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
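[Editor's note] The sequence in Shai's snippet can be modeled with a small self-contained toy. The class and method names below are invented for illustration and greatly simplify Lucene's real IndexWriter/SegmentInfos logic: prepareCommit() snapshots the commit data, and commit() publishes that snapshot and copies it back, so data set between the two calls is dropped.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only, NOT Lucene code: it mimics the semantics under discussion.
// prepareCommit() snapshots the commit data; commit() publishes the snapshot
// and copies it back, clobbering anything set in between.
public class TwoPhaseCommitModel {
    private Map<String, String> commitData = new HashMap<>();
    private Map<String, String> pending;                 // snapshot from prepareCommit()
    private Map<String, String> lastCommitted = new HashMap<>();

    public void setCommitData(Map<String, String> data) {
        commitData = new HashMap<>(data);
    }

    public void prepareCommit() {
        pending = new HashMap<>(commitData);             // commitData is "written" here
    }

    public void commit() {
        if (pending == null) {
            prepareCommit();
        }
        lastCommitted = pending;                         // publish the snapshot
        commitData = new HashMap<>(pending);             // the copy-back that loses data2
        pending = null;
    }

    public Map<String, String> getLastCommitData() {
        return lastCommitted;
    }

    public static void main(String[] args) {
        TwoPhaseCommitModel iw = new TwoPhaseCommitModel();
        Map<String, String> data1 = new HashMap<>(); data1.put("k", "data1");
        Map<String, String> data2 = new HashMap<>(); data2.put("k", "data2");
        iw.setCommitData(data1);
        iw.prepareCommit();
        iw.setCommitData(data2);                         // ignored by the follow-on commit
        iw.commit();
        System.out.println(iw.getLastCommitData().get("k")); // prints data1
        iw.commit();
        System.out.println(iw.getLastCommitData().get("k")); // prints data1 again
    }
}
```

Running main shows both commits publishing data1: data2 never becomes visible, which is the behavior the test scenario above is meant to expose.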
[jira] [Commented] (SOLR-2592) Custom Hashing
[ https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508307#comment-13508307 ] Commit Tag Bot commented on SOLR-2592: -- [trunk commit] Yonik Seeley http://svn.apache.org/viewvc?view=revision&revision=1416216 SOLR-2592: refactor doc routers, use implicit router when implicity creating collection, use collection router to find correct shard when indexing > Custom Hashing > -- > > Key: SOLR-2592 > URL: https://issues.apache.org/jira/browse/SOLR-2592 > Project: Solr > Issue Type: New Feature > Components: SolrCloud >Affects Versions: 4.0-ALPHA >Reporter: Noble Paul > Fix For: 4.1 > > Attachments: dbq_fix.patch, pluggable_sharding.patch, > pluggable_sharding_V2.patch, SOLR-2592.patch, SOLR-2592_progress.patch, > SOLR-2592_r1373086.patch, SOLR-2592_r1384367.patch, SOLR-2592_rev_2.patch, > SOLR_2592_solr_4_0_0_BETA_ShardPartitioner.patch > > > If the data in a cloud can be partitioned on some criteria (say range, hash, > attribute value etc) It will be easy to narrow down the search to a smaller > subset of shards and in effect can achieve more efficient search. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4499) Multi-word synonym filter (synonym expansion)
[ https://issues.apache.org/jira/browse/LUCENE-4499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508304#comment-13508304 ] Nolan Lawson commented on LUCENE-4499: -- @Robert: Thanks for the clarification. I've corrected my blog post. @Roman: Yes, I think it's a very common use case. Especially considering that your query expander seems to be doing the same thing as ours! My idea with the custom QueryParserPlugin was just to have a self-contained solution that didn't mess with the core Lucene/Solr logic too much. And I think it's still configurable enough that it can handle your case-insensitivity tweaks (which I totally understand - "MIT" is not the same thing as "mit"). You'd just have to have some pretty fancy XML in the "synonymAnalyzers" section. :) > Multi-word synonym filter (synonym expansion) > - > > Key: LUCENE-4499 > URL: https://issues.apache.org/jira/browse/LUCENE-4499 > Project: Lucene - Core > Issue Type: Improvement > Components: core/other >Affects Versions: 4.1, 5.0 >Reporter: Roman Chyla >Priority: Minor > Labels: analysis, multi-word, synonyms > Fix For: 5.0 > > Attachments: LUCENE-4499.patch > > > I apologize for bringing the multi-token synonym expansion up again. There is > an old, unresolved issue at LUCENE-1622 [1] > While solving the problem for our needs [2], I discovered that the current > SolrSynonym parser (and the wonderful FTS) have almost everything to > satisfactorily handle both the query and index time synonym expansion. It > seems that people often need to use the synonym filter *slightly* differently > at indexing and query time. > In our case, we must do different things during indexing and querying. 
> Example sentence: Mirrors of the Hubble space telescope pointed at XA5 > This is what we need (comma marks position bump): > indexing: mirrors,hubble|hubble space > telescope|hst,space,telescope,pointed,xa5|astroobject#5 > querying: +mirrors +(hubble space telescope | hst) +pointed > +(xa5|astroobject#5) > This translates to the following needs: > indexing time: > single-token synonyms => return only synonyms > multi-token synonyms => return original tokens *AND* the synonyms > query time: > single-token: return only synonyms (but preserve case) > multi-token: return only synonyms > > We need the original tokens for the proximity queries, if we indexed 'hubble > space telescope' > as one token, we cannot search for 'hubble NEAR telescope' > You may (not) be surprised, but Lucene already supports ALL of these > requirements. The patch is an attempt to state the problem differently. I am > not sure if it is the best option, however it works perfectly for our needs > and it seems it could work for the general public too. Especially if the > SynonymFilterFactory had preconfigured sets of SynonymMapBuilders - and > people would just choose what situation they use. Please look at the unittest. > links: > [1] https://issues.apache.org/jira/browse/LUCENE-1622 > [2] http://labs.adsabs.harvard.edu/trac/ads-invenio/ticket/158 > [3] seems to have similar request: > http://lucene.472066.n3.nabble.com/Proposal-Full-support-for-multi-word-synonyms-at-query-time-td4000522.html -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-4575) Allow IndexWriter to commit, even just commitData
[ https://issues.apache.org/jira/browse/LUCENE-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-4575: --- Attachment: LUCENE-4575-testcase.patch Simple test showing that commit data is lost ... I didn't need to use threads; just call .setCommitData after prepareCommit and before commit. > Allow IndexWriter to commit, even just commitData > - > > Key: LUCENE-4575 > URL: https://issues.apache.org/jira/browse/LUCENE-4575 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Reporter: Shai Erera >Priority: Minor > Attachments: LUCENE-4575.patch, LUCENE-4575.patch, > LUCENE-4575-testcase.patch > > > Spinoff from here > http://lucene.472066.n3.nabble.com/commit-with-only-commitData-td4022155.html. > In some cases, it is valuable to be able to commit changes to the index, even > if the changes are just commitData. Such data is sometimes used by > applications to register in the index some global application > information/state. > The proposal is: > * Add a setCommitData() API and separate it from commit() and prepareCommit() > (simplify their API) > * When that API is called, flip on the dirty/changes bit, so that this gets > committed even if no other changes were made to the index. > I will work on a patch a post. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4584) Compare the LZ4 implementation in Lucene against the original impl
[ https://issues.apache.org/jira/browse/LUCENE-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508292#comment-13508292 ] Uwe Schindler commented on LUCENE-4584: --- I agree with Robert here. We don't need to test random data; for Lucene only 2 things are important: - When you compress random data and decompress it again, the same exact bytes must come back. This should be tested and needs no external C code. This is the doesn't corrupt™ Robert is talking about. - The compressed content should never get significantly bigger. There is no reason at all that Lucene's LZ4 returns the same compressed output. E.g. if we find a better algorithm that performs better in Hotspot, although it compresses to a different byte array, we are perfectly fine. If we want to assert for now that both algorithms create the same compressed output, we should have three different-size random byte files (e.g. generated by /dev/urandom) as test resources and the C-compressed ones also as test resources, and then we can compare the results. We should just document how the test data was created. But keep in mind: We may change the algorithm to produce different bytes, so this is not mandatory. I think we may only assert that the compression percentage of the random data is identical, not the actual bytes. > Compare the LZ4 implementation in Lucene against the original impl > -- > > Key: LUCENE-4584 > URL: https://issues.apache.org/jira/browse/LUCENE-4584 > Project: Lucene - Core > Issue Type: Task >Reporter: Adrien Grand >Assignee: Adrien Grand > Fix For: 4.1 > > > We should add tests to make sure that the LZ4 impl in Lucene compresses data > the exact same way as the original impl. -- This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
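[Editor's note] Uwe's first point (compress random data, decompress it, and require the identical bytes back) can be sketched as a self-contained test. The JDK's Deflater/Inflater serve as a stand-in codec here, since the point is the shape of the round-trip check rather than LZ4 itself.

```java
import java.util.Arrays;
import java.util.Random;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Sketch of the round-trip property: compress, decompress, and require the
// exact same bytes back. JDK Deflater/Inflater stand in for Lucene's LZ4.
public class RoundTripSketch {
    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        byte[] buf = new byte[input.length + 64];        // slack for incompressible input
        int len = 0;
        while (!deflater.finished()) {
            len += deflater.deflate(buf, len, buf.length - len);
        }
        deflater.end();
        return Arrays.copyOf(buf, len);
    }

    static byte[] decompress(byte[] compressed, int originalLength) {
        try {
            Inflater inflater = new Inflater();
            inflater.setInput(compressed);
            byte[] out = new byte[originalLength];
            int len = 0;
            while (!inflater.finished()) {
                len += inflater.inflate(out, len, out.length - len);
            }
            inflater.end();
            return out;
        } catch (DataFormatException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = new byte[1024];
        new Random(42).nextBytes(data);                  // random input, as Uwe suggests
        byte[] restored = decompress(compress(data), data.length);
        System.out.println(Arrays.equals(data, restored)); // prints true
    }
}
```

A real test would repeat this over many random sizes and seeds, and could additionally assert that the compressed length never grows far beyond the input length (Uwe's second point).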
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508288#comment-13508288 ] Mark Miller commented on SOLR-4114: --- Should be against 5x - I'm going to US west coast for a week - so not sure when I'll get back to this - I may try and get it going while I'm out there and I may not have time till I get back. > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: collection-api, multicore, shard, shard-allocation > Attachments: SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch > > > We should support running multiple shards from one collection on the same > Solr server - e.g. run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side have shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shard from one Solr server to another one that just joined > the cluster than it is to split an existing shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508287#comment-13508287 ] Per Steffensen commented on SOLR-4114: -- Where does your patch fit, Mark? > Collection API: Allow multiple shards from one collection on the same Solr > server > - > > Key: SOLR-4114 > URL: https://issues.apache.org/jira/browse/SOLR-4114 > Project: Solr > Issue Type: New Feature > Components: multicore, SolrCloud >Affects Versions: 4.0 > Environment: Solr 4.0.0 release >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: collection-api, multicore, shard, shard-allocation > Attachments: SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch > > > We should support running multiple shards from one collection on the same > Solr server - e.g. run a collection with 8 shards on a 4 Solr server cluster > (each Solr server running 2 shards). > Performance tests at our side have shown that this is a good idea, and it is > also a good idea for easy elasticity later on - it is much easier to move an > entire existing shard from one Solr server to another one that just joined > the cluster than it is to split an existing shard among the Solr that used to > run it and the new Solr. > See dev mailing list discussion "Multiple shards for one collection on the > same Solr server" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4583) StraightBytesDocValuesField fails if bytes > 32k
[ https://issues.apache.org/jira/browse/LUCENE-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508284#comment-13508284 ] Robert Muir commented on LUCENE-4583: - The most important thing: if this implementation (or if we decide dv itself) should be limited, then it should check this at index-time and throw a useful exception. > StraightBytesDocValuesField fails if bytes > 32k > > > Key: LUCENE-4583 > URL: https://issues.apache.org/jira/browse/LUCENE-4583 > Project: Lucene - Core > Issue Type: Bug > Components: core/index >Affects Versions: 4.0, 4.1, 5.0 >Reporter: David Smiley >Priority: Critical > > I didn't observe any limitations on the size of a bytes based DocValues field > value in the docs. It appears that the limit is 32k, although I didn't get > any friendly error telling me that was the limit. 32k is kind of small IMO; > I suspect this limit is unintended and as such is a bug.The following > test fails: > {code:java} > public void testBigDocValue() throws IOException { > Directory dir = newDirectory(); > IndexWriter writer = new IndexWriter(dir, writerConfig(false)); > Document doc = new Document(); > BytesRef bytes = new BytesRef((4+4)*4097);//4096 works > bytes.length = bytes.bytes.length;//byte data doesn't matter > doc.add(new StraightBytesDocValuesField("dvField", bytes)); > writer.addDocument(doc); > writer.commit(); > writer.close(); > DirectoryReader reader = DirectoryReader.open(dir); > DocValues docValues = MultiDocValues.getDocValues(reader, "dvField"); > //FAILS IF BYTES IS BIG! > docValues.getSource().getBytes(0, bytes); > reader.close(); > dir.close(); > } > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4575) Allow IndexWriter to commit, even just commitData
[ https://issues.apache.org/jira/browse/LUCENE-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508275#comment-13508275 ] Michael McCandless commented on LUCENE-4575: I'll make a test exposing the bug ... > Allow IndexWriter to commit, even just commitData > - > > Key: LUCENE-4575 > URL: https://issues.apache.org/jira/browse/LUCENE-4575 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Reporter: Shai Erera >Priority: Minor > Attachments: LUCENE-4575.patch, LUCENE-4575.patch > > > Spinoff from here > http://lucene.472066.n3.nabble.com/commit-with-only-commitData-td4022155.html. > In some cases, it is valuable to be able to commit changes to the index, even > if the changes are just commitData. Such data is sometimes used by > applications to register in the index some global application > information/state. > The proposal is: > * Add a setCommitData() API and separate it from commit() and prepareCommit() > (simplify their API) > * When that API is called, flip on the dirty/changes bit, so that this gets > committed even if no other changes were made to the index. > I will work on a patch a post. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4584) Compare the LZ4 implementation in Lucene against the original impl
[ https://issues.apache.org/jira/browse/LUCENE-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508276#comment-13508276 ] Adrien Grand commented on LUCENE-4584: -- bq. I only care that it compresses well, is reasonably fast, and doesn't corrupt. Right, the issue is probably badly named. The reason why I want to compare against the original impl is exactly for the reasons you mention: making sure that our impl compresses well and trying to find bugs in it. > Compare the LZ4 implementation in Lucene against the original impl > -- > > Key: LUCENE-4584 > URL: https://issues.apache.org/jira/browse/LUCENE-4584 > Project: Lucene - Core > Issue Type: Task >Reporter: Adrien Grand >Assignee: Adrien Grand > Fix For: 4.1 > > > We should add tests to make sure that the LZ4 impl in Lucene compresses data > the exact same way as the original impl. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4584) Compare the LZ4 implementation in Lucene against the original impl
[ https://issues.apache.org/jira/browse/LUCENE-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508270#comment-13508270 ] Adrien Grand commented on LUCENE-4584: -- bq. You wouldn't need static files if you compared output lengths Even the output length depends on the endianness: LZ4 uses a hash table without collision resolution (it maps hash -> last offset that produced this hash) to find matches of 4 consecutive bytes in the input bytes, and this hash function is not endian-neutral (it interprets the 4 bytes as a 32-bit int, multiplies it by a prime number and keeps the first 12 bits (13 if there are less than 2^16 input bytes)), so the collisions won't be the same depending on the endianness and LZ4 won't find the same matches. > Compare the LZ4 implementation in Lucene against the original impl > -- > > Key: LUCENE-4584 > URL: https://issues.apache.org/jira/browse/LUCENE-4584 > Project: Lucene - Core > Issue Type: Task >Reporter: Adrien Grand >Assignee: Adrien Grand > Fix For: 4.1 > > > We should add tests to make sure that the LZ4 impl in Lucene compresses data > the exact same way as the original impl. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
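[Editor's note] The hash Adrien describes can be sketched as follows. This is a hypothetical illustration: the prime 2654435761 and the keep-the-top-bits shift are assumptions taken from the public LZ4 sources, and readInt simulates the two byte orders.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Sketch of an LZ4-style multiplicative hash: read 4 bytes as a 32-bit int,
// multiply by a prime, and keep only the top hashBits bits of the product.
public class Lz4HashSketch {
    static final int PRIME = (int) 2654435761L; // assumed from the LZ4 sources

    static int hash(int sequence, int hashBits) {
        // Unsigned shift keeps the top hashBits bits, so the result is
        // always a valid slot in a table of size 2^hashBits.
        return (sequence * PRIME) >>> (32 - hashBits);
    }

    static int readInt(byte[] b, int off, ByteOrder order) {
        return ByteBuffer.wrap(b, off, 4).order(order).getInt();
    }

    public static void main(String[] args) {
        byte[] input = {1, 2, 3, 4};
        // The same 4 bytes yield different 32-bit values on little- vs
        // big-endian reads, so the hash slots (and hence which matches the
        // compressor finds) need not agree across platforms.
        int little = hash(readInt(input, 0, ByteOrder.LITTLE_ENDIAN), 12);
        int big = hash(readInt(input, 0, ByteOrder.BIG_ENDIAN), 12);
        System.out.println(little + " " + big);
    }
}
```

Because collisions in this table decide which earlier offsets are even considered as match candidates, two endian-dependent reads of the same input can legitimately produce different (but both valid) compressed streams, which is why comparing exact bytes or even lengths against the C impl is fragile.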
[jira] [Updated] (LUCENE-4584) Compare the LZ4 implementation in Lucene against the original impl
[ https://issues.apache.org/jira/browse/LUCENE-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-4584: Priority: Major (was: Blocker) > Compare the LZ4 implementation in Lucene against the original impl > -- > > Key: LUCENE-4584 > URL: https://issues.apache.org/jira/browse/LUCENE-4584 > Project: Lucene - Core > Issue Type: Task >Reporter: Adrien Grand >Assignee: Adrien Grand > Fix For: 4.1 > > > We should add tests to make sure that the LZ4 impl in Lucene compresses data > the exact same way as the original impl. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4584) Compare the LZ4 implementation in Lucene against the original impl
[ https://issues.apache.org/jira/browse/LUCENE-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508269#comment-13508269 ] Robert Muir commented on LUCENE-4584: - I'm confused why this is a blocker at all: I'm going to unset it. I don't actually care if our LZ4 is conformant to the original impl. I only care that it compresses well, is reasonably fast, and doesn't corrupt. > Compare the LZ4 implementation in Lucene against the original impl > -- > > Key: LUCENE-4584 > URL: https://issues.apache.org/jira/browse/LUCENE-4584 > Project: Lucene - Core > Issue Type: Task >Reporter: Adrien Grand >Assignee: Adrien Grand >Priority: Blocker > Fix For: 4.1 > > > We should add tests to make sure that the LZ4 impl in Lucene compresses data > the exact same way as the original impl. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4575) Allow IndexWriter to commit, even just commitData
[ https://issues.apache.org/jira/browse/LUCENE-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508268#comment-13508268 ] Shai Erera commented on LUCENE-4575: I'll make the changes, and also it seems like you were suggesting that earlier -- allow setCommitData to affect the pendingCommit too. I think that's valuable because you can e.g. call prepareCommit() -> setCommitData() -> commit() -- the setCD() in the middle lets you create a commitData that will pertain to the state of the index after the commit. I'll make all the changes and post a new patch, probably tomorrow. > Allow IndexWriter to commit, even just commitData > - > > Key: LUCENE-4575 > URL: https://issues.apache.org/jira/browse/LUCENE-4575 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Reporter: Shai Erera >Priority: Minor > Attachments: LUCENE-4575.patch, LUCENE-4575.patch > > > Spinoff from here > http://lucene.472066.n3.nabble.com/commit-with-only-commitData-td4022155.html. > In some cases, it is valuable to be able to commit changes to the index, even > if the changes are just commitData. Such data is sometimes used by > applications to register in the index some global application > information/state. > The proposal is: > * Add a setCommitData() API and separate it from commit() and prepareCommit() > (simplify their API) > * When that API is called, flip on the dirty/changes bit, so that this gets > committed even if no other changes were made to the index. > I will work on a patch a post. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4575) Allow IndexWriter to commit, even just commitData
[ https://issues.apache.org/jira/browse/LUCENE-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508267#comment-13508267 ] Michael McCandless commented on LUCENE-4575: {quote} I agree that calling that setCommitData in finishCommit is redundant, but perhaps we can solve it more elegantly by either: # Not storing the setCommitData in infos, but rather in a private IW member. Then in startCommit set it on the cloned infos. It's essentially how it's done today, only now the commit data will be copied from a member. # Stick w/ current API commit(commitData) and prepareCommit(commitData), and just make sure that commit goes through even if changeCount == previousChangeCount, but commitUserData != null. {quote} Hmm, I'd rather not store the member inside IW *and* inside SIS; just seems safer to have a single clear place where this is tracked. Also, I like the new API so I'd rather not do #2? I think just removing that line in finishCommit should fix the bug ... but first we need a test exposing it. bq. I think that the code in finishCommit ensures that we can always pull the commitData from segmentInfos? Yes. > Allow IndexWriter to commit, even just commitData > - > > Key: LUCENE-4575 > URL: https://issues.apache.org/jira/browse/LUCENE-4575 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Reporter: Shai Erera >Priority: Minor > Attachments: LUCENE-4575.patch, LUCENE-4575.patch > > > Spinoff from here > http://lucene.472066.n3.nabble.com/commit-with-only-commitData-td4022155.html. > In some cases, it is valuable to be able to commit changes to the index, even > if the changes are just commitData. Such data is sometimes used by > applications to register in the index some global application > information/state. 
> The proposal is: > * Add a setCommitData() API and separate it from commit() and prepareCommit() > (simplify their API) > * When that API is called, flip on the dirty/changes bit, so that this gets > committed even if no other changes were made to the index. > I will work on a patch a post. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4575) Allow IndexWriter to commit, even just commitData
[ https://issues.apache.org/jira/browse/LUCENE-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508254#comment-13508254 ] Shai Erera commented on LUCENE-4575: bq. I thought we were going to rename ensureOpen's confusing boolean param? Right, but for some reason I thought that you're going to do that :). I'll do it in the next patch. bq. IW.setCommitData should be sync'd I think, eg to ensure visibility across threads of the changes to sis.userData? Ok bq. Hmm ... I think there's a thread hazard here, during commit I think you're right. Not sure how practical, because I believe that usually the commit thread will also be the one that calls setCommitData, but it is possible. I agree that calling that setCommitData in finishCommit is redundant, but perhaps we can solve it more elegantly by either: # Not storing the setCommitData in infos, but rather in a private IW member. Then in startCommit set it on the cloned infos. It's essentially how it's done today, only now the commit data will be copied from a member. # Stick w/ current API commit(commitData) and prepareCommit(commitData), and just make sure that commit goes through even if changeCount == previousChangeCount, but commitUserData != null. Option #2 means that there's no API break, no synchronization is needed on setCommitData and practically everything remains the same. We can still remove the redundant .setCommitData in finishCommit regardless. bq. should we add an IW.getCommitData? I think that'd be great! Today the only way to do it is if you refresh a reader (expensive). I think that the code in finishCommit ensures that we can always pull the commitData from segmentInfos? 
> Allow IndexWriter to commit, even just commitData > - > > Key: LUCENE-4575 > URL: https://issues.apache.org/jira/browse/LUCENE-4575 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Reporter: Shai Erera >Priority: Minor > Attachments: LUCENE-4575.patch, LUCENE-4575.patch > > > Spinoff from here > http://lucene.472066.n3.nabble.com/commit-with-only-commitData-td4022155.html. > In some cases, it is valuable to be able to commit changes to the index, even > if the changes are just commitData. Such data is sometimes used by > applications to register in the index some global application > information/state. > The proposal is: > * Add a setCommitData() API and separate it from commit() and prepareCommit() > (simplify their API) > * When that API is called, flip on the dirty/changes bit, so that this gets > committed even if no other changes were made to the index. > I will work on a patch a post. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4575) Allow IndexWriter to commit, even just commitData
[ https://issues.apache.org/jira/browse/LUCENE-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508251#comment-13508251 ] Michael McCandless commented on LUCENE-4575: Actually I think we should just remove that .setUserData inside finishCommit? Also, should we add an IW.getCommitData? > Allow IndexWriter to commit, even just commitData > - > > Key: LUCENE-4575 > URL: https://issues.apache.org/jira/browse/LUCENE-4575 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Reporter: Shai Erera >Priority: Minor > Attachments: LUCENE-4575.patch, LUCENE-4575.patch > > > Spinoff from here > http://lucene.472066.n3.nabble.com/commit-with-only-commitData-td4022155.html. > In some cases, it is valuable to be able to commit changes to the index, even > if the changes are just commitData. Such data is sometimes used by > applications to register in the index some global application > information/state. > The proposal is: > * Add a setCommitData() API and separate it from commit() and prepareCommit() > (simplify their API) > * When that API is called, flip on the dirty/changes bit, so that this gets > committed even if no other changes were made to the index. > I will work on a patch a post. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4575) Allow IndexWriter to commit, even just commitData
[ https://issues.apache.org/jira/browse/LUCENE-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508245#comment-13508245 ] Michael McCandless commented on LUCENE-4575: I thought we were going to rename ensureOpen's confusing boolean param? IW.setCommitData should be sync'd I think, eg to ensure visibility across threads of the changes to sis.userData? Hmm ... I think there's a thread hazard here, during commit; I think if pendingCommit is not null you should also call pendingCommit.setUserData? Else, a commit can finish and "undo" the user's change to the commit data (see finishCommit, where it calls .setUserData). Maybe we need a thread safety test here ... > Allow IndexWriter to commit, even just commitData > - > > Key: LUCENE-4575 > URL: https://issues.apache.org/jira/browse/LUCENE-4575 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Reporter: Shai Erera >Priority: Minor > Attachments: LUCENE-4575.patch, LUCENE-4575.patch > > > Spinoff from here > http://lucene.472066.n3.nabble.com/commit-with-only-commitData-td4022155.html. > In some cases, it is valuable to be able to commit changes to the index, even > if the changes are just commitData. Such data is sometimes used by > applications to register in the index some global application > information/state. > The proposal is: > * Add a setCommitData() API and separate it from commit() and prepareCommit() > (simplify their API) > * When that API is called, flip on the dirty/changes bit, so that this gets > committed even if no other changes were made to the index. > I will work on a patch a post. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-4580) Facet DrillDown should return a ConstantScoreQuery
[ https://issues.apache.org/jira/browse/LUCENE-4580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-4580: --- Summary: Facet DrillDown should return a ConstantScoreQuery (was: Facet DrillDown should return a Filter not Query) > Facet DrillDown should return a ConstantScoreQuery > -- > > Key: LUCENE-4580 > URL: https://issues.apache.org/jira/browse/LUCENE-4580 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/facet >Reporter: Shai Erera >Priority: Minor > > DrillDown is a helper class which the user can use to convert a facet value > that a user selected into a Query for performing drill-down or narrowing the > results. The API has several static methods that create e.g. a Term or Query. > Rather than creating a Query, it would make more sense to create a Filter I > think. In most cases, the clicked facets should not affect the scoring of > documents. Anyway, even if it turns out that it must return a Query (which I > doubt), we should at least modify the impl to return a ConstantScoreQuery. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4580) Facet DrillDown should return a Filter not Query
[ https://issues.apache.org/jira/browse/LUCENE-4580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508243#comment-13508243 ] Uwe Schindler commented on LUCENE-4580: --- OK. I would add a test that verifies that the scores don't change... > Facet DrillDown should return a Filter not Query > > > Key: LUCENE-4580 > URL: https://issues.apache.org/jira/browse/LUCENE-4580 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/facet >Reporter: Shai Erera >Priority: Minor > > DrillDown is a helper class which the user can use to convert a facet value > that a user selected into a Query for performing drill-down or narrowing the > results. The API has several static methods that create e.g. a Term or Query. > Rather than creating a Query, it would make more sense to create a Filter I > think. In most cases, the clicked facets should not affect the scoring of > documents. Anyway, even if it turns out that it must return a Query (which I > doubt), we should at least modify the impl to return a ConstantScoreQuery. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4580) Facet DrillDown should return a Filter not Query
[ https://issues.apache.org/jira/browse/LUCENE-4580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508242#comment-13508242 ] Shai Erera commented on LUCENE-4580: It seems then that the only thing that needs to be done here is fix the {{query()}} code to return CSQ (and set the coord and boost properly). The API today doesn't support disjunction between categories, but it is doable with a combination of term() and query() calls, so rather than adding more API, I say that we leave it simple. If you agree, I'll rename this issue and fix DrillDown. > Facet DrillDown should return a Filter not Query > > > Key: LUCENE-4580 > URL: https://issues.apache.org/jira/browse/LUCENE-4580 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/facet >Reporter: Shai Erera >Priority: Minor > > DrillDown is a helper class which the user can use to convert a facet value > that a user selected into a Query for performing drill-down or narrowing the > results. The API has several static methods that create e.g. a Term or Query. > Rather than creating a Query, it would make more sense to create a Filter I > think. In most cases, the clicked facets should not affect the scoring of > documents. Anyway, even if it turns out that it must return a Query (which I > doubt), we should at least modify the impl to return a ConstantScoreQuery. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4580) Facet DrillDown should return a Filter not Query
[ https://issues.apache.org/jira/browse/LUCENE-4580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508241#comment-13508241 ] Uwe Schindler commented on LUCENE-4580: --- bq. If a Filter is not cached, how efficient is using TermsFilter(oneTerm) vs. CSQ(TermQuery)? Are we talking huge gains here? If not, let's keep the API simple. DrillDown offers the terms() API too, so one can construct BooleanFilter, TermsFilter and whatever he wants out of them. CSQ(TermQuery) is way faster, as it can leap-frog. TermsFilter with one term will allocate a Bitset and then mark all postings in it, also those postings which are not needed (this depends on the FilteredQuery mode, which is used to apply filters). CSQ(TermQuery) will leap-frog so the original query and the single TermQuery will advance each other, leading to the fastest execution, while the TermsFilter prepares a bitset before the query execution, so the latency will be higher (2 steps). > Facet DrillDown should return a Filter not Query > > > Key: LUCENE-4580 > URL: https://issues.apache.org/jira/browse/LUCENE-4580 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/facet >Reporter: Shai Erera >Priority: Minor > > DrillDown is a helper class which the user can use to convert a facet value > that a user selected into a Query for performing drill-down or narrowing the > results. The API has several static methods that create e.g. a Term or Query. > Rather than creating a Query, it would make more sense to create a Filter I > think. In most cases, the clicked facets should not affect the scoring of > documents. Anyway, even if it turns out that it must return a Query (which I > doubt), we should at least modify the impl to return a ConstantScoreQuery. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
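Uwe's leap-frog versus bitset point can be illustrated without any Lucene types. The sketch below is a toy stand-in, not Lucene code: it shows the access pattern a CSQ(TermQuery) conjunction gets (two sorted doc-ID streams advancing past each other) versus a filter that first materializes every matching doc in a bitset before the query runs.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Toy illustration (not Lucene code) of the two execution strategies discussed above.
public class LeapFrogSketch {

    // Leap-frog conjunction: each sorted doc-ID list advances past the other,
    // touching only the doc IDs near actual matches.
    static List<Integer> leapFrog(int[] a, int[] b) {
        List<Integer> hits = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.length && j < b.length) {
            if (a[i] == b[j]) { hits.add(a[i]); i++; j++; }
            else if (a[i] < b[j]) i++;  // leap a forward
            else j++;                   // leap b forward
        }
        return hits;
    }

    // Bitset strategy: first mark every posting of b (even ones the query never
    // reaches), then intersect -- the up-front pass is the extra latency step.
    static List<Integer> viaBitSet(int[] a, int[] b) {
        BitSet filter = new BitSet();
        for (int doc : b) filter.set(doc);
        List<Integer> hits = new ArrayList<>();
        for (int doc : a) if (filter.get(doc)) hits.add(doc);
        return hits;
    }
}
```

Both strategies return the same matches; the difference is the work done before and during query execution, which is why the choice matters for an uncached drill-down.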
[jira] [Commented] (LUCENE-4580) Facet DrillDown should return a Filter not Query
[ https://issues.apache.org/jira/browse/LUCENE-4580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508240#comment-13508240 ] Uwe Schindler commented on LUCENE-4580: --- Hi, In general I would prefer another approach for the whole thing. We should not make the users decide if they need to use a Filter or Query or whatever drill down approach. The user API should only use Query: Query in, Query out: {code:java} Query drilldown(Query originalQuery, CategoryPath... categories) {code} This would get the user query to drill down as input and return a new Query with the same scoring, but somehow filtered. Internally this method can use a Filter or Query or whatever to do the drill down, the user does not need to think about it. It should just add 2 options: conjunction or disjunction. The following possibilities are available: - one or more category path, conjunction: returns new BooleanQuery(true) [no coord], consisting of the original Query as clause and multiple CSQ(TermQuery(category)) with boost=0.0 (boost=0 means the BQ does not get any value from the filter clause and with disableCoord=true nothing changes) - more than one category path, disjunction between categories: return FilteredQuery(originalQuery, new TermsFilter(terms)) > Facet DrillDown should return a Filter not Query > > > Key: LUCENE-4580 > URL: https://issues.apache.org/jira/browse/LUCENE-4580 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/facet >Reporter: Shai Erera >Priority: Minor > > DrillDown is a helper class which the user can use to convert a facet value > that a user selected into a Query for performing drill-down or narrowing the > results. The API has several static methods that create e.g. a Term or Query. > Rather than creating a Query, it would make more sense to create a Filter I > think. In most cases, the clicked facets should not affect the scoring of > documents. 
Anyway, even if it turns out that it must return a Query (which I > doubt), we should at least modify the impl to return a ConstantScoreQuery. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4580) Facet DrillDown should return a Filter not Query
[ https://issues.apache.org/jira/browse/LUCENE-4580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508239#comment-13508239 ] Shai Erera commented on LUCENE-4580: Uwe, the thinking that I had about Filter is that if you e.g. wrap it w/ CWF, then you pay that cost once, and that's it. Therefore BooleanFilter is just used as a means to create a more complicated Filter. But, I'm not sure that I want to over-complicate DrillDown API. So perhaps this is what we do: * Fix DrillDown to always return CSQ, regardless of the case. * Document that for caching purposes, one can wrap the returned Query with CachingWrapperFilter(QueryWrapperFilter(Query)) If a Filter is not cached, how efficient is using TermsFilter(oneTerm) vs. CSQ(TermQuery)? Are we talking huge gains here? If not, let's keep the API simple. DrillDown offers the terms() API too, so one can construct BooleanFilter, TermsFilter and whatever he wants out of them. > Facet DrillDown should return a Filter not Query > > > Key: LUCENE-4580 > URL: https://issues.apache.org/jira/browse/LUCENE-4580 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/facet >Reporter: Shai Erera >Priority: Minor > > DrillDown is a helper class which the user can use to convert a facet value > that a user selected into a Query for performing drill-down or narrowing the > results. The API has several static methods that create e.g. a Term or Query. > Rather than creating a Query, it would make more sense to create a Filter I > think. In most cases, the clicked facets should not affect the scoring of > documents. Anyway, even if it turns out that it must return a Query (which I > doubt), we should at least modify the impl to return a ConstantScoreQuery. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4580) Facet DrillDown should return a Filter not Query
[ https://issues.apache.org/jira/browse/LUCENE-4580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508215#comment-13508215 ] Uwe Schindler commented on LUCENE-4580: --- bq. This is exactly what I proposed. I'm +1 for it (and BooleanFilter). -1, BooleanFilter is horrible and slow for this use-case. > Facet DrillDown should return a Filter not Query > > > Key: LUCENE-4580 > URL: https://issues.apache.org/jira/browse/LUCENE-4580 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/facet >Reporter: Shai Erera >Priority: Minor > > DrillDown is a helper class which the user can use to convert a facet value > that a user selected into a Query for performing drill-down or narrowing the > results. The API has several static methods that create e.g. a Term or Query. > Rather than creating a Query, it would make more sense to create a Filter I > think. In most cases, the clicked facets should not affect the scoring of > documents. Anyway, even if it turns out that it must return a Query (which I > doubt), we should at least modify the impl to return a ConstantScoreQuery. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4580) Facet DrillDown should return a Filter not Query
[ https://issues.apache.org/jira/browse/LUCENE-4580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508211#comment-13508211 ] Uwe Schindler commented on LUCENE-4580: --- bq. Or better, move TermsFilter and BooleanFilter to core – why are they treated differently than TermQuery and BooleanQuery? Especially now that Filter is applied more efficiently, I suspect more people will want to use it? TermsFilter - yes. See my comment above. We already have a very good Automaton-based one in test-framework that also needs to be moved to core (as a MTQ rewrite method). BUT, about BooleanFilter: This class is horribly inefficient, inconsistent, and not good for drill downs (you should use it only when you want to do caching of filters with bitsets). If you use it for those types of queries you pay the price of allocating bitsets and iterating the wrapped queries/filters completely instead of advancing the underlying scorers (leap-frogging). So for drilldowns BooleanFilter is the worst you can do! The way to go, in my opinion, is to use constant score queries (like Solr does). In addition we recently reopened / discussed again the very old issue to nuke Filters at all and just provide queries and nothing more. Filters are nothing more than constant score queries. > Facet DrillDown should return a Filter not Query > > > Key: LUCENE-4580 > URL: https://issues.apache.org/jira/browse/LUCENE-4580 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/facet >Reporter: Shai Erera >Priority: Minor > > DrillDown is a helper class which the user can use to convert a facet value > that a user selected into a Query for performing drill-down or narrowing the > results. The API has several static methods that create e.g. a Term or Query. > Rather than creating a Query, it would make more sense to create a Filter I > think. In most cases, the clicked facets should not affect the scoring of > documents. 
Anyway, even if it turns out that it must return a Query (which I > doubt), we should at least modify the impl to return a ConstantScoreQuery. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4580) Facet DrillDown should return a Filter not Query
[ https://issues.apache.org/jira/browse/LUCENE-4580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508202#comment-13508202 ] Gilad Barkai commented on LUCENE-4580: -- bq. it should be a combination of TermsFilter and BooleanFilter. So in fact if we want to keep DrillDown behave like today, we should use BooleanFilter and TermsFilter. +1 > Facet DrillDown should return a Filter not Query > > > Key: LUCENE-4580 > URL: https://issues.apache.org/jira/browse/LUCENE-4580 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/facet >Reporter: Shai Erera >Priority: Minor > > DrillDown is a helper class which the user can use to convert a facet value > that a user selected into a Query for performing drill-down or narrowing the > results. The API has several static methods that create e.g. a Term or Query. > Rather than creating a Query, it would make more sense to create a Filter I > think. In most cases, the clicked facets should not affect the scoring of > documents. Anyway, even if it turns out that it must return a Query (which I > doubt), we should at least modify the impl to return a ConstantScoreQuery. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4580) Facet DrillDown should return a Filter not Query
[ https://issues.apache.org/jira/browse/LUCENE-4580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508201#comment-13508201 ] Shai Erera commented on LUCENE-4580: bq. In my opinion, for Lucene 4.x we should move the TermsFilter to core. This is exactly what I proposed. I'm +1 for it (and BooleanFilter). bq. TermsFilter is a Disjunction, but for drill downs you generally need Conjunctions You're right, it should be a combination of TermsFilter and BooleanFilter. So in fact if we want to keep DrillDown behave like today, we should use BooleanFilter and TermsFilter. > Facet DrillDown should return a Filter not Query > > > Key: LUCENE-4580 > URL: https://issues.apache.org/jira/browse/LUCENE-4580 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/facet >Reporter: Shai Erera >Priority: Minor > > DrillDown is a helper class which the user can use to convert a facet value > that a user selected into a Query for performing drill-down or narrowing the > results. The API has several static methods that create e.g. a Term or Query. > Rather than creating a Query, it would make more sense to create a Filter I > think. In most cases, the clicked facets should not affect the scoring of > documents. Anyway, even if it turns out that it must return a Query (which I > doubt), we should at least modify the impl to return a ConstantScoreQuery. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4580) Facet DrillDown should return a Filter not Query
[ https://issues.apache.org/jira/browse/LUCENE-4580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508200#comment-13508200 ] Shai Erera commented on LUCENE-4580: Not that I'm against adding dependencies between modules, but just to give some data points:
* The queries module is not a MUST for every search application (let alone faceted search). The basic query components are in core already (Filter, Query, TermQuery, BooleanQuery etc.). I have found the queries module useful (so far) for the {{BooleanFilter}} and {{TermsFilter}} classes.
* A question was recently asked on the user list about how to make {{DrillDown}} create OR queries instead of AND. The scenario: you have a facet dimension for which you would like to allow people to select multiple values and OR them (while still AND-ing with other dimensions). Since {{DrillDown}} doesn't have that option, I suggested that the user call DrillDown.term() and construct their own BooleanQuery.
** My point is that {{DrillDown}} is a helper class that already doesn't cover all cases. Even if we make it return a Filter, that user will still need to construct a BooleanFilter through several API calls.
** So I'm OK if it only exposes terms(), but I'm also OK if we add the queries dependency and just make the cutover to Filter instead of Query.
** Or better, move TermsFilter and BooleanFilter to core -- why are they treated differently from TermQuery and BooleanQuery? Especially now that a Filter is applied more efficiently, I suspect more people will want to use it.
* I am all for usability, but {{TermsFilter}} is not like {{BooleanQuery}} in the sense that it's very easy to create (just one line of code). If BooleanQuery had a ctor that accepted a {{List}}, I'm not sure we would have used it in {{DrillDown}}, or even created the DrillDown.query API at all. So the 'same code over and over' argument is not comparable between the two cases, I think.
[jira] [Commented] (LUCENE-4580) Facet DrillDown should return a Filter not Query
[ https://issues.apache.org/jira/browse/LUCENE-4580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508199#comment-13508199 ] Uwe Schindler commented on LUCENE-4580: --- In my opinion, for Lucene 4.x we should move the TermsFilter to core. This filter is very often used, and we already have a good Automaton-based variant (Daciuk-Mihov) filter that performs very well on lots of terms! On the other hand: TermsFilter is a Disjunction, but for drill downs you generally need Conjunctions?
[jira] [Commented] (LUCENE-4580) Facet DrillDown should return a Filter not Query
[ https://issues.apache.org/jira/browse/LUCENE-4580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508195#comment-13508195 ] Gilad Barkai commented on LUCENE-4580: -- {{DrillDown}} is a useful class with a straightforward API, which makes the life of basic users simpler. As Shai pointed out, today there is no dependency on the queries module, but the code contains a hidden bug in which a 'drill down' operation may change the score of the results. Adding a Filter or a {{ConstantScoreQuery}} looks like the right way to go. That sort of fix is possible, while keeping the usefulness of the DrillDown class, only if the code becomes dependent on the queries module. On the other hand, avoiding the dependency would force most faceted-search users to write that exact extra code, as mentioned. Preventing such cases was the reason this utility class was created. 'Drilling down' is a basic feature of a faceted search application, and the DrillDown class provides an easy way of invoking it. Having a faceted search application that does not use the queries module (e.g. filtering) seems remote - is there any such scenario? A module dependency may result in users loading jars they do not need or care about, but the queries module jar is likely to be found in any faceted search application. Modules should be independent, but I see enough gain here. It would not bother me if the facet module depended on the queries module. I find it logical.
-1 for forcing users to write the same code over and over just to keep the facet module independent of the queries module.
+1 for adding {{DrillDown.filter(CategoryPath...)}} - that looks like the way to go.
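The scoring concern behind returning a Filter (or at least a ConstantScoreQuery) is that the clicked facet value should only narrow the result set, never contribute to ranking. A toy, stdlib-only illustration of that contract (hypothetical doc ids and scores; plain maps and sets, not Lucene classes):

```java
import java.util.*;

public class ConstantScoreDrillDown {
    // Filter semantics: keep only docs in the drill-down set, leaving the
    // surviving docs' main-query scores untouched -- which is what applying
    // the drill-down as a Filter (or ConstantScoreQuery) would give you.
    static Map<Integer, Double> applyAsFilter(Map<Integer, Double> queryScores, Set<Integer> drillDownDocs) {
        Map<Integer, Double> filtered = new LinkedHashMap<>();
        for (Map.Entry<Integer, Double> e : queryScores.entrySet()) {
            if (drillDownDocs.contains(e.getKey())) {
                filtered.put(e.getKey(), e.getValue());
            }
        }
        return filtered;
    }

    public static void main(String[] args) {
        // Made-up doc ids and scores from the user's main query, best first.
        Map<Integer, Double> queryScores = new LinkedHashMap<>();
        queryScores.put(1, 3.2);
        queryScores.put(2, 1.5);
        queryScores.put(3, 0.9);

        // Docs matching the clicked facet value.
        Set<Integer> drillDown = new HashSet<>(Arrays.asList(2, 3));

        // The result set shrinks, but the surviving docs keep their original
        // scores and relative order: the facet click did not affect ranking.
        System.out.println(applyAsFilter(queryScores, drillDown)); // prints {2=1.5, 3=0.9}
    }
}
```

A scoring drill-down query, by contrast, would add its own term scores on top of the main query's, which is the "hidden bug" Gilad mentions above.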
[jira] [Commented] (LUCENE-4584) Compare the LZ4 implementation in Lucene against the original impl
[ https://issues.apache.org/jira/browse/LUCENE-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508191#comment-13508191 ] Uwe Schindler commented on LUCENE-4584: --- bq. cpp-tasks is used to compile NativePosixUtil.cpp, so there is precedent for this in our project... -1. THIS IS NOT PART OF OUR BUILD SYSTEM; IT IS NOT EVEN TESTED AT ALL! > Compare the LZ4 implementation in Lucene against the original impl > -- > > Key: LUCENE-4584 > URL: https://issues.apache.org/jira/browse/LUCENE-4584 > Project: Lucene - Core > Issue Type: Task >Reporter: Adrien Grand >Assignee: Adrien Grand >Priority: Blocker > Fix For: 4.1 > > > We should add tests to make sure that the LZ4 impl in Lucene compresses data > the exact same way as the original impl.
[jira] [Comment Edited] (LUCENE-4584) Compare the LZ4 implementation in Lucene against the original impl
[ https://issues.apache.org/jira/browse/LUCENE-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508191#comment-13508191 ] Uwe Schindler edited comment on LUCENE-4584 at 12/2/12 9:10 AM: bq. cpp-tasks is used to compile NativePosixUtil.cpp, so there is precedent for this in our project... -1. THIS IS NOT PART OF OUR (OFFICIALLY SUPPORTED) BUILD SYSTEM; IT IS NOT EVEN TESTED AT ALL! was (Author: thetaphi): bq. cpp-tasks is used to compile NativePosixUtil.cpp, so there is precedent for this in our project... -1. THIS IS NOT PART OF OUR BUILD SYSTEM; IT IS NOT EVEN TESTED AT ALL!