[jira] Updated: (LUCENE-2657) Replace Maven POM templates with full POMs, and change documentation accordingly
[ https://issues.apache.org/jira/browse/LUCENE-2657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steven Rowe updated LUCENE-2657:
    Attachment: LUCENE-2657-branch_3x.patch, LUCENE-2657.patch

Patches implementing my proposal to place the Maven POMs in {{dev-tools/maven/}} and add a new top-level Ant target {{get-maven-poms}}, which is invoked by {{generate-maven-artifacts}}. {{generate-maven-artifacts}} remains in the top-level {{build.xml}}, as well as in {{lucene/}}, {{solr/}}, and {{modules/}} (trunk only). I couldn't figure out a way for {{generate-maven-artifacts}} under the child directories {{lucene/}}, {{solr/}}, and {{modules/}} to depend on the top-level {{get-maven-poms}} target, so instead I have {{generate-maven-artifacts}} in the child directories explicitly run the {{get-maven-poms}} target via the {{ant}} task. As a result, running {{generate-maven-artifacts}} from the top level will cause {{get-maven-poms}} to run once for each child directory, but the repeated copy operation doesn't hurt anything, and the process is quick. Unless there are objections, I will commit this tomorrow.

Replace Maven POM templates with full POMs, and change documentation accordingly

    Key: LUCENE-2657
    URL: https://issues.apache.org/jira/browse/LUCENE-2657
    Project: Lucene - Java
    Issue Type: Improvement
    Components: Build
    Affects Versions: 3.1, 4.0
    Reporter: Steven Rowe
    Assignee: Steven Rowe
    Fix For: 3.1, 4.0
    Attachments: LUCENE-2657-branch_3x.patch, LUCENE-2657.patch (multiple revisions)

The current Maven POM templates only contain dependency information, the bare bones necessary for uploading artifacts to the Maven repository.
The full Maven POMs in the attached patch include the information necessary to run a multi-module Maven build, in addition to serving the same purpose as the current POM templates. Several dependencies are not available through public Maven repositories. A profile in the top-level POM can be activated to install these dependencies from the various {{lib/}} directories into your local repository. From the top-level directory:

{code}
mvn -N -Pbootstrap install
{code}

Once these non-Maven dependencies have been installed, to run all Lucene/Solr tests via Maven's Surefire plugin and populate your local repository with all artifacts, run, from the top-level directory:

{code}
mvn install
{code}

When one Lucene/Solr module depends on another, the dependency is declared on the *artifact(s)* produced by the other module and deposited in your local repository, rather than on the other module's un-jarred compiler output in the {{build/}} directory, so you must run {{mvn install}} on the other module before its changes are visible to the module that depends on it. To create all the artifacts without running tests:

{code}
mvn -DskipTests install
{code}

I almost always include the {{clean}} phase when I do a build, e.g.:

{code}
mvn -DskipTests clean install
{code}

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (SOLR-1996) Possible edismax phrase query bug with a query parameter like: q=(aaa+bbb)+OR+otherField:(zzz)^30
[ https://issues.apache.org/jira/browse/SOLR-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984127#action_12984127 ]

Otis Gospodnetic commented on SOLR-1996:

Rafał, was that the case? If so, we can close this.

Possible edismax phrase query bug with a query parameter like: q=(aaa+bbb)+OR+otherField:(zzz)^30

    Key: SOLR-1996
    URL: https://issues.apache.org/jira/browse/SOLR-1996
    Project: Solr
    Issue Type: Bug
    Components: search
    Affects Versions: 1.4, 1.4.1
    Environment: Ubuntu 10, Java 5 - 6
    Reporter: Rafał Kuć

I think there is a problem with the edismax query parser. When I try to use the pf parameter with a query parameter defined like this: q=(aaa bbb)+OR+field:(aaa bbb)^100, the pf parameter is not working - with debug turned on I see a strange phrase query as part of the raw Lucene query. Of course, when I set the query parameter to something like q=(aaa bbb), without the 'OR' part, the phrase boost works perfectly.
[jira] Commented: (SOLR-1996) Possible edismax phrase query bug with a query parameter like: q=(aaa+bbb)+OR+otherField:(zzz)^30
[ https://issues.apache.org/jira/browse/SOLR-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984131#action_12984131 ]

Rafał Kuć commented on SOLR-1996:

Yes - I forgot about the issue - please close it ;)
[jira] Commented: (LUCENE-2871) Use FileChannel in FSDirectory
[ https://issues.apache.org/jira/browse/LUCENE-2871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984134#action_12984134 ]

Shay Banon commented on LUCENE-2871:

Strange - I did not get it when running the tests; I will try to find out why it can happen.

Use FileChannel in FSDirectory

    Key: LUCENE-2871
    URL: https://issues.apache.org/jira/browse/LUCENE-2871
    Project: Lucene - Java
    Issue Type: New Feature
    Components: Store
    Reporter: Shay Banon
    Attachments: LUCENE-2871.patch

Explore using FileChannel in FSDirectory to see if it improves the performance of write operations.
[jira] Commented: (LUCENE-2871) Use FileChannel in FSDirectory
[ https://issues.apache.org/jira/browse/LUCENE-2871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984139#action_12984139 ]

Michael McCandless commented on LUCENE-2871:

Yeah, me neither -- tests all pass when I force the dir to e.g. NIOFSDir, and my benchmark runs on the 100K index; it just fails for the 10M index... curious.
[jira] Commented: (LUCENE-2871) Use FileChannel in FSDirectory
[ https://issues.apache.org/jira/browse/LUCENE-2871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984145#action_12984145 ]

Uwe Schindler commented on LUCENE-2871:

Looking at the current patch, the class seems wrong. In my opinion, this should be only in NIOFSDirectory. SimpleFSDir should only use RAF.
Re: Query parser contract changes?
On Tue, Jan 18, 2011 at 3:58 AM, karl.wri...@nokia.com wrote:
> This turns out to have indeed been due to a recent, but un-announced, index format change. A rebuilt index worked properly.

This was from LUCENE-2862; I'll send out an email now.
heads up (late)
Five days ago the trunk index format changed with LUCENE-2862; you should re-index any trunk indexes. It's likely that if you open up old trunk indexes you won't get an exception - queries will just return zero results.
Re: Query parser contract changes?
On Thu, Jan 20, 2011 at 6:59 AM, Robert Muir rcm...@gmail.com wrote:
> On Tue, Jan 18, 2011 at 3:58 AM, karl.wri...@nokia.com wrote:
>> This turns out to have indeed been due to a recent, but un-announced, index format change. A rebuilt index worked properly.
>
> This was from LUCENE-2862, i'll send out an email now

Ugh! Mea culpa. Sorry :( I forgot this was an index change!

Mike
[jira] Commented: (LUCENE-2876) Remove Scorer.getSimilarity()
[ https://issues.apache.org/jira/browse/LUCENE-2876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984148#action_12984148 ]

Robert Muir commented on LUCENE-2876:

I'd like to commit this later today if there aren't any objections (it's just boring cleanup). As for 3.1, I rethought the issue, and I think e.g. DisjunctionMaxQuery should really work with LUCENE-2590. So I'll look at trying to pass a non-null weight in 3.x too; I think users will see it as a bug...

Remove Scorer.getSimilarity()

    Key: LUCENE-2876
    URL: https://issues.apache.org/jira/browse/LUCENE-2876
    Project: Lucene - Java
    Issue Type: Task
    Components: Query/Scoring
    Reporter: Robert Muir
    Assignee: Robert Muir
    Fix For: 3.1, 4.0
    Attachments: LUCENE-2876.patch

Originally this was part of the patch for per-field Similarity (LUCENE-2236), but I pulled it out as its own issue, since it's really mostly unrelated. I also like it as a separate issue so the deprecation can be applied to branch_3x, to create fewer surprises/migration hassles for 4.0 users.

Currently Scorer has a confusing number of ctors, taking either a Similarity or a Weight + Similarity. Also, lots of scorers don't use the Similarity at all, and it's not really needed in Scorer itself. Additionally, the Weight argument is often null. The Weight makes sense in Scorer: it's the parent that created the scorer, and is used by Scorer itself to support LUCENE-2590's features. But I don't think all queries work with this feature correctly right now, because they pass null. Finally, the situation gets confusing once you consider delegators like ScoreCachingWrapperScorer, which aren't really delegating correctly, so I'm unsure features like LUCENE-2590 are working with this.

So I think we should remove getSimilarity(): if your scorer uses a Similarity, it's already coming to you via your ctor from your Weight, and you can manage this yourself. Also, all scorers should pass the Weight (parent) that created them, and this should be Scorer's only ctor. I fixed all core/contrib/solr Scorers (even the internal ones) to pass their parent Weight, just for consistency of this visitor interface. The only one that passes null is Solr's ValueSourceScorer.

I set fix-for 3.1 not because I want to backport anything, only to mark getSimilarity() deprecated there.
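The constructor discipline described above (the parent Weight as Scorer's only ctor argument, with any Similarity held privately by the concrete scorer) can be sketched roughly as follows. All class and field names here are illustrative stand-ins, not Lucene's actual classes:

```java
// Illustrative sketch of the proposed ctor shape: Scorer's only ctor takes
// the parent Weight; a concrete scorer that needs Similarity-derived state
// receives it through its own ctor, supplied by the Weight that created it.
// All names are hypothetical, simplified stand-ins for the real classes.
abstract class SketchWeight {
    abstract SketchScorer scorer();
}

abstract class SketchScorer {
    protected final SketchWeight weight;  // parent that created this scorer

    protected SketchScorer(SketchWeight weight) {
        this.weight = weight;             // no Similarity stored in the base class
    }

    abstract float score();
}

// Concrete scorer: its Similarity-derived stats (here just an idf stand-in)
// arrive once, at construction time, and are kept private.
final class SketchTermScorer extends SketchScorer {
    private final float idf;

    SketchTermScorer(SketchWeight parent, float idf) {
        super(parent);
        this.idf = idf;
    }

    @Override
    float score() {
        return idf;  // placeholder scoring using the privately held stat
    }
}

final class SketchTermWeight extends SketchWeight {
    private final float idf = 1.5f;       // computed privately, never exposed

    @Override
    SketchScorer scorer() {
        return new SketchTermScorer(this, idf);
    }
}
```

The point of this shape is that neither the base class nor delegators need a getSimilarity() accessor: whatever a scorer needs arrives through its own constructor from its parent Weight.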
[jira] Commented: (LUCENE-2876) Remove Scorer.getSimilarity()
[ https://issues.apache.org/jira/browse/LUCENE-2876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984171#action_12984171 ]

Doron Cohen commented on LUCENE-2876:

Patch looks good. +1 for this cleanup, which removes calls with a null arg and the comment {{// similarity not in use}}. Some minor comments:
* jdoc: in some of the scorer constructors a Weight param was added, but the existing jdocs for the constructor, which document (some) params, were not updated to also mention the weight. I am not 100% sure this should be fixed, as there is inconsistency in the level of jdoc across scorer implementations. If there were no jdocs at all there I would say nothing, but since there were some, they have now become less complete...
* ExactPhraseScorer is created with both Weight and Similarity - I think the Similarity param can be removed as part of this cleanup.
* Same for SloppyPhraseScorer, PhraseScorer, SpanScorer, TermScorer, MatchAllDocsScorer - the Similarity param can be removed.

One question not related to this patch - I just saw it while reviewing:
* It is interesting that SloppyPhraseScorer now extends PhraseScorer but ExactPhraseScorer does not - is this on purpose? Perhaps related to Mike's recent optimizations in this scorer?
Lucene-Solr-tests-only-trunk - Build # 3949 - Failure
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/3949/

1 tests failed.

REGRESSION: org.apache.solr.client.solrj.TestLBHttpSolrServer.testSimple

Error Message:
expected:<3> but was:<2>

Stack Trace:
junit.framework.AssertionFailedError: expected:<3> but was:<2>
    at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1127)
    at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1059)
    at org.apache.solr.client.solrj.TestLBHttpSolrServer.testSimple(TestLBHttpSolrServer.java:126)

Build Log (for compile errors):
[...truncated 8226 lines...]
[jira] Commented: (LUCENE-2876) Remove Scorer.getSimilarity()
[ https://issues.apache.org/jira/browse/LUCENE-2876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984177#action_12984177 ]

Robert Muir commented on LUCENE-2876:

Thanks for the review Doron!

bq. jdoc: in some of the scorer constructors a Weight param was added but existing jdocs for the costructor which document (some) params was not updated to also mention the weight.

I'll fix this, thanks!

{quote}
ExactPhraseScorer is created with both Weight and Similarity - I think the Similarity param can be removed as part of this cleanup.
Same for SloppyPhraseScorer, PhraseScorer, SpanScorer, TermScorer, MatchAllDocsScorer - Similarity param can be removed.
{quote}

These still need the Similarity param - they use it in scoring; they just don't pass it to the superclass constructor (Scorer's constructor). It's possible I misunderstood your idea, though. Let's take the TermQuery example: are you suggesting that we should expose TermWeight's Similarity and just pass TermWeight to TermScorer (requiring TermScorer to take a TermWeight in its ctor instead of Weight + Similarity)? Currently TermWeight's local copy of Similarity, which it uses to compute IDF, is private.

bq. it is interesting that SloppyPhraseScorer now extends PhraseScorer but ExactPhraseScorer does not, is this on purpose? Perhaps related do Mike's recent optimizations in this scorer?

Yes, that's correct. Just at a glance, he might have done this so that ExactPhraseScorer can compute a score cache like TermScorer, along with other similar optimizations, since the tf values are really integers for this exact case. It might be that if we look at splitting calculations and matching out from Scorer, we can make matchers like ExactPhrase/SloppyPhrase simpler, and we could then clean this up... not sure though!
[jira] Commented: (LUCENE-2876) Remove Scorer.getSimilarity()
[ https://issues.apache.org/jira/browse/LUCENE-2876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984186#action_12984186 ]

Doron Cohen commented on LUCENE-2876:

{quote}
These still need the Similarity param? They use it in scoring, its just they don't pass it to the superclass constructor (Scorer's constructor). ... Currently TermWeight's local copy of Similarity, which it uses to compute IDF, is private.
{quote}

You're right - I was for some reason under the impression that part of the reason for the change was that Weight already exposes Similarity. It does not, and I think it shouldn't, so the current patch is good here.
Adding Myself as Mentor to the Incubator Proposal
Hi all,

I'm in the process of signing up as a mentor on the Incubator wiki and thought I'd better introduce myself, since I don't expect anybody around here to know me.

Like many people, I came to the ASF to scratch a few itches I encountered with an existing project I used at work. In my case it was JServ, in around 1998, but back then I mainly remained a pure user with the occasional bug report. I gradually became more involved over time: in 2000 I was voted in as a committer to Ant, and later that year as a member of the ASF. Today I am still an active committer to Ant, the PMC chairman of Gump, and involved in a few smaller parts of Commons and the remnants of Jakarta. A few years ago I mentored Apache Ivy through incubation, so I already wear my Incubator scars.

During work hours, the .NET platform has been my main development target since 2005. Even though all the ASF projects I'm involved in are Java projects, I'm very familiar with C# and the platform in general. Early last year I coded up a prototype for a customer project (that never took off) using Lucene.NET and recall how wrong it felt, so I fully understand and appreciate the need for an idiomatic API.

It's my goal to keep out of any technical decisions; that's really up to the committers to decide. I may find time to participate in the discussions and even provide a patch or two, but I won't promise anything. I hope I can contribute a small part to a successful reboot of the project. Let's enjoy the ride.

Stefan
--
http://stefan.samaflost.de/
[jira] Updated: (LUCENE-2876) Remove Scorer.getSimilarity()
[ https://issues.apache.org/jira/browse/LUCENE-2876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-2876:
    Attachment: LUCENE-2876.patch

Updated patch, with the javadoc inconsistencies corrected.
Re: Adding Myself as Mentor to the Incubator Proposal
Welcome =)

- Michael

On Thu, Jan 20, 2011 at 8:50 AM, Stefan Bodewig bode...@apache.org wrote:
> Hi all, I'm in the process of signing up as mentor on the Incubator wiki and thought I'd better introduce myself since I don't expect anybody around here to know me. [...]
[jira] Commented: (LUCENE-2871) Use FileChannel in FSDirectory
[ https://issues.apache.org/jira/browse/LUCENE-2871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984206#action_12984206 ]

Shay Banon commented on LUCENE-2871:

bq. Looking at the current patch, the class seems wrong. In my opinion, this should be only in NIOFSDirectory. SimpleFSDir should only use RAF.

It's a good question; I'm not sure what to do with it. Here is the problem: the channel output can be used with all 3 FS dirs (simple, nio, and mmap), and it might actually make sense to use it even with SimpleFS (i.e. using non-nio to read, but the file channel to write). To support all of them, currently the simplest way is to put it in the base class so the code is shared. On IRC, there was a discussion about externalizing the outputs and inputs so one can more easily pick and choose, but I think that belongs in a different patch.
[jira] Commented: (LUCENE-2871) Use FileChannel in FSDirectory
[ https://issues.apache.org/jira/browse/LUCENE-2871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984222#action_12984222 ]

Earwin Burrfoot commented on LUCENE-2871:

Before arguing where to put this new IndexOutput, I think it's wise to have a benchmark proving we need it at all. I have serious doubts that FileChannel is going to outperform RAF.write() - why should it? And for the purposes of a benchmark, it can live anywhere.
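A micro-benchmark of the kind Earwin asks for might be sketched as follows. This is an illustrative standalone sketch, not code from the patch; the buffer size and iteration count are arbitrary choices:

```java
import java.io.File;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

// Sketch: time sequential buffered writes via RandomAccessFile.write()
// versus FileChannel.write(), then verify both paths wrote the same bytes.
class WriteBench {
    static final int BUF = 16 * 1024;   // arbitrary buffer size
    static final int ITERS = 1024;      // 16 MB total, arbitrary

    static long rafWrite(File f) throws Exception {
        byte[] buf = new byte[BUF];
        long t0 = System.nanoTime();
        RandomAccessFile raf = new RandomAccessFile(f, "rw");
        try {
            for (int i = 0; i < ITERS; i++) {
                raf.write(buf);
            }
        } finally {
            raf.close();
        }
        return System.nanoTime() - t0;
    }

    static long channelWrite(File f) throws Exception {
        ByteBuffer buf = ByteBuffer.allocate(BUF);
        long t0 = System.nanoTime();
        RandomAccessFile raf = new RandomAccessFile(f, "rw");
        FileChannel ch = raf.getChannel();
        try {
            for (int i = 0; i < ITERS; i++) {
                buf.clear();
                while (buf.hasRemaining()) {
                    ch.write(buf);      // write() may be partial; drain the buffer
                }
            }
        } finally {
            ch.close();
            raf.close();
        }
        return System.nanoTime() - t0;
    }

    public static void main(String[] args) throws Exception {
        File f1 = File.createTempFile("raf", ".bin");
        File f2 = File.createTempFile("nio", ".bin");
        long t1 = rafWrite(f1);
        long t2 = channelWrite(f2);
        System.out.println("RAF: " + t1 / 1000000 + " ms, FileChannel: " + t2 / 1000000 + " ms");
        if (f1.length() != (long) BUF * ITERS || f2.length() != f1.length()) {
            throw new AssertionError("write paths produced different byte counts");
        }
        f1.delete();
        f2.delete();
    }
}
```

Absolute numbers will vary with OS, JVM, and filesystem, which is exactly why measuring first, before deciding where the new IndexOutput lives, makes sense.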
[jira] Commented: (LUCENE-2871) Use FileChannel in FSDirectory
[ https://issues.apache.org/jira/browse/LUCENE-2871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984263#action_12984263 ] Shay Banon commented on LUCENE-2871: Agreed, Earwin; let's first see if it makes sense. This is just an experiment and might not make sense for single-threaded writes.
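As a rough illustration of the two write paths being debated above (plain-JDK sketch, not Lucene code; the class and method names here are hypothetical), a benchmark along the lines Earwin asks for would time these two loops over realistic block sizes and thread counts:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch comparing the two write paths discussed in this thread:
// RandomAccessFile.write() (the existing RAF path) vs FileChannel.write().
public class WritePathSketch {

    // Classic RAF path: plain sequential writes.
    static void rafWrite(Path path, byte[] data) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path.toFile(), "rw")) {
            raf.write(data);
        }
    }

    // FileChannel path: the alternative the patch explores.
    static void channelWrite(Path path, byte[] data) throws IOException {
        try (FileChannel ch = FileChannel.open(path,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buf = ByteBuffer.wrap(data);
            // write() may accept fewer bytes than remaining; loop until drained
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
        }
    }
}
```

Both paths must produce byte-identical files; any difference between them is purely throughput, which is exactly what a benchmark would have to demonstrate before committing either way.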
[jira] Created: (SOLR-2325) faceting throws NPE if all q's and fq's excluded
faceting throws NPE if all q's and fq's excluded Key: SOLR-2325 URL: https://issues.apache.org/jira/browse/SOLR-2325 Project: Solr Issue Type: Bug Reporter: Yonik Seeley Example of a faceting request that produces an NPE: http://localhost:8983/solr/select?q={!tag=zzz}foo&facet=true&facet.field={!ex=zzz}popularity
[jira] Assigned: (SOLR-2325) faceting throws NPE if all q's and fq's excluded
[ https://issues.apache.org/jira/browse/SOLR-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley reassigned SOLR-2325: -- Assignee: Yonik Seeley
[jira] Updated: (LUCENE-2474) Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean custom caches that use the IndexReader (getFieldCacheKey)
[ https://issues.apache.org/jira/browse/LUCENE-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-2474: --- Fix Version/s: 4.0 3.1 Assignee: Michael McCandless Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean custom caches that use the IndexReader (getFieldCacheKey) Key: LUCENE-2474 URL: https://issues.apache.org/jira/browse/LUCENE-2474 Project: Lucene - Java Issue Type: Improvement Components: Search Reporter: Shay Banon Assignee: Michael McCandless Fix For: 3.1, 4.0 Attachments: LUCENE-2474.patch, LUCENE-2474.patch, LUCENE-2574.patch Allow plugging in a Cache Eviction Listener to IndexReader to eagerly clean custom caches that use the IndexReader (getFieldCacheKey). A spin-off of: https://issues.apache.org/jira/browse/LUCENE-2468. Basically, it makes a lot of sense to cache things based on IndexReader#getFieldCacheKey; even Lucene itself uses it, for example, with CachingWrapperFilter. FieldCache benefits from being called explicitly to purge its cache when possible (which is tricky to know from the outside, especially when using NRT - reader attack of the clones). The provided patch allows plugging in a CacheEvictionListener which will be called when the cache should be purged for an IndexReader.
[jira] Commented: (LUCENE-2474) Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean custom caches that use the IndexReader (getFieldCacheKey)
[ https://issues.apache.org/jira/browse/LUCENE-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984280#action_12984280 ] Michael McCandless commented on LUCENE-2474: bq. Actually, I am against the last patch you posted, as it clearly has nothing to do with this issue Woops! Heh. bq. A MultiReader is just a wrapper - you don't reopen it, so it could just start off with an empty listener list, the subs could all retain their listener lists and an addListener() could just delegate to the contained readers. Well, it does have a reopen (it reopens the subs and wraps them in a new MR), but I guess delegation would work for MR. And the same for ParallelReader. And I think the NRT case should work fine, since we don't expose IW.getReader anymore (hmm -- this was never backported to 3.x?) -- if you new IndexReader(IW), it creates a single collection holding all listeners, and then shares it w/ all SRs.
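The shared-listener idea Mike describes for the NRT case can be sketched in plain Java (hypothetical names, heavily simplified; this is not the actual patch API): the writer creates one listener collection, hands it to every segment view, and closing a view fires eviction callbacks keyed by its cache key.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical sketch of the cache-eviction-listener pattern discussed above.
public class EvictionSketch {

    interface CacheEvictionListener {
        // Called when caches keyed on this reader's cache key should be purged.
        void onClose(Object cacheKey);
    }

    static class SegmentView implements AutoCloseable {
        private final Object cacheKey = new Object(); // stands in for getFieldCacheKey()
        private final List<CacheEvictionListener> listeners;

        // NRT case: the writer creates ONE shared list and hands it to every
        // segment view, so a listener registered once fires for all of them.
        SegmentView(List<CacheEvictionListener> shared) {
            this.listeners = shared;
        }

        Object getCacheKey() {
            return cacheKey;
        }

        @Override
        public void close() {
            for (CacheEvictionListener l : listeners) {
                l.onClose(cacheKey);
            }
        }
    }

    static List<CacheEvictionListener> newSharedListenerList() {
        return new CopyOnWriteArrayList<>();
    }
}
```

Delegation for MultiReader/ParallelReader, as discussed above, would just mean forwarding addListener() calls to each contained view's list instead of sharing one list.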
[jira] Updated: (LUCENE-2691) Consolidate Near Real Time and Reopen API semantics
[ https://issues.apache.org/jira/browse/LUCENE-2691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-2691: --- Fix Version/s: 3.1 Should we backport this to 3.1? Consolidate Near Real Time and Reopen API semantics --- Key: LUCENE-2691 URL: https://issues.apache.org/jira/browse/LUCENE-2691 Project: Lucene - Java Issue Type: Improvement Reporter: Grant Ingersoll Assignee: Grant Ingersoll Priority: Minor Fix For: 3.1, 4.0 Attachments: LUCENE-2691.patch, LUCENE-2691.patch We should consolidate the IndexWriter.getReader and the IndexReader.reopen semantics; since most people are already using the IR.reopen() method, we should simply add: {code} IR.reopen(IndexWriter) {code} Initially, it could just call IW.getReader(), but it probably should switch to just using package-private methods for sharing the internals
[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments
[ https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984285#action_12984285 ] Michael McCandless commented on LUCENE-2324: OK, I think Michael's example can be solved with a small change to the delete buffering. When a delete arrives, we should buffer in each DWPT, but also buffer into the global deletes pool (held in DocumentsWriter). Whenever any DWPT is flushed, that global pool is pushed. Then, the buffered deletes against each DWPT are carried (as usual) along w/ the segment that's flushed from that DWPT, but those buffered deletes *only* apply to the docs in that one segment. The pushed deletes from the global pool apply to all prior segments (ie, they coalesce). This way, the deletes that will be applied to the already flushed segments are aggressively pushed. Separately, I think we should relax the error semantics for updateDocument: if an aborting exception occurs (eg disk full while flushing a segment), then it's possible that the delete from an updateDocument will have applied but the add did not. Outside of error cases, of course, updateDocument will continue to be atomic (ie a commit() can never split the delete and the add). Then the updateDocument case is handled as just an [atomic wrt flush] add plus delete. Per thread DocumentsWriters that write their own private segments - Key: LUCENE-2324 URL: https://issues.apache.org/jira/browse/LUCENE-2324 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Michael Busch Assignee: Michael Busch Priority: Minor Fix For: Realtime Branch Attachments: LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324.patch, LUCENE-2324.patch, LUCENE-2324.patch, lucene-2324.patch, lucene-2324.patch, LUCENE-2324.patch, test.out, test.out, test.out, test.out See LUCENE-2293 for motivation and more details.
I'm copying here Mike's summary he posted on 2293: Change the approach for how we buffer in RAM to a more isolated approach, whereby IW has N fully independent RAM segments in-process, and when a doc needs to be indexed it's added to one of them. Each segment would also write its own doc stores, and normal segment merging (not the inefficient merge we now do on flush) would merge them. This should be a good simplification in the chain (eg maybe we can remove the *PerThread classes). The segments can flush independently, letting us make much better concurrent use of IO and CPU.
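The two-level delete buffering Mike describes above can be sketched in plain Java (hypothetical and heavily simplified; real DWPTs buffer delete terms/queries together with doc-id "up to" counters, not plain strings):

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch of per-DWPT delete buffers plus a global deletes pool.
// Deletes buffered in a DWPT apply only to that DWPT's in-RAM segment; the
// global pool applies to all already-flushed (prior) segments.
public class DeleteBufferSketch {

    static class DWPT {
        // applies only to docs buffered in this DWPT's private segment
        final List<String> bufferedDeletes = new ArrayList<>();
    }

    final List<DWPT> threadStates = new ArrayList<>();
    // applies to all segments flushed before now (the "coalesced" deletes)
    final List<String> globalDeletes = new ArrayList<>();

    void delete(String term) {
        // buffer in each DWPT (for its private segment)...
        for (DWPT d : threadStates) {
            d.bufferedDeletes.add(term);
        }
        // ...and in the global pool (for all prior segments)
        globalDeletes.add(term);
    }

    // Flushing any DWPT pushes the global pool, so deletes are applied
    // aggressively to the already-flushed segments.
    List<String> flush(DWPT d) {
        List<String> pushed = new ArrayList<>(globalDeletes);
        globalDeletes.clear();
        threadStates.remove(d);
        return pushed;
    }
}
```

The key invariant in the sketch matches the comment above: a flushed segment carries its own DWPT-local deletes, while the pushed global deletes coalesce against everything to its left.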
[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments
[ https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984291#action_12984291 ] Jason Rutherglen commented on LUCENE-2324: -- bq. When a delete arrives, we should buffer in each DWPT, but also buffer into the global deletes pool (held in DocumentsWriter). This'll work; however, it seems like it's going to be a temporary solution if we implement sequence ids properly and/or implement non-sequential merges. In fact, with the shared doc store gone, what's holding up non-sequential merging?
[jira] Updated: (LUCENE-2872) Terms dict should block-encode terms
[ https://issues.apache.org/jira/browse/LUCENE-2872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-2872: --- Attachment: LUCENE-2872.patch New patch; specializes read* in ByteArrayDataInput (poached from LUCENE-2824). Terms dict should block-encode terms Key: LUCENE-2872 URL: https://issues.apache.org/jira/browse/LUCENE-2872 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 4.0 Attachments: LUCENE-2872.patch, LUCENE-2872.patch, LUCENE-2872.patch With PrefixCodedTermsReader/Writer we now encode each term standalone, ie its bytes, metadata, details for postings (frq/prox file pointers), etc. But this is costly when something wants to visit many terms but pull metadata for only a few (eg respelling, certain MTQs). This is particularly costly for the sep codec because it has more metadata to store per term. So instead I think we should block-encode all terms between indexed terms, so that the metadata is stored column-stride instead. This makes it faster to enum just the terms.
[jira] Updated: (SOLR-2325) faceting throws NPE if all q's and fq's excluded
[ https://issues.apache.org/jira/browse/SOLR-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley updated SOLR-2325: --- Attachment: SOLR-2325.patch Here's a patch that fixes things up. Reviewing the documentation, it looks like this wasn't actually a bug - in the past, only filters could be excluded. Still, it makes sense to be able to exclude the main query too, and that is what this patch implements.
[jira] Assigned: (LUCENE-2691) Consolidate Near Real Time and Reopen API semantics
[ https://issues.apache.org/jira/browse/LUCENE-2691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reassigned LUCENE-2691: -- Assignee: Michael McCandless (was: Grant Ingersoll)
[jira] Commented: (LUCENE-2691) Consolidate Near Real Time and Reopen API semantics
[ https://issues.apache.org/jira/browse/LUCENE-2691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984302#action_12984302 ] Michael McCandless commented on LUCENE-2691: I'll backport...
[jira] Commented: (LUCENE-2872) Terms dict should block-encode terms
[ https://issues.apache.org/jira/browse/LUCENE-2872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984306#action_12984306 ] Robert Muir commented on LUCENE-2872: - +1 to commit; the last specialization made all the difference in my benchmarks. I think this will pave the way for us to fix the Sep codec in the branch...
[jira] Updated: (SOLR-2325) Allow tagging and exclusion of main query for faceting
[ https://issues.apache.org/jira/browse/SOLR-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley updated SOLR-2325: --- Description: Example of a faceting request that produces an NPE because tagging/excluding the main query is not supported. http://localhost:8983/solr/select?q={!tag=zzz}foo&facet=true&facet.field={!ex=zzz}popularity was: Example of a faceting request that produces an NPE http://localhost:8983/solr/select?q={!tag=zzz}foo&facet=true&facet.field={!ex=zzz}popularity Priority: Minor (was: Major) Fix Version/s: 4.0 3.1 Issue Type: Improvement (was: Bug) Summary: Allow tagging and exclusion of main query for faceting (was: faceting throws NPE if all q's and fq's excluded)
[jira] Commented: (LUCENE-2824) optimizations for bufferedindexinput
[ https://issues.apache.org/jira/browse/LUCENE-2824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984326#action_12984326 ] Michael McCandless commented on LUCENE-2824: I'm seeing excellent gains w/ this patch, on Linux 64bit Java 6 NIOFSDir:
||Query||QPS clean||QPS robspec||Pct diff||
|spanFirst(unit, 5)|16.67|15.62|{color:red}-6.3%{color}|
|unit state|8.04|7.87|{color:red}-2.2%{color}|
|spanNear([unit, state], 10, true)|4.31|4.25|{color:red}-1.2%{color}|
|unit state~3|4.85|5.02|{color:green}3.6%{color}|
|unit state|10.35|10.94|{color:green}5.7%{color}|
|unit~1.0|9.60|10.15|{color:green}5.7%{color}|
|unit~2.0|9.35|9.94|{color:green}6.3%{color}|
|united~2.0|3.30|3.51|{color:green}6.4%{color}|
|+nebraska +state|161.71|174.23|{color:green}7.7%{color}|
|+unit +state|11.20|12.09|{color:green}8.0%{color}|
|doctitle:.*[Uu]nited.*|3.93|4.25|{color:green}8.0%{color}|
|united~1.0|15.12|16.39|{color:green}8.4%{color}|
|un*d|49.33|56.09|{color:green}13.7%{color}|
|u*d|14.85|16.97|{color:green}14.3%{color}|
|state|25.95|30.12|{color:green}16.1%{color}|
|unit*|22.72|26.88|{color:green}18.3%{color}|
|uni*|12.64|15.20|{color:green}20.2%{color}|
|doctimesecnum:[1 TO 6]|8.42|10.73|{color:green}27.4%{color}|
+1 to commit. optimizations for bufferedindexinput Key: LUCENE-2824 URL: https://issues.apache.org/jira/browse/LUCENE-2824 Project: Lucene - Java Issue Type: Improvement Affects Versions: 3.1, 4.0 Reporter: Robert Muir Attachments: LUCENE-2824.patch along the same lines as LUCENE-2816: * the readVInt/readVLong/readShort/readInt/readLong methods are not optimal here since they defer to readByte. For example, this means checking the buffer's bounds per byte in readVInt instead of per vint.
* it's an easy win to speed this up, even for the vint case: it's essentially always faster; the only slower case is 1024 single-byte vints in a row, in which case we would do a single extra bounds check (1025 instead of 1024)
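The kind of specialization being proposed can be illustrated with a stdlib-only sketch (not the actual Lucene patch): an unrolled readVInt over a byte[] touches at most five bytes, so a caller that has already verified five bytes remain in the buffer pays one bounds check per vint instead of one per byte.

```java
import java.io.ByteArrayOutputStream;

// Illustrative sketch of an unrolled vint decode (not the actual patch code).
public class VIntSketch {

    // Standard vint encoding: 7 bits per byte, high bit = continuation.
    static byte[] writeVInt(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
        return out.toByteArray();
    }

    // Unrolled decode: at most 5 array reads, so a caller that checks
    // "pos + 5 <= limit" once up front needs no per-byte bounds checking.
    static int readVInt(byte[] buf, int pos) {
        byte b = buf[pos++];
        int i = b & 0x7F;
        if ((b & 0x80) == 0) return i;
        b = buf[pos++]; i |= (b & 0x7F) << 7;
        if ((b & 0x80) == 0) return i;
        b = buf[pos++]; i |= (b & 0x7F) << 14;
        if ((b & 0x80) == 0) return i;
        b = buf[pos++]; i |= (b & 0x7F) << 21;
        if ((b & 0x80) == 0) return i;
        b = buf[pos]; i |= (b & 0x7F) << 28;
        return i;
    }
}
```

This is the "single extra bounds check" trade-off described above: the up-front 5-byte check is pessimistic for runs of single-byte vints, but removes the per-byte check everywhere else.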
[jira] Updated: (LUCENE-2824) optimizations for bufferedindexinput
[ https://issues.apache.org/jira/browse/LUCENE-2824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-2824: Fix Version/s: 4.0 3.1 Assignee: Robert Muir
[jira] Resolved: (LUCENE-2872) Terms dict should block-encode terms
[ https://issues.apache.org/jira/browse/LUCENE-2872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-2872. Resolution: Fixed
WARNING: re-index all trunk indices
If you are using Lucene's trunk (to be 4.0) builds, read on... I just committed LUCENE-2872, which is a hard break on the index file format. If you are living on Lucene's trunk then you have to remove any previously created indices and re-index after updating. The change cuts over to a faster on-disk terms dictionary format, which block-encodes term data and metadata between indexed terms. Mike
[jira] Commented: (LUCENE-2558) Use sequence ids for deleted docs
[ https://issues.apache.org/jira/browse/LUCENE-2558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984331#action_12984331 ] Jason Rutherglen commented on LUCENE-2558: -- If we implement deletes via sequence id across all segments, then the .del file should probably remain the same (a set of bits)? Also, when we load up the BV on IW start, I guess we'll need to init the array appropriately. Use sequence ids for deleted docs - Key: LUCENE-2558 URL: https://issues.apache.org/jira/browse/LUCENE-2558 Project: Lucene - Java Issue Type: Improvement Components: Search Affects Versions: Realtime Branch Reporter: Jason Rutherglen Priority: Minor Fix For: Realtime Branch Utilizing the sequence ids created via the update-document methods, we will enable IndexReader deleted docs over a sequence-id array. One of the decisions is what primitive type to use. We can start off with an int[], then possibly move to a short[] (for lower memory consumption) that wraps around.
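The point-in-time semantics the issue describes can be sketched in plain Java (hypothetical, simplified; the real design must also handle the short[] wrap-around mentioned above): each doc records the sequence id at which it was deleted, and a reader "opened" at sequence S sees a doc as deleted only if it was deleted at or before S.

```java
// Hypothetical sketch of deleted docs tracked via a sequence-id array.
public class SeqIdDeletes {
    // per-doc delete sequence id; 0 means the doc is live
    private final int[] deletedAtSeq;
    private long nextSeq = 1;

    SeqIdDeletes(int maxDoc) {
        deletedAtSeq = new int[maxDoc];
    }

    // Record a delete and return its sequence id.
    long delete(int doc) {
        long seq = nextSeq++;
        // int[] vs short[] is the memory trade-off discussed in the issue;
        // the narrow cast is safe only until the counter wraps.
        deletedAtSeq[doc] = (int) seq;
        return seq;
    }

    // A reader opened at readerSeq ignores deletes that happened after it.
    boolean isDeleted(int doc, long readerSeq) {
        int s = deletedAtSeq[doc];
        return s != 0 && s <= readerSeq;
    }
}
```

Under this scheme the on-disk .del file can indeed stay a plain bit set, as the comment suggests: the sequence ids matter only for in-RAM point-in-time readers, and flushing collapses "deleted at or before the flush point" down to one bit per doc.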
[jira] Resolved: (LUCENE-2691) Consolidate Near Real Time and Reopen API semantics
[ https://issues.apache.org/jira/browse/LUCENE-2691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-2691. Resolution: Fixed
[jira] Commented: (LUCENE-2657) Replace Maven POM templates with full POMs, and change documentation accordingly
[ https://issues.apache.org/jira/browse/LUCENE-2657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984333#action_12984333 ] Robert Muir commented on LUCENE-2657: - +1, patch looks good. Replace Maven POM templates with full POMs, and change documentation accordingly Key: LUCENE-2657 URL: https://issues.apache.org/jira/browse/LUCENE-2657 Project: Lucene - Java Issue Type: Improvement Components: Build Affects Versions: 3.1, 4.0 Reporter: Steven Rowe Assignee: Steven Rowe Fix For: 3.1, 4.0 Attachments: LUCENE-2657-branch_3x.patch, LUCENE-2657-branch_3x.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch The current Maven POM templates only contain dependency information, the bare bones necessary for uploading artifacts to the Maven repository. The full Maven POMs in the attached patch include the information necessary to run a multi-module Maven build, in addition to serving the same purpose as the current POM templates. Several dependencies are not available through public Maven repositories. A profile in the top-level POM can be activated to install these dependencies from the various {{lib/}} directories into your local repository.
From the top-level directory: {code} mvn -N -Pbootstrap install {code} Once these non-Maven dependencies have been installed, to run all Lucene/Solr tests via Maven's surefire plugin and populate your local repository with all artifacts, run from the top-level directory: {code} mvn install {code} When one Lucene/Solr module depends on another, the dependency is declared on the *artifact(s)* produced by the other module and deposited in your local repository, rather than on the other module's un-jarred compiler output in the {{build/}} directory, so you must run {{mvn install}} on the other module before its changes are visible to the module that depends on it. To create all the artifacts without running tests: {code} mvn -DskipTests install {code} I almost always include the {{clean}} phase when I do a build, e.g.: {code} mvn -DskipTests clean install {code}
[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments
[ https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12984343#action_12984343 ] Michael McCandless commented on LUCENE-2324: bq. In fact, with shared doc-store gone, what's holding up non-sequential merging? Nothing really! We could/should go do it right now... I think it should be trivial. Then, we should fixup our default MP to behave more like BSMP!! Immense segments are merged only pair wise, and no inadvertent optimizing... I think the buffered deletes will work fine for non-sequential merging -- we'd do the same coalescing we do now, only applying deletes on-demand to the to-be-merged segs, etc. We just have to make sure the merged segment is appended to the end of the index (well, what was the end as of when the merge kicked off); this way I think we can continue w/ the invariant that buffered deletes apply to all segments to their left? Per thread DocumentsWriters that write their own private segments - Key: LUCENE-2324 URL: https://issues.apache.org/jira/browse/LUCENE-2324 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Michael Busch Assignee: Michael Busch Priority: Minor Fix For: Realtime Branch Attachments: LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324.patch, LUCENE-2324.patch, LUCENE-2324.patch, lucene-2324.patch, lucene-2324.patch, LUCENE-2324.patch, test.out, test.out, test.out, test.out See LUCENE-2293 for motivation and more details. I'm copying here Mike's summary he posted on 2293: Change the approach for how we buffer in RAM to a more isolated approach, whereby IW has N fully independent RAM segments in-process and when a doc needs to be indexed it's added to one of them. Each segment would also write its own doc stores and normal segment merging (not the inefficient merge we now do on flush) would merge them. 
This should be a good simplification in the chain (eg maybe we can remove the *PerThread classes). The segments can flush independently, letting us make much better concurrent use of IO & CPU. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
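The invariant Mike describes above (merged segments appended where the index ended when the merge kicked off, so buffered deletes still apply to everything "to their left") can be sketched as a toy model. This is illustrative only, not Lucene code; all names here are made up:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of non-sequential merging: the merged segments are removed
// from their original positions and the merged result is appended at the
// end, so any buffered delete still applies to every segment to its left.
public class NonSequentialMerge {

  // Replace the segments at the given indexes with one merged segment
  // appended at the end; untouched segments keep their relative order.
  static List<String> merge(List<String> segments, List<Integer> toMerge) {
    List<String> result = new ArrayList<>();
    StringBuilder merged = new StringBuilder();
    for (int i = 0; i < segments.size(); i++) {
      if (toMerge.contains(i)) {
        merged.append(segments.get(i)); // fold into the merged segment
      } else {
        result.add(segments.get(i));    // untouched segments keep order
      }
    }
    result.add(merged.toString());      // appended at the (old) end
    return result;
  }

  public static void main(String[] args) {
    List<String> segs = List.of("s0", "s1", "s2", "s3");
    // Merge the non-adjacent pair s0 and s2.
    List<String> after = merge(segs, List.of(0, 2));
    System.out.println(after); // [s1, s3, s0s2]
    assert after.get(after.size() - 1).equals("s0s2");
  }
}
```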
[jira] Created: (LUCENE-2877) BUG in the org.apache.lucene.analysis.br.BrazilianAnalyzer
BUG in the org.apache.lucene.analysis.br.BrazilianAnalyzer -- Key: LUCENE-2877 URL: https://issues.apache.org/jira/browse/LUCENE-2877 Project: Lucene - Java Issue Type: Bug Components: contrib/analyzers Affects Versions: 3.0.2 Environment: Windows 7 64bits, Eclipse Helios Reporter: Renan Pedro Terra de Oliveira Priority: Critical Fix For: 3.0.4 One weird bug with this field is that instead of "false", you have to search for "falsee" to get the correct results. The same behavior happens with other fields that are stored in the index and not analyzed. Example of creating fields for indexing: Field field = new Field("situacaoDocumento", "ATIVO", Field.Store.YES, Field.Index.NOT_ANALYZED); or Field field = new Field("copia", "false", Field.Store.YES, Field.Index.NOT_ANALYZED); Example of the search I need to do, but which returns no correct result: IndexSearcher searcher = ...; TopScoreDocCollector collector = ...; Query query = new TermQuery(new Term("copia", "false")); searcher.search(query, collector); ScoreDoc[] hits = collector.topDocs().scoreDocs; if (hits.length > 0) { return searcher.doc(0); } return null; Example of the search that does work: IndexSearcher searcher = ...; TopScoreDocCollector collector = ...; Query query = new TermQuery(new Term("copia", "falsee")); searcher.search(query, collector); ScoreDoc[] hits = collector.topDocs().scoreDocs; if (hits.length > 0) { return searcher.doc(0); } return null; I tested with Luke (Lucene Index Toolbox) and it showed the same behavior.
[jira] Commented: (LUCENE-2558) Use sequence ids for deleted docs
[ https://issues.apache.org/jira/browse/LUCENE-2558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984350#action_12984350 ] Michael McCandless commented on LUCENE-2558: We could also [someday] move deletes to a stacked model... where we only write deltas (newly deleted docs in the current session) against the segment, and on open we coalesce these. Merging would also periodically coalesce and write a new full vector... Use sequence ids for deleted docs - Key: LUCENE-2558 URL: https://issues.apache.org/jira/browse/LUCENE-2558 Project: Lucene - Java Issue Type: Improvement Components: Search Affects Versions: Realtime Branch Reporter: Jason Rutherglen Priority: Minor Fix For: Realtime Branch Utilizing the sequence ids created via the update document methods, we will enable IndexReader deleted docs over a sequence id array. One of the decisions is what primitive type to use. We can start off with an int[], then possibly move to a short[] (for lower memory consumption) that wraps around.
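The sequence-id idea in LUCENE-2558 can be sketched with a plain int[] first, as the issue suggests. This is a hypothetical illustration of the visibility rule, not the realtime branch's actual code; the class and method names are made up:

```java
// Illustrative sketch: each document records the sequence id at which it
// was deleted (0 = live), and a reader opened at sequence id S sees a doc
// as deleted iff 0 < deleteSeq[doc] <= S. The issue proposes starting with
// int[] and possibly moving to a wrapping short[] to halve the RAM.
public class SeqIdDeletes {
  private final int[] deleteSeq;

  public SeqIdDeletes(int maxDoc) {
    deleteSeq = new int[maxDoc]; // all zero = nothing deleted yet
  }

  public void delete(int docId, int seqId) {
    deleteSeq[docId] = seqId;
  }

  // Visibility check for a reader snapshotted at readerSeq.
  public boolean isDeleted(int docId, int readerSeq) {
    int s = deleteSeq[docId];
    return s != 0 && s <= readerSeq;
  }

  public static void main(String[] args) {
    SeqIdDeletes d = new SeqIdDeletes(4);
    d.delete(1, 5);
    assert !d.isDeleted(1, 4); // reader opened before the delete
    assert d.isDeleted(1, 5);  // reader opened at/after the delete
    assert !d.isDeleted(0, 9); // never deleted
  }
}
```

Because readers only compare against their own snapshot, no per-reader clone of the array is needed, which is the motivation for this over a cloned BitVector.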
[jira] Commented: (LUCENE-2877) BUG in the org.apache.lucene.analysis.br.BrazilianAnalyzer
[ https://issues.apache.org/jira/browse/LUCENE-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984353#action_12984353 ] Robert Muir commented on LUCENE-2877: - Hello, I think the issue is that you are using Field.Index.NOT_ANALYZED. This means the BrazilianAnalyzer is not actually analyzing your text at index time, causing the confusion. BUG in the org.apache.lucene.analysis.br.BrazilianAnalyzer -- Key: LUCENE-2877 URL: https://issues.apache.org/jira/browse/LUCENE-2877 Project: Lucene - Java Issue Type: Bug Components: contrib/analyzers Affects Versions: 3.0.2 Environment: Windows 7 64bits, Eclipse Helios Reporter: Renan Pedro Terra de Oliveira Priority: Critical Fix For: 3.0.4 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Resolved: (SOLR-2325) Allow tagging and exclusion of main query for faceting
[ https://issues.apache.org/jira/browse/SOLR-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley resolved SOLR-2325. Resolution: Fixed Allow tagging and exclusion of main query for faceting -- Key: SOLR-2325 URL: https://issues.apache.org/jira/browse/SOLR-2325 Project: Solr Issue Type: Improvement Reporter: Yonik Seeley Assignee: Yonik Seeley Priority: Minor Fix For: 3.1, 4.0 Attachments: SOLR-2325.patch Example of a faceting request that produces a NPE because tagging/excluding the main query is not supported. http://localhost:8983/solr/select?q={!tag=zzz}foo&facet=true&facet.field={!ex=zzz}popularity
Re: Odd Boolean scoring behavior?
On Thu, Jan 20, 2011 at 2:17 PM, karl.wri...@nokia.com wrote: The problem is that the LANGUAGE_BOOST boost doesn't seem to be having any effect. I can change it all over the place, and nothing much changes. Then perhaps your language term doesn't actually match anything in the index? (i.e. how is it analyzed?) Next step would be to get score explanations (just add debugQuery=true if you're using Solr, or see IndexSearcher.explain() if not). -Yonik http://www.lucidimagination.com
RE: Odd Boolean scoring behavior?
I tried commenting out the final OR term, and that excluded all records that were out-of-language as expected. It's just the boost that doesn't seem to work. Exploring the explain is challenging because of its size, but there are NO boosts recorded of the size I am using (10.0). Here's the basic structure of the first result. 0.0 = (MATCH) sum of: 0.0 = (MATCH) sum of: 0.0 = (MATCH) weight(language:eng in 52867945), product of: 0.0 = queryWeight(language:eng), product of: 1.0 = idf(docFreq=23889670, maxDocs=59327671) 0.0 = queryNorm 1.0 = (MATCH) fieldWeight(language:eng in 52867945), product of: 1.0 = tf(termFreq(language:eng)=0) 1.0 = idf(docFreq=23889670, maxDocs=59327671) 1.0 = fieldNorm(field=language, doc=52867945) 0.0 = (MATCH) product of: 0.0 = (MATCH) sum of: 0.0 = (MATCH) CutoffQueryWrapper((+(othervalue_5:banker^5.997396E-4 othervalue_5:bucker^5.997396E-4 othervalue_5:bunder^5.997396E-4 othervalue_5:bunker othervalue_5:bunner^5.997396E-4 othervalue_5:burker^5.997396E-4) +value_5:hill)^0.5714286), product of: 1.0 = boost 0.0 = queryNorm 0.0 = (MATCH) CutoffQueryWrapper((+(value_5:banker^5.997396E-4 value_5:baunker^5.997396E-4 value_5:benker^5.997396E-4 value_5:beunker^5.997396E-4 value_5:binker^5.997396E-4 value_5:bonker^5.997396E-4 value_5:brunker^5.997396E-4 value_5:bucker^5.997396E-4 value_5:bueker^5.997396E-4 value_5:bunder^5.997396E-4 value_5:bunger^5.997396E-4 value_5:bunkek^5.997396E-4 value_5:bunken^5.997396E-4 value_5:bunker value_5:bunkers^5.997396E-4 value_5:bunkeru^5.997396E-4 value_5:bunner^5.997396E-4 value_5:bunter^5.997396E-4 value_5:bunzer^5.997396E-4 value_5:burker^5.997396E-4 value_5:busker^5.997396E-4) +othervalue_5:hill)^0.5714286), product of: 1.0 = boost 0.0 = queryNorm ... 
0.0069078947 = coord(21/3040) 0.0 = (MATCH) product of: 0.0 = (MATCH) sum of: 0.0 = (MATCH) CutoffQueryWrapper((+(othervalue_5:banker^5.997396E-4 othervalue_5:bucker^5.997396E-4 othervalue_5:bunder^5.997396E-4 othervalue_5:bunker othervalue_5:bunner^5.997396E-4 othervalue_5:burker^5.997396E-4) +value_5:hill)^0.5714286), product of: 1.0 = boost 0.0 = queryNorm 0.0 = (MATCH) CutoffQueryWrapper((+(value_5:banker^5.997396E-4 value_5:baunker^5.997396E-4 value_5:benker^5.997396E-4 value_5:beunker^5.997396E-4 value_5:binker^5.997396E-4 value_5:bonker^5.997396E-4 value_5:brunker^5.997396E-4 value_5:bucker^5.997396E-4 value_5:bueker^5.997396E-4 value_5:bunder^5.997396E-4 value_5:bunger^5.997396E-4 value_5:bunkek^5.997396E-4 value_5:bunken^5.997396E-4 value_5:bunker value_5:bunkers^5.997396E-4 value_5:bunkeru^5.997396E-4 value_5:bunner^5.997396E-4 value_5:bunter^5.997396E-4 value_5:bunzer^5.997396E-4 value_5:burker^5.997396E-4 value_5:busker^5.997396E-4) +othervalue_5:hill)^0.5714286), product of: 1.0 = boost 0.0 = queryNorm ... 0.0069078947 = coord(21/3040) It looks like the PRODUCT_OF and SUM_OF, which represents the Boolean logic, does not actually apply boost? Karl -Original Message- From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of ext Yonik Seeley Sent: Thursday, January 20, 2011 2:36 PM To: dev@lucene.apache.org Subject: Re: Odd Boolean scoring behavior? On Thu, Jan 20, 2011 at 2:17 PM, karl.wri...@nokia.com wrote: The problem is that the LANGUAGE_BOOST boost doesn't seem to be having any effect. I can change it all over the place, and nothing much changes. Then perhaps your language term doesn't actually match anything in the index? (i.e. how is it analyzed?) Next step would be to get score explanations (just add debugQuery=true if you're using Solr, or see IndexSearcher.explain() if not). 
-Yonik http://www.lucidimagination.com
[jira] Commented: (LUCENE-2871) Use FileChannel in FSDirectory
[ https://issues.apache.org/jira/browse/LUCENE-2871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984373#action_12984373 ] Michael McCandless commented on LUCENE-2871: OK -- I was able to index 10M docs w/ the new patch. And search results are identical. But the indexing time on trunk vs the patch were nearly identical -- 536.80 sec (trunk) and 536.06 sec (w/ patch). But, this is on a fast machine, lots of RAM (so writes go straight to buffer cache) and an SSD, using 6 indexing threads. Use FileChannel in FSDirectory -- Key: LUCENE-2871 URL: https://issues.apache.org/jira/browse/LUCENE-2871 Project: Lucene - Java Issue Type: New Feature Components: Store Reporter: Shay Banon Attachments: LUCENE-2871.patch, LUCENE-2871.patch Explore using FileChannel in FSDirectory to see if it improves write operations performance
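For readers unfamiliar with the API being benchmarked above, a minimal FileChannel write path in plain NIO looks like the following. This is a generic sketch, not the LUCENE-2871 patch itself:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Minimal FileChannel-based write path (plain java.nio, not Lucene's
// Directory code): wrap the bytes in a ByteBuffer and drain it fully,
// since a single write() call may write only part of the buffer.
public class ChannelWrite {
  static void write(Path path, byte[] data) throws IOException {
    try (FileChannel ch = FileChannel.open(path,
        StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
      ByteBuffer buf = ByteBuffer.wrap(data);
      while (buf.hasRemaining()) {
        ch.write(buf); // loop: write() is not guaranteed to drain the buffer
      }
    }
  }

  public static void main(String[] args) throws IOException {
    Path p = Files.createTempFile("chan", ".bin");
    write(p, "hello".getBytes());
    assert new String(Files.readAllBytes(p)).equals("hello");
    Files.delete(p);
  }
}
```

With a warm buffer cache and an SSD, as in Mike's test, both this path and a RandomAccessFile-backed one mostly measure memory copies, which is consistent with the near-identical timings reported.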
[jira] Updated: (LUCENE-2856) Create IndexWriter event listener, specifically for merges
[ https://issues.apache.org/jira/browse/LUCENE-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Rutherglen updated LUCENE-2856: - Attachment: LUCENE-2856.patch Here's another iteration. Changed the name to IndexEventListener. Added experimental to the Javadocs, and I probably need to add more. There are some nocommits still, eg, for the reason a flush kicked off. Reader events should be in a different issue as reader pool is moving out of IW soon? All tests pass. Create IndexWriter event listener, specifically for merges -- Key: LUCENE-2856 URL: https://issues.apache.org/jira/browse/LUCENE-2856 Project: Lucene - Java Issue Type: Improvement Components: Index Affects Versions: 4.0 Reporter: Jason Rutherglen Attachments: LUCENE-2856.patch, LUCENE-2856.patch, LUCENE-2856.patch, LUCENE-2856.patch The issue will allow users to monitor merges occurring within IndexWriter using a callback notifier event listener. This can be used by external applications such as Solr to monitor large segment merges. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments
[ https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12984379#action_12984379 ] Jason Rutherglen commented on LUCENE-2324: -- {quote}We could/should go do it right now{quote} Nice! {quote}I think the buffered deletes will work fine for non-sequential merging - we'd do the same coalescing we do now, only applying deletes on-demand to the to-be-merged segs, etc.{quote} I think this is going to make IW deletes even more hairy and hard to understand! Though if we keep the option of using a BV for deletes then there's probably no choice. If we implemented sequence-id deletes using a short[], then we're only increasing the RAM usage by 16 times, though we then do not need to clone which can generate excessive garbage (in a high flush [N]RT enviro). Per thread DocumentsWriters that write their own private segments - Key: LUCENE-2324 URL: https://issues.apache.org/jira/browse/LUCENE-2324 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Michael Busch Assignee: Michael Busch Priority: Minor Fix For: Realtime Branch Attachments: LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324.patch, LUCENE-2324.patch, LUCENE-2324.patch, lucene-2324.patch, lucene-2324.patch, LUCENE-2324.patch, test.out, test.out, test.out, test.out See LUCENE-2293 for motivation and more details. I'm copying here Mike's summary he posted on 2293: Change the approach for how we buffer in RAM to a more isolated approach, whereby IW has N fully independent RAM segments in-process and when a doc needs to be indexed it's added to one of them. Each segment would also write its own doc stores and normal segment merging (not the inefficient merge we now do on flush) would merge them. This should be a good simplification in the chain (eg maybe we can remove the *PerThread classes). 
The segments can flush independently, letting us make much better concurrent use of IO CPU. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (SOLR-1604) Wildcards, ORs etc inside Phrase Queries
[ https://issues.apache.org/jira/browse/SOLR-1604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984381#action_12984381 ] Ahsan Iqbal commented on SOLR-1604: --- Hi Ahmet, I read on the mailing lists (dated Mon, 20 Jul 2009) that you had merged the surround query parser with Solr. I tried that by downloading the jar file for the surround query parser, pasting that jar file in web-inf/lib, and configuring the query parser in solrconfig.xml as <queryParser name="SurroundQParser" class="org.apache.lucene.queryParser.surround.parser.QueryParser"/> Then the web page shows the following exception: org.apache.solr.common.SolrException: Error Instantiating QParserPlugin, org.apache.lucene.queryParser.surround.parser.QueryParser is not a org.apache.solr.search.QParserPlugin Can you guide me on what I am doing wrong? Regards, Ahsan Wildcards, ORs etc inside Phrase Queries Key: SOLR-1604 URL: https://issues.apache.org/jira/browse/SOLR-1604 Project: Solr Issue Type: Improvement Components: search Affects Versions: 1.4 Reporter: Ahmet Arslan Priority: Minor Fix For: Next Attachments: ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhraseQueryParser.java, SOLR-1604.patch Solr Plugin for ComplexPhraseQueryParser (LUCENE-1486) which supports wildcards, ORs, ranges, fuzzies inside phrase queries.
[jira] Updated: (LUCENE-2720) IndexWriter should throw IndexFormatTooOldExc on open, not later during optimize/getReader/close
[ https://issues.apache.org/jira/browse/LUCENE-2720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-2720: --- Attachment: LUCENE-2720-trunk.patch Patch against trunk. I need to fix 3x to write the version and produce an index for TestBackCompat before committing this patch (even though the tests pass). IndexWriter should throw IndexFormatTooOldExc on open, not later during optimize/getReader/close Key: LUCENE-2720 URL: https://issues.apache.org/jira/browse/LUCENE-2720 Project: Lucene - Java Issue Type: Bug Components: Index Reporter: Michael McCandless Fix For: 3.1, 4.0 Attachments: LUCENE-2720-trunk.patch Spinoff of LUCENE-2618 and also related to the original issue LUCENE-2523... If you open IW on a too-old index, you don't find out until much later that the index is too old. This is because IW does not go and open segment readers on all segments. It only does so when it's time to apply deletes, do merges, open an NRT reader, etc. This is a serious bug because you can in fact succeed in committing with the new major version of Lucene against your too-old index, which is catastrophic because suddenly the old Lucene version will no longer open the index, and so your index becomes unusable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Odd Boolean scoring behavior?
On Thu, Jan 20, 2011 at 3:06 PM, karl.wri...@nokia.com wrote: I tried commenting out the final OR term, and that excluded all records that were out-of-language as expected. It's just the boost that doesn't seem to work. I see a lot of unexpected zeros - queryNorm has factors if idf and the boost in it - the fact that it's 0 suggests that you used a 0 boost. Why don't you do a toString() on your query and see if it's what you expect. -Yonik http://www.lucidimagination.com Exploring the explain is challenging because of its size, but there are NO boosts recorded of the size I am using (10.0). Here's the basic structure of the first result. 0.0 = (MATCH) sum of: 0.0 = (MATCH) sum of: 0.0 = (MATCH) weight(language:eng in 52867945), product of: 0.0 = queryWeight(language:eng), product of: 1.0 = idf(docFreq=23889670, maxDocs=59327671) 0.0 = queryNorm 1.0 = (MATCH) fieldWeight(language:eng in 52867945), product of: 1.0 = tf(termFreq(language:eng)=0) 1.0 = idf(docFreq=23889670, maxDocs=59327671) 1.0 = fieldNorm(field=language, doc=52867945) 0.0 = (MATCH) product of: 0.0 = (MATCH) sum of: 0.0 = (MATCH) CutoffQueryWrapper((+(othervalue_5:banker^5.997396E-4 othervalue_5:bucker^5.997396E-4 othervalue_5:bunder^5.997396E-4 othervalue_5:bunker othervalue_5:bunner^5.997396E-4 othervalue_5:burker^5.997396E-4) +value_5:hill)^0.5714286), product of: 1.0 = boost 0.0 = queryNorm 0.0 = (MATCH) CutoffQueryWrapper((+(value_5:banker^5.997396E-4 value_5:baunker^5.997396E-4 value_5:benker^5.997396E-4 value_5:beunker^5.997396E-4 value_5:binker^5.997396E-4 value_5:bonker^5.997396E-4 value_5:brunker^5.997396E-4 value_5:bucker^5.997396E-4 value_5:bueker^5.997396E-4 value_5:bunder^5.997396E-4 value_5:bunger^5.997396E-4 value_5:bunkek^5.997396E-4 value_5:bunken^5.997396E-4 value_5:bunker value_5:bunkers^5.997396E-4 value_5:bunkeru^5.997396E-4 value_5:bunner^5.997396E-4 value_5:bunter^5.997396E-4 value_5:bunzer^5.997396E-4 value_5:burker^5.997396E-4 value_5:busker^5.997396E-4) 
+othervalue_5:hill)^0.5714286), product of: 1.0 = boost 0.0 = queryNorm ... 0.0069078947 = coord(21/3040) 0.0 = (MATCH) product of: 0.0 = (MATCH) sum of: 0.0 = (MATCH) CutoffQueryWrapper((+(othervalue_5:banker^5.997396E-4 othervalue_5:bucker^5.997396E-4 othervalue_5:bunder^5.997396E-4 othervalue_5:bunker othervalue_5:bunner^5.997396E-4 othervalue_5:burker^5.997396E-4) +value_5:hill)^0.5714286), product of: 1.0 = boost 0.0 = queryNorm 0.0 = (MATCH) CutoffQueryWrapper((+(value_5:banker^5.997396E-4 value_5:baunker^5.997396E-4 value_5:benker^5.997396E-4 value_5:beunker^5.997396E-4 value_5:binker^5.997396E-4 value_5:bonker^5.997396E-4 value_5:brunker^5.997396E-4 value_5:bucker^5.997396E-4 value_5:bueker^5.997396E-4 value_5:bunder^5.997396E-4 value_5:bunger^5.997396E-4 value_5:bunkek^5.997396E-4 value_5:bunken^5.997396E-4 value_5:bunker value_5:bunkers^5.997396E-4 value_5:bunkeru^5.997396E-4 value_5:bunner^5.997396E-4 value_5:bunter^5.997396E-4 value_5:bunzer^5.997396E-4 value_5:burker^5.997396E-4 value_5:busker^5.997396E-4) +othervalue_5:hill)^0.5714286), product of: 1.0 = boost 0.0 = queryNorm ... 0.0069078947 = coord(21/3040) It looks like the PRODUCT_OF and SUM_OF, which represents the Boolean logic, does not actually apply boost? Karl -Original Message- From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of ext Yonik Seeley Sent: Thursday, January 20, 2011 2:36 PM To: dev@lucene.apache.org Subject: Re: Odd Boolean scoring behavior? On Thu, Jan 20, 2011 at 2:17 PM, karl.wri...@nokia.com wrote: The problem is that the LANGUAGE_BOOST boost doesn't seem to be having any effect. I can change it all over the place, and nothing much changes. Then perhaps your language term doesn't actually match anything in the index? (i.e. how is it analyzed?) Next step would be to get score explanations (just add debugQuery=true if you're using Solr, or see IndexSearcher.explain() if not). 
-Yonik http://www.lucidimagination.com
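Yonik's diagnosis in the thread above rests on how classic Lucene (DefaultSimilarity) folds boost into scoring: queryWeight = idf * boost * queryNorm, with queryNorm = 1 / sqrt(sum of squared term weights). A small hedged sketch (plain arithmetic, not Lucene's Similarity class) shows why a 0 boost zeroes everything downstream regardless of idf:

```java
// Sketch of the classic-Lucene weight combination: with boost = 0 the
// per-term query weight collapses to 0 no matter what idf is, which
// matches the rows of "0.0 = queryNorm" / "0.0 = queryWeight" in the
// explain output quoted in the thread.
public class QueryNormDemo {
  // queryNorm as in DefaultSimilarity: 1 / sqrt(sumOfSquaredWeights).
  static double queryNorm(double sumOfSquaredWeights) {
    return 1.0 / Math.sqrt(sumOfSquaredWeights);
  }

  public static void main(String[] args) {
    double idf = 1.0;                 // from the explain: idf(...) = 1.0
    double boost = 0.0;               // the suspect zero boost
    double rawWeight = idf * boost;   // per-term raw weight
    double norm = queryNorm(1.0);     // assume the rest of the query sums to 1
    System.out.println(rawWeight * norm); // 0.0 -- the boost wiped the score
    assert rawWeight * norm == 0.0;
  }
}
```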
[jira] Commented: (LUCENE-2876) Remove Scorer.getSimilarity()
[ https://issues.apache.org/jira/browse/LUCENE-2876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12984387#action_12984387 ] Robert Muir commented on LUCENE-2876: - Committed revision 1061499. Will work on adding the @deprecated and fixing javadocs and null Weights in branch_3x, but we need to provide Similarity where we were providing it before for backwards compatibility. Remove Scorer.getSimilarity() - Key: LUCENE-2876 URL: https://issues.apache.org/jira/browse/LUCENE-2876 Project: Lucene - Java Issue Type: Task Components: Query/Scoring Reporter: Robert Muir Assignee: Robert Muir Fix For: 3.1, 4.0 Attachments: LUCENE-2876.patch, LUCENE-2876.patch Originally this was part of the patch for per-field Similarity (LUCENE-2236), but I pulled it out here as its own issue as its really mostly unrelated. I also like it as a separate issue to apply the deprecation to branch_3x to just make less surprises/migration hassles for 4.0 users. Currently Scorer takes a confusing number of ctors, either a Similarity, or a Weight + Similarity. Also, lots of scorers don't use the Similarity at all, and its not really needed in Scorer itself. Additionally, the Weight argument is often null. The Weight makes sense to be here in Scorer, its the parent that created the scorer, and used by Scorer itself to support LUCENE-2590's features. But I dont think all queries work with this feature correctly right now, because they pass null. Finally the situation gets confusing if you start to consider delegators like ScoreCachingWrapperScorer, which arent really delegating correctly so I'm unsure features like LUCENE-2590 aren't working with this. So I think we should remove the getSimilarity, if your scorer uses a Similarity its already coming to you via your ctor from your Weight and you can manage this yourself. Also, all scorers should pass the Weight (parent) that created them, and this should be Scorer's only ctor. 
I fixed all core/contrib/solr Scorers (even the internal ones) to pass their parent Weight, just for consistency of this visitor interface. The only one that passes null is Solr's ValueSourceScorer. I set fix-for 3.1, not because i want to backport anything, only to mark the getSimilarity deprecated there. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2558) Use sequence ids for deleted docs
[ https://issues.apache.org/jira/browse/LUCENE-2558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984389#action_12984389 ] Jason Rutherglen commented on LUCENE-2558: -- In regards to the deltas, when they're in RAM (ie, for norm and DF updates), I'm guessing we'd need to place the updates into a hash map (that hopefully uses primitives instead of objects to save RAM)? We could instantiate a new array when the map reached a certain size? Use sequence ids for deleted docs - Key: LUCENE-2558 URL: https://issues.apache.org/jira/browse/LUCENE-2558 Project: Lucene - Java Issue Type: Improvement Components: Search Affects Versions: Realtime Branch Reporter: Jason Rutherglen Priority: Minor Fix For: Realtime Branch Utilizing the sequence ids created via the update document methods, we will enable IndexReader deleted docs over a sequence id array. One of the decisions is what primitive type to use. We can start off with an int[], then possibly move to a short[] (for lower memory consumption) that wraps around.
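Jason's map-then-array idea can be sketched as follows. This is a hypothetical illustration, not the realtime branch's code; a real implementation would use a primitive int-to-int map to avoid boxing, but HashMap keeps the sketch short:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of buffering sparse per-doc updates in a map and spilling them
// into a dense array once the map grows past a threshold. HashMap boxes
// its keys/values, so this is for illustration only.
public class DeltaBuffer {
  private final Map<Integer, Integer> deltas = new HashMap<>();
  private final int[] dense;
  private final int spillThreshold;

  public DeltaBuffer(int maxDoc, int spillThreshold) {
    this.dense = new int[maxDoc];
    this.spillThreshold = spillThreshold;
  }

  public void update(int docId, int value) {
    deltas.put(docId, value);
    if (deltas.size() >= spillThreshold) spill();
  }

  // Coalesce pending deltas into the dense array and clear the map.
  private void spill() {
    for (Map.Entry<Integer, Integer> e : deltas.entrySet()) {
      dense[e.getKey()] = e.getValue();
    }
    deltas.clear();
  }

  // Reads check the pending map first, then the dense array.
  public int get(int docId) {
    Integer pending = deltas.get(docId);
    return pending != null ? pending : dense[docId];
  }

  public static void main(String[] args) {
    DeltaBuffer b = new DeltaBuffer(8, 3);
    b.update(1, 10);
    b.update(2, 20);
    assert b.get(1) == 10; // still pending in the map
    b.update(3, 30);       // hits the threshold, triggers the spill
    assert b.get(2) == 20; // now served from the dense array
  }
}
```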
[jira] Updated: (LUCENE-2482) Index sorter
[ https://issues.apache.org/jira/browse/LUCENE-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Juan Grande updated LUCENE-2482: Attachment: LUCENE-2482-4.0.patch Hi! I'm attaching a patch with an implementation of this feature for Lucene 4.0. I'm not sure if the style is right because I can't download the codestyle.xml file for Eclipse. Index sorter Key: LUCENE-2482 URL: https://issues.apache.org/jira/browse/LUCENE-2482 Project: Lucene - Java Issue Type: New Feature Components: contrib/* Affects Versions: 3.1, 4.0 Reporter: Andrzej Bialecki Assignee: Andrzej Bialecki Fix For: 3.1, 4.0 Attachments: indexSorter.patch, LUCENE-2482-4.0.patch A tool to sort index according to a float document weight. Documents with high weight are given low document numbers, which means that they will be first evaluated. When using a strategy of early termination of queries (see TimeLimitedCollector) such sorting significantly improves the quality of partial results. (Originally this tool was created by Doug Cutting in Nutch, and used norms as document weights - thus the ordering was limited by the limited resolution of norms. This is a pure Lucene version of the tool, and it uses arbitrary floats from a specified stored field).
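The reordering at the heart of the index sorter described above can be sketched in a few lines: given one float weight per document, compute the old-to-new doc-number mapping so high-weight docs get the lowest new doc numbers and are therefore visited first under early termination. This is not the attached patch's code, just the core permutation:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch of the index sorter's doc renumbering: sort doc ids
// by descending weight, then invert the order into an old->new mapping.
public class SorterSketch {
  static int[] oldToNew(float[] weights) {
    List<Integer> order = new ArrayList<>();
    for (int i = 0; i < weights.length; i++) order.add(i);
    // Highest weight first.
    order.sort(Comparator.comparingDouble((Integer i) -> -weights[i]));
    int[] map = new int[weights.length];
    for (int newDoc = 0; newDoc < order.size(); newDoc++) {
      map[order.get(newDoc)] = newDoc; // old doc id -> new doc id
    }
    return map;
  }

  public static void main(String[] args) {
    int[] map = oldToNew(new float[] {0.1f, 0.9f, 0.5f});
    // Highest weight (old doc 1) becomes new doc 0.
    assert map[1] == 0 && map[2] == 1 && map[0] == 2;
  }
}
```

The actual tool then rewrites postings, stored fields, and norms through this mapping so the on-disk index reflects the new order.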
[jira] Created: (SOLR-2326) Replication command indexversion fails to return index version
Replication command indexversion fails to return index version -- Key: SOLR-2326 URL: https://issues.apache.org/jira/browse/SOLR-2326 Project: Solr Issue Type: Bug Components: replication (java) Environment: Branch 3x latest Reporter: Eric Pugh Fix For: 3.1 To test this, I took the /example/multicore/core0 solrconfig and added a simple replication handler: <requestHandler name="/replication" class="solr.ReplicationHandler"> <lst name="master"> <str name="replicateAfter">commit</str> <str name="replicateAfter">startup</str> <str name="confFiles">schema.xml</str> </lst> </requestHandler> When I query the handler for details I get back the indexVersion that I expect: http://localhost:8983/solr/core0/replication?command=details&wt=json&indent=true But when I ask for just the indexVersion I get back a 0, which prevents the slaves from pulling updates: http://localhost:8983/solr/core0/replication?command=indexversion&wt=json&indent=true
RE: Odd Boolean scoring behavior?
The original query is fine, and has the boost as expected: ((+language:eng +( CutoffQueryWrapper((+value_0:bunker~0.8332333 +value_0:hill)^0.667) CutoffQueryWrapper((+othervalue_0:bunker~0.8332333 +value_0:hill)^0.5714286) CutoffQueryWrapper((+value_0:bunker~0.8332333 +othervalue_0:hill)^0.5714286) CutoffQueryWrapper((+value_1:bunker~0.8332333 +value_0:hill)^0.667) CutoffQueryWrapper((+othervalue_1:bunker~0.8332333 +value_0:hill)^0.5714286) CutoffQueryWrapper((+value_1:bunker~0.8332333 +othervalue_0:hill)^0.5714286) ... CutoffQueryWrapper((+othervalue_7:bunker~0.8332333 +value_7:hillmonument~0.8332333)^0.85714287) CutoffQueryWrapper((+value_7:bunker~0.8332333 +othervalue_7:hillmonument~0.8332333)^0.85714287)))^3.0) ( CutoffQueryWrapper((+value_0:bunker~0.8332333 +value_0:hill)^0.667) CutoffQueryWrapper((+othervalue_0:bunker~0.8332333 +value_0:hill)^0.5714286) CutoffQueryWrapper((+value_0:bunker~0.8332333 +othervalue_0:hill)^0.5714286) CutoffQueryWrapper((+value_1:bunker~0.8332333 +value_0:hill)^0.667) ... )) The rewritten query is odd. Here's a sample: ((+language:eng +( CutoffQueryWrapper((+() +value_0:hill)^0.667) CutoffQueryWrapper((+() +value_0:hill)^0.5714286) CutoffQueryWrapper((+() +othervalue_0:hill)^0.5714286) CutoffQueryWrapper((+() +value_0:hill)^0.667) CutoffQueryWrapper((+() +value_0:hill)^0.5714286) CutoffQueryWrapper((+() +othervalue_0:hill)^0.5714286) CutoffQueryWrapper((+(value_2:bunker value_2:burker^5.997396E-4) +value_0:hill)^0.667) CutoffQueryWrapper((+() +value_0:hill)^0.5714286) ... 
CutoffQueryWrapper((+() +(()^0.556))^0.85714287)))^3.0) ( CutoffQueryWrapper((+() +value_0:hill)^0.667) CutoffQueryWrapper((+() +value_0:hill)^0.5714286) CutoffQueryWrapper((+() +othervalue_0:hill)^0.5714286) CutoffQueryWrapper((+() +value_0:hill)^0.667) CutoffQueryWrapper((+() +value_0:hill)^0.5714286) CutoffQueryWrapper((+() +othervalue_0:hill)^0.5714286) CutoffQueryWrapper((+(value_2:bunker value_2:burker^5.997396E-4) +value_0:hill)^0.667) CutoffQueryWrapper((+() +value_0:hill)^0.5714286) ... CutoffQueryWrapper((+() +(()^0.556))^0.85714287) CutoffQueryWrapper(+() +(()^0.667)) CutoffQueryWrapper((+() +(()^0.667))^0.85714287) CutoffQueryWrapper((+() +(()^0.556))^0.85714287) )

As you can see, there are a lot of repeats and a lot of blank matches, but the original boost *is* still there. I really can't interpret this any further - the many blank and repeated matches seem wrong to me, but the scorer explanation seems even more wrong. Any ideas?

Karl

-----Original Message-----
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of ext Yonik Seeley
Sent: Thursday, January 20, 2011 3:34 PM
To: dev@lucene.apache.org
Subject: Re: Odd Boolean scoring behavior?

On Thu, Jan 20, 2011 at 3:06 PM, karl.wri...@nokia.com wrote: I tried commenting out the final OR term, and that excluded all records that were out-of-language as expected. It's just the boost that doesn't seem to work.

I see a lot of unexpected zeros - queryNorm has factors of idf and the boost in it - the fact that it's 0 suggests that you used a 0 boost. Why don't you do a toString() on your query and see if it's what you expect.

-Yonik
http://www.lucidimagination.com

Exploring the explain is challenging because of its size, but there are NO boosts recorded of the size I am using (10.0). Here's the basic structure of the first result.
0.0 = (MATCH) sum of: 0.0 = (MATCH) sum of: 0.0 = (MATCH) weight(language:eng in 52867945), product of: 0.0 = queryWeight(language:eng), product of: 1.0 = idf(docFreq=23889670, maxDocs=59327671) 0.0 = queryNorm 1.0 = (MATCH) fieldWeight(language:eng in 52867945), product of: 1.0 = tf(termFreq(language:eng)=0) 1.0 = idf(docFreq=23889670, maxDocs=59327671) 1.0 = fieldNorm(field=language, doc=52867945) 0.0 = (MATCH) product of: 0.0 = (MATCH) sum of: 0.0 = (MATCH) CutoffQueryWrapper((+(othervalue_5:banker^5.997396E-4 othervalue_5:bucker^5.997396E-4 othervalue_5:bunder^5.997396E-4 othervalue_5:bunker othervalue_5:bunner^5.997396E-4 othervalue_5:burker^5.997396E-4) +value_5:hill)^0.5714286), product of: 1.0 = boost 0.0 = queryNorm 0.0 = (MATCH) CutoffQueryWrapper((+(value_5:banker^5.997396E-4 value_5:baunker^5.997396E-4 value_5:benker^5.997396E-4 value_5:beunker^5.997396E-4 value_5:binker^5.997396E-4 value_5:bonker^5.997396E-4 value_5:brunker^5.997396E-4 value_5:bucker^5.997396E-4 value_5:bueker^5.997396E-4 value_5:bunder^5.997396E-4 value_5:bunger^5.997396E-4 value_5:bunkek^5.997396E-4 value_5:bunken^5.997396E-4 value_5:bunker value_5:bunkers^5.997396E-4 value_5:bunkeru^5.997396E-4 value_5:bunner^5.997396E-4 value_5:bunter^5.997396E-4
Lucene-3.x - Build # 249 - Failure
Build: https://hudson.apache.org/hudson/job/Lucene-3.x/249/ All tests passed Build Log (for compile errors): [...truncated 21087 lines...]
RE: Odd Boolean scoring behavior?
Found the cause of the zero querynorms, and fixed it. But the results are still not as I would expect. The first result has language=ger but scores higher than the second result which has language=eng. And yet, my query is boosting like this: Boolean OR Boolean (boost = 100.0) AND (language:eng) AND (stuff) OR (stuff) ... where (stuff) is the same stuff in both cases. Here's the scoring for two results, the first one out of language, and the second one in language: 0.018082526 = (MATCH) org.apache.lucene.search.BooleanQuery$BooleanWeight@6cdcb5eb sum of: 0.018059647 = (MATCH) org.apache.lucene.search.BooleanQuery$BooleanWeight@e2b8f23 sum of: 0.015771711 = (MATCH) weight(language:eng in 52867945), product of: 0.015771711 = queryWeight(language:eng), product of: 1.0 = idf(docFreq=23889670, maxDocs=59327671) 0.015771711 = queryNorm 1.0 = (MATCH) fieldWeight(language:eng in 52867945), product of: 1.0 = tf(termFreq(language:eng)=0) 1.0 = idf(docFreq=23889670, maxDocs=59327671) 1.0 = fieldNorm(field=language, doc=52867945) 0.0022879362 = (MATCH) org.apache.lucene.search.BooleanQuery$BooleanWeight@4dc24a19 product of: 0.331206 = (MATCH) org.apache.lucene.search.BooleanQuery$BooleanWeight@4dc24a19 sum of: 0.015771711 = (MATCH) CutoffQueryWrapper((+(othervalue_5:banker^5.997396E-4 othervalue_5:bucker^5.997396E-4 othervalue_5:bunder^5.997396E-4 othervalue_5:bunker othervalue_5:bunner^5.997396E-4 othervalue_5:burker^5.997396E-4) +value_5:hill)^0.5714286), product of: 1.0 = boost 0.015771711 = queryNorm 0.015771711 = (MATCH) CutoffQueryWrapper((+(value_5:banker^5.997396E-4 value_5:baunker^5.997396E-4 value_5:benker^5.997396E-4 value_5:beunker^5.997396E-4 value_5:binker^5.997396E-4 value_5:bonker^5.997396E-4 value_5:brunker^5.997396E-4 value_5:bucker^5.997396E-4 value_5:bueker^5.997396E-4 value_5:bunder^5.997396E-4 value_5:bunger^5.997396E-4 value_5:bunkek^5.997396E-4 value_5:bunken^5.997396E-4 value_5:bunker value_5:bunkers^5.997396E-4 value_5:bunkeru^5.997396E-4 
value_5:bunner^5.997396E-4 value_5:bunter^5.997396E-4 value_5:bunzer^5.997396E-4 value_5:burker^5.997396E-4 value_5:busker^5.997396E-4) +othervalue_5:hill)^0.5714286), product of: 1.0 = boost 0.015771711 = queryNorm 0.015771711 = (MATCH) CutoffQueryWrapper((+(value_5:banker^5.997396E-4 value_5:baunker^5.997396E-4 value_5:benker^5.997396E-4 value_5:beunker^5.997396E-4 value_5:binker^5.997396E-4 value_5:bonker^5.997396E-4 value_5:brunker^5.997396E-4 value_5:bucker^5.997396E-4 value_5:bueker^5.997396E-4 value_5:bunder^5.997396E-4 value_5:bunger^5.997396E-4 value_5:bunkek^5.997396E-4 value_5:bunken^5.997396E-4 value_5:bunker value_5:bunkers^5.997396E-4 value_5:bunkeru^5.997396E-4 value_5:bunner^5.997396E-4 value_5:bunter^5.997396E-4 value_5:bunzer^5.997396E-4 value_5:burker^5.997396E-4 value_5:busker^5.997396E-4) +(value_5:monument value_5:monumenta^7.9949305E-4 value_5:monumentc^7.9949305E-4 value_5:monumento^7.9949305E-4 value_5:monuments^7.9949305E-4))^0.667), product of: 1.0 = boost 0.015771711 = queryNorm 0.015771711 = (MATCH) CutoffQueryWrapper((+(othervalue_5:banker^5.997396E-4 othervalue_5:bucker^5.997396E-4 othervalue_5:bunder^5.997396E-4 othervalue_5:bunker othervalue_5:bunner^5.997396E-4 othervalue_5:burker^5.997396E-4) +(value_5:monument value_5:monumenta^7.9949305E-4 value_5:monumentc^7.9949305E-4 value_5:monumento^7.9949305E-4 value_5:monuments^7.9949305E-4))^0.5714286), product of: 1.0 = boost 0.015771711 = queryNorm 0.015771711 = (MATCH) CutoffQueryWrapper((+(value_5:banker^5.997396E-4 value_5:baunker^5.997396E-4 value_5:benker^5.997396E-4 value_5:beunker^5.997396E-4 value_5:binker^5.997396E-4 value_5:bonker^5.997396E-4 value_5:brunker^5.997396E-4 value_5:bucker^5.997396E-4 value_5:bueker^5.997396E-4 value_5:bunder^5.997396E-4 value_5:bunger^5.997396E-4 value_5:bunkek^5.997396E-4 value_5:bunken^5.997396E-4 value_5:bunker value_5:bunkers^5.997396E-4 value_5:bunkeru^5.997396E-4 value_5:bunner^5.997396E-4 value_5:bunter^5.997396E-4 
value_5:bunzer^5.997396E-4 value_5:burker^5.997396E-4 value_5:busker^5.997396E-4) +(othervalue_5:monument othervalue_5:monumento^7.9949305E-4 othervalue_5:monuments^7.9949305E-4))^0.5714286), product of: 1.0 = boost 0.015771711 = queryNorm 0.015771711 = (MATCH) CutoffQueryWrapper((+value_5:hill +(value_5:monument value_5:monumenta^7.9949305E-4 value_5:monumentc^7.9949305E-4 value_5:monumento^7.9949305E-4 value_5:monuments^7.9949305E-4))^0.667), product of: 1.0 = boost 0.015771711 = queryNorm 0.015771711 = (MATCH) CutoffQueryWrapper((+othervalue_5:hill +(value_5:monument value_5:monumenta^7.9949305E-4 value_5:monumentc^7.9949305E-4 value_5:monumento^7.9949305E-4
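The zero-queryNorm diagnosis in this thread follows directly from how classic TF-IDF query weighting works: each clause's raw weight is idf times boost, and queryNorm divides by the length of that weight vector, so a zero boost zeroes every queryWeight. Here is a toy sketch of that arithmetic (a simplified model for illustration, not Lucene's actual DefaultSimilarity code):

```python
import math

def query_weights(idf_boost_pairs):
    """Simplified classic-TF-IDF query weighting.

    Each clause's raw weight is idf * boost; queryNorm is
    1/sqrt(sum of squared raw weights); the normalized
    queryWeight of a clause is raw_weight * queryNorm.
    """
    raw = [idf * boost for idf, boost in idf_boost_pairs]
    sum_sq = sum(w * w for w in raw)
    # Guard the degenerate all-zero case instead of dividing by zero.
    query_norm = 1.0 / math.sqrt(sum_sq) if sum_sq > 0 else 0.0
    return [w * query_norm for w in raw]

# Normal boosts: non-zero weights, normalized to unit length.
print(query_weights([(1.0, 1.0), (1.0, 0.5714286)]))

# Zero boosts: every queryWeight collapses to 0, matching the
# all-zero explain output quoted above.
print(query_weights([(1.0, 0.0), (1.0, 0.0)]))
```

The second call mirrors the symptom in the thread: with the boost factored out to zero, every `queryWeight(...)` line in the explain is 0 regardless of idf or fieldNorm.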
Lucene-trunk - Build # 1433 - Still Failing
Build: https://hudson.apache.org/hudson/job/Lucene-trunk/1433/ All tests passed Build Log (for compile errors): [...truncated 16653 lines...]
[jira] Resolved: (LUCENE-2657) Replace Maven POM templates with full POMs, and change documentation accordingly
[ https://issues.apache.org/jira/browse/LUCENE-2657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rowe resolved LUCENE-2657. - Resolution: Fixed Committed to trunk rev. 1061613, branch_3x rev. 1061612 Replace Maven POM templates with full POMs, and change documentation accordingly Key: LUCENE-2657 URL: https://issues.apache.org/jira/browse/LUCENE-2657 Project: Lucene - Java Issue Type: Improvement Components: Build Affects Versions: 3.1, 4.0 Reporter: Steven Rowe Assignee: Steven Rowe Fix For: 3.1, 4.0 Attachments: LUCENE-2657-branch_3x.patch, LUCENE-2657-branch_3x.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch The current Maven POM templates only contain dependency information, the bare bones necessary for uploading artifacts to the Maven repository. The full Maven POMs in the attached patch include the information necessary to run a multi-module Maven build, in addition to serving the same purpose as the current POM templates. Several dependencies are not available through public maven repositories. A profile in the top-level POM can be activated to install these dependencies from the various {{lib/}} directories into your local repository. 
From the top-level directory:

{code}
mvn -N -Pbootstrap install
{code}

Once these non-Maven dependencies have been installed, to run all Lucene/Solr tests via Maven's surefire plugin, and populate your local repository with all artifacts, from the top-level directory, run:

{code}
mvn install
{code}

When one Lucene/Solr module depends on another, the dependency is declared on the *artifact(s)* produced by the other module and deposited in your local repository, rather than on the other module's un-jarred compiler output in the {{build/}} directory, so you must run {{mvn install}} on the other module before its changes are visible to the module that depends on it.

To create all the artifacts without running tests:

{code}
mvn -DskipTests install
{code}

I almost always include the {{clean}} phase when I do a build, e.g.:

{code}
mvn -DskipTests clean install
{code}
[jira] Resolved: (SOLR-1218) maven artifact for webapp
[ https://issues.apache.org/jira/browse/SOLR-1218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rowe resolved SOLR-1218. --- Resolution: Fixed Fix Version/s: 4.0 3.1 Assignee: Steven Rowe Maven artifact for Solr webapp is now generated (fixed in LUCENE-2657). maven artifact for webapp - Key: SOLR-1218 URL: https://issues.apache.org/jira/browse/SOLR-1218 Project: Solr Issue Type: New Feature Affects Versions: 1.3 Reporter: Benson Margulies Assignee: Steven Rowe Fix For: 3.1, 4.0 It would be convenient to have a <packaging>war</packaging> maven project for the webapp, to allow launching solr from maven via jetty.
[jira] Resolved: (LUCENE-2876) Remove Scorer.getSimilarity()
[ https://issues.apache.org/jira/browse/LUCENE-2876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-2876. - Resolution: Fixed backported to 3.x: revision 1061615. Remove Scorer.getSimilarity() - Key: LUCENE-2876 URL: https://issues.apache.org/jira/browse/LUCENE-2876 Project: Lucene - Java Issue Type: Task Components: Query/Scoring Reporter: Robert Muir Assignee: Robert Muir Fix For: 3.1, 4.0 Attachments: LUCENE-2876.patch, LUCENE-2876.patch Originally this was part of the patch for per-field Similarity (LUCENE-2236), but I pulled it out here as its own issue, since it's really mostly unrelated. I also like it as a separate issue so the deprecation can be applied to branch_3x, to make fewer surprises/migration hassles for 4.0 users. Currently Scorer takes a confusing number of ctors: either a Similarity, or a Weight + Similarity. Also, lots of scorers don't use the Similarity at all, and it's not really needed in Scorer itself. Additionally, the Weight argument is often null. The Weight makes sense to be here in Scorer: it's the parent that created the scorer, and is used by Scorer itself to support LUCENE-2590's features. But I don't think all queries work with this feature correctly right now, because they pass null. Finally, the situation gets confusing if you start to consider delegators like ScoreCachingWrapperScorer, which aren't really delegating correctly, so I'm unsure whether features like LUCENE-2590 are working with this. So I think we should remove getSimilarity(): if your scorer uses a Similarity, it's already coming to you via your ctor from your Weight, and you can manage this yourself. Also, all scorers should pass the Weight (parent) that created them, and this should be Scorer's only ctor. I fixed all core/contrib/solr Scorers (even the internal ones) to pass their parent Weight, just for consistency of this visitor interface. The only one that passes null is Solr's ValueSourceScorer.
I set fix-for 3.1, not because I want to backport anything, only to mark getSimilarity deprecated there.
[jira] Resolved: (LUCENE-2824) optimizations for bufferedindexinput
[ https://issues.apache.org/jira/browse/LUCENE-2824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-2824. - Resolution: Fixed Committed revisions 1061619, 1061622 optimizations for bufferedindexinput Key: LUCENE-2824 URL: https://issues.apache.org/jira/browse/LUCENE-2824 Project: Lucene - Java Issue Type: Improvement Affects Versions: 3.1, 4.0 Reporter: Robert Muir Assignee: Robert Muir Fix For: 3.1, 4.0 Attachments: LUCENE-2824.patch Along the same lines as LUCENE-2816:

* the readVInt/readVLong/readShort/readInt/readLong methods are not optimal here, since they defer to readByte. For example, this means checking the buffer's bounds per byte in readVInt instead of per vInt.
* it's an easy win to speed this up, even for the vInt case: it's essentially always faster. The only slower case is 1024 single-byte vInts in a row; in that case we would do a single extra bounds check (1025 instead of 1024).
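The optimization described in LUCENE-2824 is about hoisting the bounds check: instead of re-checking the buffer limit on every byte inside readVInt, check once that a maximum-length vInt fits in the buffer, then decode with no per-byte checks. A small sketch of the idea (toy code illustrating the technique, not the actual BufferedIndexInput patch; helper names are made up):

```python
def write_vint(buf, value):
    """Append a variable-length int (7 data bits per byte, high bit
    set on all but the last byte), as Lucene's vInt format does."""
    while value >= 0x80:
        buf.append((value & 0x7F) | 0x80)
        value >>= 7
    buf.append(value)

def read_vint_per_byte(buf, pos):
    """Baseline: every byte goes through a helper that re-checks the
    buffer bounds (the defer-to-readByte pattern the issue describes)."""
    def read_byte(p):
        if p >= len(buf):           # bounds check repeated per byte
            raise EOFError("buffer refill needed")
        return buf[p]
    b = read_byte(pos); pos += 1
    value, shift = b & 0x7F, 7
    while b & 0x80:
        b = read_byte(pos); pos += 1
        value |= (b & 0x7F) << shift
        shift += 7
    return value, pos

def read_vint_hoisted(buf, pos):
    """Optimized: one up-front check that a max-length vInt (5 bytes
    for 32 bits) fits, then decode with no per-byte checks."""
    if pos + 5 > len(buf):          # single bounds check per vInt
        return read_vint_per_byte(buf, pos)   # slow path near the end
    b = buf[pos]; pos += 1
    value, shift = b & 0x7F, 7
    while b & 0x80:
        b = buf[pos]; pos += 1
        value |= (b & 0x7F) << shift
        shift += 7
    return value, pos

buf = bytearray()
for v in (0, 1, 127, 128, 16384, 2**31 - 1):
    write_vint(buf, v)
pos, out = 0, []
while pos < len(buf):
    v, pos = read_vint_hoisted(buf, pos)
    out.append(v)
print(out)  # round-trips the encoded values
```

This matches the issue's cost analysis: the fast path does one bounds check per vInt rather than one per byte, and only falls back to per-byte checking within the last few bytes of the buffer.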
[jira] Commented: (LUCENE-2774) ant generate-maven-artifacts target broken for contrib
[ https://issues.apache.org/jira/browse/LUCENE-2774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984583#action_12984583 ] Steven Rowe commented on LUCENE-2774: - I tested {{ant clean generate-maven-artifacts}} on branch_3x with {{maven-ant-tasks-2.1.1.jar}} and both Ant 1.7.1 and 1.8.1. Everything works. I'll test more combinations tomorrow. ant generate-maven-artifacts target broken for contrib -- Key: LUCENE-2774 URL: https://issues.apache.org/jira/browse/LUCENE-2774 Project: Lucene - Java Issue Type: Bug Components: Build Affects Versions: 3.1, 4.0 Reporter: Drew Farris Assignee: Steven Rowe Priority: Minor Attachments: LUCENE-2774.patch When executing 'ant generate-maven-artifacts' from a pristine checkout of branch_3x/lucene or trunk/lucene the following error is encountered:

{code}
dist-maven:
    [copy] Copying 1 file to /home/drew/lucene/branch_3x/lucene/build/contrib/analyzers/common
[artifact:install-provider] Installing provider: org.apache.maven.wagon:wagon-ssh:jar:1.0-beta-2:runtime
[artifact:pom] An error has occurred while processing the Maven artifact tasks.
[artifact:pom]  Diagnosis:
[artifact:pom]
[artifact:pom] Unable to initialize POM pom.xml.template: Cannot find parent: org.apache.lucene:lucene-contrib for project: org.apache.lucene:lucene-analyzers:jar:3.1-SNAPSHOT for project org.apache.lucene:lucene-analyzers:jar:3.1-SNAPSHOT
[artifact:pom] Unable to download the artifact from any repository
{code}

The contrib portion of the ant build is executed in a subant task which does not pick up the pom definitions for lucene-parent and lucene-contrib from the main build.xml, so the lucene-parent and lucene-contrib poms must be loaded explicitly as part of the contrib build using the artifact:pom task.
Solr-3.x - Build # 233 - Still Failing
Build: https://hudson.apache.org/hudson/job/Solr-3.x/233/ No tests ran. Build Log (for compile errors): [...truncated 15582 lines...]
Lucene-Solr-tests-only-3.x - Build # 3954 - Failure
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/3954/ 1 tests failed.

REGRESSION: org.apache.solr.update.AutoCommitTest.testMaxTime

Error Message: should not be there yet query failed XPath: //result[@numFound=0]
xml response was:
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">682</int></lst><result name="response" numFound="1" start="0"><doc><int name="id">500</int><int name="intDefault">42</int><arr name="multiDefault"><str>muLti-Default</str></arr><arr name="range_facet_l"><long>500</long></arr><arr name="range_facet_si"><int>500</int></arr><arr name="range_facet_sl"><long>500</long></arr><date name="timestamp">2011-01-21T06:54:36.295Z</date></doc></result>
</response>
request was: start=0&q=id:500&qt=standard&rows=20&version=2.2

Stack Trace: junit.framework.AssertionFailedError: should not be there yet query failed XPath: //result[@numFound=0] [same xml response and request as above] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1007) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:939) at org.apache.solr.util.AbstractSolrTestCase.assertQ(AbstractSolrTestCase.java:246) at org.apache.solr.update.AutoCommitTest.testMaxTime(AutoCommitTest.java:206)

Build Log (for compile errors): [...truncated 9814 lines...]
[jira] Commented: (LUCENE-2872) Terms dict should block-encode terms
[ https://issues.apache.org/jira/browse/LUCENE-2872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984589#action_12984589 ] Simon Willnauer commented on LUCENE-2872: - WOW nice mike! do you have benchmark numbers here by any chance? After all those improvements - FST, TermState, BlockCoded TermDict etc. I wonder if we reached the 10k% in the 3.0 vs. 4.0 united~2.0 benchmark... Terms dict should block-encode terms Key: LUCENE-2872 URL: https://issues.apache.org/jira/browse/LUCENE-2872 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 4.0 Attachments: LUCENE-2872.patch, LUCENE-2872.patch, LUCENE-2872.patch With PrefixCodedTermsReader/Writer we now encode each term standalone, i.e. its bytes, metadata, details for postings (frq/prox file pointers), etc. But this is costly when something wants to visit many terms but pull metadata for only a few (e.g. respelling, certain MTQs). This is particularly costly for the sep codec because it has more metadata to store per term. So instead I think we should block-encode all terms between indexed terms, so that the metadata is stored column-stride instead. This makes it faster to enum just the terms.
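The column-stride idea in LUCENE-2872 can be pictured with a toy block layout: prefix-coded term bytes in one column, postings metadata (freq/prox file pointers) in separate columns, so enumerating terms never touches the metadata. A minimal sketch of that layout (an illustrative toy format with made-up helper names, not the actual patch's encoding):

```python
def encode_block(entries):
    """Column-stride block for a terms dict: terms are prefix-coded in
    one column; freq/prox pointers live in their own columns."""
    terms_col, prev = [], b""
    for term, _, _ in entries:
        # Store (shared-prefix length with previous term, suffix bytes).
        shared = 0
        while shared < min(len(prev), len(term)) and prev[shared] == term[shared]:
            shared += 1
        terms_col.append((shared, term[shared:]))
        prev = term
    return {
        "terms": terms_col,
        "freq_ptrs": [f for _, f, _ in entries],
        "prox_ptrs": [p for _, _, p in entries],
    }

def enum_terms(block):
    """Reconstruct all terms by scanning only the term column; the
    metadata columns are never read."""
    out, prev = [], b""
    for shared, suffix in block["terms"]:
        term = prev[:shared] + suffix
        out.append(term)
        prev = term
    return out

def metadata_for(block, i):
    """Fetch metadata by index only for the few terms actually needed."""
    return block["freq_ptrs"][i], block["prox_ptrs"][i]

block = encode_block([(b"bunker", 10, 20), (b"bunkers", 11, 21), (b"hill", 30, 40)])
print(enum_terms(block))       # terms round-trip from the prefix-coded column
print(metadata_for(block, 2))  # (30, 40)
```

This captures the cost argument in the issue: per-term interleaved encoding forces a terms enum to skip over metadata it doesn't want, while the column layout makes "visit many terms, pull metadata for few" cheap.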
[jira] Updated: (LUCENE-2720) IndexWriter should throw IndexFormatTooOldExc on open, not later during optimize/getReader/close
[ https://issues.apache.org/jira/browse/LUCENE-2720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-2720: --- Attachment: LUCENE-2720-3x.patch Patch against 3x:

* Adds FieldsReader.detectCodeVersion - returns 2.x for pre-3.0 indexes and 3.0 for 3.0 indexes. Not called for 3.1+ segments.
* SegmentInfo records its code version (Constants.LUCENE_MAIN_VERSION).
* SegmentInfos bumps up the format number and upgrades old segments (2.x or 3.0) to record their version too.

I'll update the trunk patch to reflect those changes (i.e., now indexes touched by 3.1+ code will have their segments recording their version, whether they are pre-3.0 or not). IndexWriter should throw IndexFormatTooOldExc on open, not later during optimize/getReader/close Key: LUCENE-2720 URL: https://issues.apache.org/jira/browse/LUCENE-2720 Project: Lucene - Java Issue Type: Bug Components: Index Reporter: Michael McCandless Fix For: 3.1, 4.0 Attachments: LUCENE-2720-3x.patch, LUCENE-2720-trunk.patch Spinoff of LUCENE-2618 and also related to the original issue LUCENE-2523... If you open IW on a too-old index, you don't find out until much later that the index is too old. This is because IW does not go and open segment readers on all segments. It only does so when it's time to apply deletes, do merges, open an NRT reader, etc. This is a serious bug because you can in fact succeed in committing with the new major version of Lucene against your too-old index, which is catastrophic because suddenly the old Lucene version will no longer open the index, and so your index becomes unusable.