Re: [Lucene.Net] style cop, fx cop rules
I don't think there's any harm in putting StyleCop in the project at this stage, but of course, no harm not putting it in either. It would be handy for people who already have VS2008/2010, as we could keep Lucene with the same style format across the project as a whole. IMO, the Naming, Maintainability, and Layout rules are the most important. I use R#, so many of the default ones there are the ones I'm partial to. For example, I like my private fields to start with underscores. I like my private properties, method names, and public fields to be in Pascal case. I like local variables and method parameters to use camel case. I dislike Hungarian notation. I like only one class per file and one namespace per file, those being in the Maintainability rules. I would like to hear other people's opinions on this, or maybe one of us should just make a rule set and have everyone else look over it. Thanks, Christopher On Wed, Jul 27, 2011 at 7:11 PM, Michael Herndon mhern...@wickedsoftware.net wrote: Does anyone have any preferred rules that they want ignored or required for the project, for either FxCop or StyleCop? It might be prudent to wait on putting StyleCop into the project; it currently doesn't have a command-line client, and if installed it would generate warnings each time someone builds locally. - Michael.
[jira] [Created] (LUCENE-3352) ParametricRangeQueryNodeProcessor support for time zones
ParametricRangeQueryNodeProcessor support for time zones Key: LUCENE-3352 URL: https://issues.apache.org/jira/browse/LUCENE-3352 Project: Lucene - Java Issue Type: New Feature Components: modules/queryparser Reporter: Trejkaz It would be nice if there were a config attribute for setting a time zone for dates in the query. At the moment I am using my own query node processor to implement this, but I stumbled upon ParametricRangeQueryNodeProcessor and it is very close to being usable as-is. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-3353) ParametricRangeQueryNodeProcessor uses incorrect logic at the lower bound
ParametricRangeQueryNodeProcessor uses incorrect logic at the lower bound - Key: LUCENE-3353 URL: https://issues.apache.org/jira/browse/LUCENE-3353 Project: Lucene - Java Issue Type: Bug Components: modules/queryparser Affects Versions: 3.3 Reporter: Trejkaz ParametricRangeQueryNodeProcessor currently works as follows: # If the operator was LE or GE, set inclusive = true. # Set up a calendar # If inclusive, set the second time to 23:59:59:999 # Convert that to a string using the DateResolution. The problem is, this breaks for *exclusive* queries. For instance, if the user types in {20100110 TO 20100120} they would expect to get the 10th to the 20th exclusive, i.e. the 11th to the 19th. But in reality, the 10th will be *inclusive*. To get an actually-exclusive range for the lower bound, the time should be set to 23:59:59:999, much the same as what is done for the inclusive upper bound. I suspect the original query parser has the same issue, though possibly in different words. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
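The proposed fix above can be sketched in a few lines. This is a hedged illustration only, assuming `java.util.Calendar` and day-level date resolution; `exclusiveLowerBound` is a hypothetical helper, not part of Lucene's query parser API:

```java
import java.util.Calendar;
import java.util.GregorianCalendar;

public class LowerBoundSketch {
    // For an exclusive lower bound like {20100110 TO ...}, advance the
    // lower date to the very end of its day (23:59:59:999) so that, at
    // day resolution, the range effectively starts on the next day.
    static Calendar exclusiveLowerBound(int year, int month, int day) {
        Calendar cal = new GregorianCalendar(year, month - 1, day);
        cal.set(Calendar.HOUR_OF_DAY, 23);
        cal.set(Calendar.MINUTE, 59);
        cal.set(Calendar.SECOND, 59);
        cal.set(Calendar.MILLISECOND, 999);
        return cal;
    }

    public static void main(String[] args) {
        Calendar c = exclusiveLowerBound(2010, 1, 10);
        System.out.println(c.get(Calendar.HOUR_OF_DAY)); // prints 23
    }
}
```

This mirrors what the report says the processor already does for the inclusive upper bound, just applied to the exclusive lower bound instead.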
[jira] [Commented] (SOLR-2337) Solr needs hits= added to the log when using grouping
[ https://issues.apache.org/jira/browse/SOLR-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13073447#comment-13073447 ] Martijn van Groningen commented on SOLR-2337: - Yes the trunk and branch3x do log the request when using grouping. The number of matches of the first command is actually logged now. Solr needs hits= added to the log when using grouping -- Key: SOLR-2337 URL: https://issues.apache.org/jira/browse/SOLR-2337 Project: Solr Issue Type: Bug Components: SearchComponents - other Affects Versions: 4.0 Reporter: Bill Bell Fix For: 4.0 Attachments: SOLR.2337.patch We monitor the Solr logs to try to review queries that have hits=0. This enables us to improve relevancy since they are easy to find and review. When using group=true, hits= does not show up: {code} 2011-01-27 01:10:16,117 INFO core.SolrCore - [collection1] webapp= path=/select params={group=truegroup.field=gendergroup.field=idq=*:*} status=0 QTime=15 {code} The code in QueryComponent.java needs to add the matches() after calling grouping.execute() and add up the total. It does return hits= in the log for mainResult, but not for standard grouping. This should be easy to add since matches are defined... -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
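The change the report asks for amounts to summing the match counts across grouping commands and logging the total. A minimal sketch follows; the `Command` interface here is a hypothetical stand-in for Solr's grouping command classes, whose real names and signatures differ:

```java
import java.util.Arrays;
import java.util.List;

public class GroupingHitsSketch {
    // Hypothetical stand-in for a grouping command that knows how many
    // documents it matched (not the actual Solr class).
    interface Command {
        int getMatches();
    }

    // Sum matches across all grouping commands so the request log can
    // report hits= even when group=true.
    static int totalHits(List<Command> commands) {
        int total = 0;
        for (Command c : commands) {
            total += c.getMatches();
        }
        return total;
    }

    public static void main(String[] args) {
        List<Command> cmds = Arrays.asList(() -> 120, () -> 35);
        System.out.println("hits=" + totalHits(cmds)); // prints hits=155
    }
}
```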
[JENKINS] Solr-trunk - Build # 1586 - Failure
Build: https://builds.apache.org/job/Solr-trunk/1586/ 1 tests failed. REGRESSION: org.apache.solr.core.TestJmxIntegration.testJmxOnCoreReload Error Message: Number of registered MBeans is not the same as info registry size expected:<51> but was:<46> Stack Trace: junit.framework.AssertionFailedError: Number of registered MBeans is not the same as info registry size expected:<51> but was:<46> at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1522) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1427) at org.apache.solr.core.TestJmxIntegration.testJmxOnCoreReload(TestJmxIntegration.java:158) Build Log (for compile errors): [...truncated 9329 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-1879) Parallel incremental indexing
[ https://issues.apache.org/jira/browse/LUCENE-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13073462#comment-13073462 ] Eks Dev commented on LUCENE-1879: - The user mentioned above in the comment was me, I guess. Commenting here just to add an interesting use case that would be perfectly solved by this issue. Imagine a Solr Master - Slave setup, where the full document contains CONTENT and ID fields, e.g. a 200Mio+ collection. On the master, we need the ID field indexed in order to process delete/update commands. On the slaves, we do not need lookup on ID and would like to keep our TermsDictionary small, without exploding the TermsDictionary with 200Mio+ unique ID terms (ouch, this is a lot compared to 5Mio unique terms in CONTENT, with or without pulsing). With this issue, this could be natively achieved by modifying the Solr UpdateHandler not to transfer the ID index to slaves at all. There are other ways to fix it, but this would be the best. (I am currently investigating an option to transfer the full index on update, but to filter out the TermsDictionary at the IndexReader level; it remains on disk, but this part never gets accessed on slaves. I do not know yet if this is possible at all in general, e.g. once an FST-based term dictionary is already built; a prefix-compressed TermDict would be doable.) Parallel incremental indexing - Key: LUCENE-1879 URL: https://issues.apache.org/jira/browse/LUCENE-1879 Project: Lucene - Java Issue Type: New Feature Components: core/index Reporter: Michael Busch Assignee: Michael Busch Fix For: 4.0 Attachments: parallel_incremental_indexing.tar A new feature that allows building parallel indexes and keeping them in sync on a docID level, independent of the choice of the MergePolicy/MergeScheduler. Find details on the wiki page for this feature: http://wiki.apache.org/lucene-java/ParallelIncrementalIndexing Discussion on java-dev: http://markmail.org/thread/ql3oxzkob7aqf3jd -- This message is automatically generated by JIRA. 
For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3353) ParametricRangeQueryNodeProcessor uses incorrect logic at the lower bound
[ https://issues.apache.org/jira/browse/LUCENE-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13073474#comment-13073474 ] Uwe Schindler commented on LUCENE-3353: --- This seems to be related to LUCENE-1768 and LUCENE-2979, as we need to change the config API, so the timezone would only be another param. Maybe that can be done with the work on those two issues. ParametricRangeQueryNodeProcessor uses incorrect logic at the lower bound - Key: LUCENE-3353 URL: https://issues.apache.org/jira/browse/LUCENE-3353 Project: Lucene - Java Issue Type: Bug Components: modules/queryparser Affects Versions: 3.3 Reporter: Trejkaz ParametricRangeQueryNodeProcessor currently works as follows: # If the operator was LE or GE, set inclusive = true. # Set up a calendar # If inclusive, set the second time to 23:59:59:999 # Convert that to a string using the DateResolution. The problem is, this breaks for *exclusive* queries. For instance, if the user types in {20100110 TO 20100120} they would expect to get the 10th to the 20th exclusive, i.e. the 11th to the 19th. But in reality, the 10th will be *inclusive*. To get an actually-exclusive range for the lower bound, the time should be set to 23:59:59:999, much the same as what is done for the inclusive upper bound. I suspect the original query parser has the same issue, though possibly in different words. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2979) Simplify configuration API of contrib Query Parser
[ https://issues.apache.org/jira/browse/LUCENE-2979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13073475#comment-13073475 ] Uwe Schindler commented on LUCENE-2979: --- There was an issue opened today: LUCENE-3353. Maybe that is related to the config changes here, perhaps already fixed? Simplify configuration API of contrib Query Parser -- Key: LUCENE-2979 URL: https://issues.apache.org/jira/browse/LUCENE-2979 Project: Lucene - Java Issue Type: Improvement Components: modules/other Affects Versions: 2.9, 3.0 Reporter: Adriano Crestani Assignee: Adriano Crestani Labels: api-change, gsoc, gsoc2011, lucene-gsoc-11, mentor Fix For: 3.4, 4.0 Attachments: LUCENE-2979_phillipe_ramalho_2.patch, LUCENE-2979_phillipe_ramalho_3.patch, LUCENE-2979_phillipe_ramalho_3.patch, LUCENE-2979_phillipe_reamalho.patch The current configuration API is very complicated and inherits the concept used by the Attribute API to store token information in token streams. However, the requirements for both (QP config and token stream) are not the same, so they shouldn't be using the same thing. I propose to simplify the QP config and make it less scary for people intending to use the contrib QP. The task is not difficult, it will just require a lot of code change and figuring out the best way to do it. That's why it's a good candidate for a GSoC project. I would like to hear good proposals about how to make the API more friendly and less scary :) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3353) ParametricRangeQueryNodeProcessor uses incorrect logic at the lower bound
[ https://issues.apache.org/jira/browse/LUCENE-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-3353: -- Comment: was deleted (was: This seems to be related to LUCENE-1768 and LUCENE-2979, as we need to change the config API, so the timezone would only be another param. Maybe that can be done with the work on those two issues.) ParametricRangeQueryNodeProcessor uses incorrect logic at the lower bound - Key: LUCENE-3353 URL: https://issues.apache.org/jira/browse/LUCENE-3353 Project: Lucene - Java Issue Type: Bug Components: modules/queryparser Affects Versions: 3.3 Reporter: Trejkaz ParametricRangeQueryNodeProcessor currently works as follows: # If the operator was LE or GE, set inclusive = true. # Set up a calendar # If inclusive, set the second time to 23:59:59:999 # Convert that to a string using the DateResolution. The problem is, this breaks for *exclusive* queries. For instance, if the user types in {20100110 TO 20100120} they would expect to get the 10th to the 20th exclusive, i.e. the 11th to the 19th. But in reality, the 10th will be *inclusive*. To get an actually-exclusive range for the lower bound, the time should be set to 23:59:59:999, much the same as what is done for the inclusive upper bound. I suspect the original query parser has the same issue, though possibly in different words. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Issue Comment Edited] (LUCENE-2979) Simplify configuration API of contrib Query Parser
[ https://issues.apache.org/jira/browse/LUCENE-2979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13073475#comment-13073475 ] Uwe Schindler edited comment on LUCENE-2979 at 8/1/11 9:40 AM: --- There was an issue opened today: LUCENE-3352. Maybe that is related to the config changes here, perhaps already fixed? was (Author: thetaphi): There was an issue opened today: LUCENE-3353. Maybe that is related to the config changes here, perhaps already fixed? Simplify configuration API of contrib Query Parser -- Key: LUCENE-2979 URL: https://issues.apache.org/jira/browse/LUCENE-2979 Project: Lucene - Java Issue Type: Improvement Components: modules/other Affects Versions: 2.9, 3.0 Reporter: Adriano Crestani Assignee: Adriano Crestani Labels: api-change, gsoc, gsoc2011, lucene-gsoc-11, mentor Fix For: 3.4, 4.0 Attachments: LUCENE-2979_phillipe_ramalho_2.patch, LUCENE-2979_phillipe_ramalho_3.patch, LUCENE-2979_phillipe_ramalho_3.patch, LUCENE-2979_phillipe_reamalho.patch The current configuration API is very complicated and inherits the concept used by the Attribute API to store token information in token streams. However, the requirements for both (QP config and token stream) are not the same, so they shouldn't be using the same thing. I propose to simplify the QP config and make it less scary for people intending to use the contrib QP. The task is not difficult, it will just require a lot of code change and figuring out the best way to do it. That's why it's a good candidate for a GSoC project. I would like to hear good proposals about how to make the API more friendly and less scary :) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3352) ParametricRangeQueryNodeProcessor support for time zones
[ https://issues.apache.org/jira/browse/LUCENE-3352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13073476#comment-13073476 ] Uwe Schindler commented on LUCENE-3352: --- This seems to be related to LUCENE-1768 and LUCENE-2979, as we need to change the config API, so the timezone would only be another param. Maybe that can be done with the work on those two issues. ParametricRangeQueryNodeProcessor support for time zones Key: LUCENE-3352 URL: https://issues.apache.org/jira/browse/LUCENE-3352 Project: Lucene - Java Issue Type: New Feature Components: modules/queryparser Reporter: Trejkaz It would be nice if there were a config attribute for setting a time zone for dates in the query. At the moment I am using my own query node processor to implement this, but I stumbled upon ParametricRangeQueryNodeProcessor and it is very close to being usable as-is. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Closed] (SOLR-2337) Solr needs hits= added to the log when using grouping
[ https://issues.apache.org/jira/browse/SOLR-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Martijn van Groningen closed SOLR-2337. --- Resolution: Fixed Solr needs hits= added to the log when using grouping -- Key: SOLR-2337 URL: https://issues.apache.org/jira/browse/SOLR-2337 Project: Solr Issue Type: Bug Components: SearchComponents - other Affects Versions: 4.0 Reporter: Bill Bell Fix For: 4.0 Attachments: SOLR.2337.patch We monitor the Solr logs to try to review queries that have hits=0. This enables us to improve relevancy since they are easy to find and review. When using group=true, hits= does not show up: {code} 2011-01-27 01:10:16,117 INFO core.SolrCore - [collection1] webapp= path=/select params={group=truegroup.field=gendergroup.field=idq=*:*} status=0 QTime=15 {code} The code in QueryComponent.java needs to add the matches() after calling grouping.execute() and add up the total. It does return hits= in the log for mainResult, but not for standard grouping. This should be easy to add since matches are defined... -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Closed] (SOLR-1682) Implement CollapseComponent
[ https://issues.apache.org/jira/browse/SOLR-1682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Martijn van Groningen closed SOLR-1682. --- Resolution: Fixed Implement CollapseComponent --- Key: SOLR-1682 URL: https://issues.apache.org/jira/browse/SOLR-1682 Project: Solr Issue Type: Sub-task Components: search Reporter: Martijn van Groningen Assignee: Shalin Shekhar Mangar Fix For: 3.4, 4.0 Attachments: SOLR-1682.patch, SOLR-1682.patch, SOLR-1682.patch, SOLR-1682.patch, SOLR-1682.patch, SOLR-1682.patch, SOLR-1682.patch, SOLR-1682_prototype.patch, SOLR-1682_prototype.patch, SOLR-1682_prototype.patch, SOLR-236.patch, field-collapsing.patch Child issue of SOLR-236. This issue is dedicated to field collapsing in general and all its code (CollapseComponent, DocumentCollapsers and CollapseCollectors). The main goal is the finalize the request parameters and response format. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2686) Extend FieldCache architecture to multiple Values
[ https://issues.apache.org/jira/browse/SOLR-2686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13073515#comment-13073515 ] Michael McCandless commented on SOLR-2686: -- +1, though really this should be a Lucene issue (FieldCache is in Lucene). We actually have a start at this: the core part of UnInvertedField was factored into Lucene as oal.index.DocTermOrds. I think all we need to do is make this accessible through FieldCache. Extend FieldCache architecture to multiple Values - Key: SOLR-2686 URL: https://issues.apache.org/jira/browse/SOLR-2686 Project: Solr Issue Type: Bug Reporter: Bill Bell I would consider this a bug. It appears lots of people are working around this limitation, why don't we just change the underlying data structures to natively support multiValued fields in the FieldCache architecture? Then functions() will work properly, and we can do things like easily geodist() on a multiValued field. Thoughts? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3354) Extend FieldCache architecture to multiple Values
[ https://issues.apache.org/jira/browse/LUCENE-3354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13073518#comment-13073518 ] Martijn van Groningen commented on LUCENE-3354: --- +1. If DocTermOrds is available in FieldCache, then Grouping (Term based impl) can also use DocTermOrds. Extend FieldCache architecture to multiple Values - Key: LUCENE-3354 URL: https://issues.apache.org/jira/browse/LUCENE-3354 Project: Lucene - Java Issue Type: Improvement Reporter: Bill Bell I would consider this a bug. It appears lots of people are working around this limitation, why don't we just change the underlying data structures to natively support multiValued fields in the FieldCache architecture? Then functions() will work properly, and we can do things like easily geodist() on a multiValued field. Thoughts? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3354) Extend FieldCache architecture to multiple Values
[ https://issues.apache.org/jira/browse/LUCENE-3354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13073522#comment-13073522 ] Ryan McKinley commented on LUCENE-3354: --- What are thoughts on using DocValues rather than FieldCache? If we do choose to extend the FieldCache architecture, it would be so much cleaner if it were a simple Map<K,V> directly on the Reader rather than a static thing holding a WeakHashMap<Reader,Cache> Extend FieldCache architecture to multiple Values - Key: LUCENE-3354 URL: https://issues.apache.org/jira/browse/LUCENE-3354 Project: Lucene - Java Issue Type: Improvement Reporter: Bill Bell I would consider this a bug. It appears lots of people are working around this limitation, why don't we just change the underlying data structures to natively support multiValued fields in the FieldCache architecture? Then functions() will work properly, and we can do things like easily geodist() on a multiValued field. Thoughts? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
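The design difference raised in the comment above can be shown with a toy example. All names here are hypothetical stand-ins, not Lucene's actual classes; the point is only the contrast between a cache owned by the reader and a process-wide weak map keyed by reader:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.WeakHashMap;

public class CacheStyles {
    // Stand-in for an atomic reader.
    static class Reader {
        // Per-reader style: the cache lives on the reader itself, so it
        // is reclaimed when the reader is closed/collected, with no
        // global registry to leak or scan.
        final Map<String, int[]> cache = new HashMap<>();
    }

    // Static style: a process-wide weak map from reader to its caches;
    // this is the shape the comment argues is less clean.
    static final Map<Reader, Map<String, int[]>> GLOBAL = new WeakHashMap<>();

    public static void main(String[] args) {
        Reader r = new Reader();
        r.cache.put("field1", new int[] {1, 2, 3});
        GLOBAL.computeIfAbsent(r, k -> new HashMap<>()).put("field1", new int[] {1, 2, 3});
        System.out.println(r.cache.containsKey("field1")); // prints true
    }
}
```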
[jira] [Commented] (SOLR-2676) Add a welcome-file-list with a welcome-file index.jsp to web.xml in Solr servlet war.
[ https://issues.apache.org/jira/browse/SOLR-2676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13073527#comment-13073527 ] Jay R. Jaeger commented on SOLR-2676: - Wow. That was fast. Thanks. Add a welcome-file-list with a welcome-file index.jsp to web.xml in Solr servlet war. - Key: SOLR-2676 URL: https://issues.apache.org/jira/browse/SOLR-2676 Project: Solr Issue Type: Improvement Components: Build Affects Versions: 3.1 Reporter: Jay R. Jaeger Assignee: Hoss Man Priority: Trivial Fix For: 3.4, 4.0 Some web application servers (e.g., IBM WebSphere application server) do not have a default welcome file list. The Solr servlet and related JSPs currently depend upon a default welcome file list. Adding a welcome-file-list entry to web.xml will rectify this problem in a compatible way: <welcome-file-list> <welcome-file>index.jsp</welcome-file> </welcome-file-list> -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: svn commit: r1152673 - /lucene/dev/branches/branch_3x/lucene/src/test/org/apache/lucene/index/Tes tPayloads.java
OK I found the problem: 3.x can't handle the U+ character (we replace it on indexing), while trunk can. So I think we just have to fix randomFixedByteLengthUnicodeString to never use that char. I'll commit... Mike McCandless http://blog.mikemccandless.com On Sun, Jul 31, 2011 at 10:33 PM, Robert Muir rcm...@gmail.com wrote: On Sun, Jul 31, 2011 at 10:24 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : fix test to not create invalid unicode I'm confused ... when/why does randomFixedByteLengthUnicodeString not return valid unicode? I'm confused too, but all of the issues are with replacement chars for invalid unicode: Since the intent of this test is to test thread safety, not to test unicode enc/dec back and forth, I switched it to ascii until the test grows up (e.g. trunk, which now uses full unicode range correctly, maybe I backported this wrong before) -- lucidimagination.com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
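The workaround described (switching the thread-safety test to ASCII-only strings) can be sketched as follows. This is a hypothetical helper, not Lucene's actual `_TestUtil`/`randomFixedByteLengthUnicodeString` code; it simply shows the idea of restricting random test strings to characters that cannot be mangled by replacement on indexing:

```java
import java.util.Random;

public class RandomAsciiStrings {
    // Build a random string restricted to lowercase ASCII letters, so
    // no character can hit the invalid-unicode replacement path.
    static String randomAsciiString(Random random, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append((char) ('a' + random.nextInt(26)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = randomAsciiString(new Random(42), 8);
        System.out.println(s.length()); // prints 8
    }
}
```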
[jira] [Commented] (LUCENE-3354) Extend FieldCache architecture to multiple Values
[ https://issues.apache.org/jira/browse/LUCENE-3354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13073530#comment-13073530 ] Robert Muir commented on LUCENE-3354: - +1, die insanity, die. Extend FieldCache architecture to multiple Values - Key: LUCENE-3354 URL: https://issues.apache.org/jira/browse/LUCENE-3354 Project: Lucene - Java Issue Type: Improvement Reporter: Bill Bell I would consider this a bug. It appears lots of people are working around this limitation, why don't we just change the underlying data structures to natively support multiValued fields in the FieldCache architecture? Then functions() will work properly, and we can do things like easily geodist() on a multiValued field. Thoughts? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3354) Extend FieldCache architecture to multiple Values
[ https://issues.apache.org/jira/browse/LUCENE-3354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13073533#comment-13073533 ] Michael McCandless commented on LUCENE-3354: +1 to moving FC to atomic readers only, and let SlowMultiReaderWrapper absorb the insanity. Extend FieldCache architecture to multiple Values - Key: LUCENE-3354 URL: https://issues.apache.org/jira/browse/LUCENE-3354 Project: Lucene - Java Issue Type: Improvement Reporter: Bill Bell I would consider this a bug. It appears lots of people are working around this limitation, why don't we just change the underlying data structures to natively support multiValued fields in the FieldCache architecture? Then functions() will work properly, and we can do things like easily geodist() on a multiValued field. Thoughts? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3354) Extend FieldCache architecture to multiple Values
[ https://issues.apache.org/jira/browse/LUCENE-3354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13073532#comment-13073532 ] Martijn van Groningen commented on LUCENE-3354: --- bq. What are thoughts on using DocValues rather then FieldCache? Maybe both should be available. Not all fields have indexed docvalues. bq. We should start with this in 4.0! For backwards compatibility we could still have the FieldCache class, but just delegating. Changing the architecture seems like a big task to me. Maybe that should be done in a different issue. This issue will then depend on it. Extend FieldCache architecture to multiple Values - Key: LUCENE-3354 URL: https://issues.apache.org/jira/browse/LUCENE-3354 Project: Lucene - Java Issue Type: Improvement Reporter: Bill Bell I would consider this a bug. It appears lots of people are working around this limitation, why don't we just change the underlying data structures to natively support multiValued fields in the FieldCache architecture? Then functions() will work properly, and we can do things like easily geodist() on a multiValued field. Thoughts? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
order by function
Hi, I need to order by a function like: sort=sum(field1,field2,field3)+desc but Solr gives me this error: Missing sort order. Why does this happen? I read that it is possible to order by function from version 1.3 (http://wiki.apache.org/solr/FunctionQuery#Sort_By_Function) and I use version 1.4. Does nobody have an idea? Thanks, Gastone - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3348) IndexWriter applies wrong deletes during concurrent flush-all
[ https://issues.apache.org/jira/browse/LUCENE-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13073535#comment-13073535 ] Michael McCandless commented on LUCENE-3348: Thanks Simon; I'll make both of those fixes. Unfortunately there is still at least one more thread safety issue that I'm trying to track down... beasting uncovered a good seed. IndexWriter applies wrong deletes during concurrent flush-all - Key: LUCENE-3348 URL: https://issues.apache.org/jira/browse/LUCENE-3348 Project: Lucene - Java Issue Type: Bug Components: core/index Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 3.4, 4.0 Attachments: LUCENE-3348.patch, LUCENE-3348.patch Yonik uncovered this with the TestRealTimeGet test: if a flush-all is underway, it is possible for an incoming update to pick a DWPT that is stale, ie, not yet pulled/marked for flushing, yet the DW has cutover to a new deletes queue. If this happens, and the deleted term was also updated in one of the non-stale DWPTs, then the wrong document is deleted and the test fails by detecting the wrong value. There's a 2nd failure mode that I haven't figured out yet, whereby 2 docs are returned when searching by id (there should only ever be 1 doc since the test uses updateDocument which is atomic wrt commit/reopen). Yonik verified the test passes pre-DWPT, so my guess is (but I have yet to verify) this test also passes on 3.x. I'll backport the test to 3.x to be sure. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3354) Extend FieldCache architecture to multiple Values
[ https://issues.apache.org/jira/browse/LUCENE-3354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13073536#comment-13073536 ] Yonik Seeley commented on LUCENE-3354: -- bq. (including the broken Solr parts still using TopLevel FieldCache entries). Some top-level field cache uses are very much by design in Solr. If that ability is removed from Lucene, I guess we could always move some of the old FieldCache logic to Solr though. Extend FieldCache architecture to multiple Values - Key: LUCENE-3354 URL: https://issues.apache.org/jira/browse/LUCENE-3354 Project: Lucene - Java Issue Type: Improvement Reporter: Bill Bell I would consider this a bug. It appears lots of people are working around this limitation, why don't we just change the underlying data structures to natively support multiValued fields in the FieldCache architecture? Then functions() will work properly, and we can do things like easily geodist() on a multiValued field. Thoughts? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3354) Extend FieldCache architecture to multiple Values
[ https://issues.apache.org/jira/browse/LUCENE-3354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13073538#comment-13073538 ] Uwe Schindler commented on LUCENE-3354: --- bq. If that ability is removed from Lucene, I guess we could always move some of the old FieldCache logic to Solr though. Solr can always use SlowMultiReaderWrapper (see above) Extend FieldCache architecture to multiple Values - Key: LUCENE-3354 URL: https://issues.apache.org/jira/browse/LUCENE-3354 Project: Lucene - Java Issue Type: Improvement Reporter: Bill Bell I would consider this a bug. It appears lots of people are working around this limitation, why don't we just change the underlying data structures to natively support multiValued fields in the FieldCache architecture? Then functions() will work properly, and we can do things like easily geodist() on a multiValued field. Thoughts? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[Lucene.Net] Incubator PMC/Board report for August 2011 (lucene-net-...@lucene.apache.org)
Dear Lucene.NET Developers, This email was sent by an automated system on behalf of the Apache Incubator PMC. It is an initial reminder to give you plenty of time to prepare your quarterly board report. The board meeting is scheduled for Wed, 17 August 2011, 10 am Pacific. The report for your podling will form a part of the Incubator PMC report. The Incubator PMC requires your report to be submitted one week before the board meeting, to allow sufficient time for review. Please submit your report with sufficient time to allow the Incubator PMC, and subsequently the board members, to review and digest it. Again, the very latest you should submit your report is one week prior to the board meeting. Thanks, The Apache Incubator PMC Submitting your Report -- Your report should contain the following: * Your project name * A brief description of your project, which assumes no knowledge of the project or necessarily of its field * A list of the three most important issues to address in the move towards graduation. * Any issues that the Incubator PMC or ASF Board might wish/need to be aware of * How has the community developed since the last report * How has the project developed since the last report. This should be appended to the Incubator Wiki page at: http://wiki.apache.org/incubator/August2011 Note: This page is manually populated. You may need to wait a little before this page is created from a template. Mentors --- Mentors should review reports for their project(s) and sign them off on the Incubator wiki page. Signing off reports shows that you are following the project - projects that are not signed off may raise alarms for the Incubator PMC. Incubator PMC
Re: order by function
Sort by function is not available in 1.4. It's in 3.1. On Aug 1, 2011, at 6:43 AM, Gastone Penzo wrote: Hi, i need to order by function like: sort=sum(field1,field2,field3)+desc but solr gives me this error: Missing sort order. why is this possible? i read that is possible to order by function, from version 1.3 (http://wiki.apache.org/solr/FunctionQuery#Sort_By_Function) i use version 1.4 nobody has an idea? thanx Gastone - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org Grant Ingersoll - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
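For readers hitting the same "Missing sort order" error on a version that does support sort-by-function: the parameter must end with an explicit order keyword. A minimal sketch of building the request parameter (field names are placeholders):

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Sketch only: builds the sort-by-function parameter. The trailing "desc"
// is the sort order whose absence triggers the "Missing sort order" error.
public class SortParam {
    public static void main(String[] args) {
        String sort = "sum(field1,field2,field3) desc";
        System.out.println("sort=" + URLEncoder.encode(sort, StandardCharsets.UTF_8));
    }
}
```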
[jira] [Commented] (LUCENE-3348) IndexWriter applies wrong deletes during concurrent flush-all
[ https://issues.apache.org/jira/browse/LUCENE-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13073556#comment-13073556 ] Simon Willnauer commented on LUCENE-3348: - bq. Unfortunately there is still at least one more thread safety issue that I'm trying to track down... beasting uncovered a good seed. argh! can you post it here? simon IndexWriter applies wrong deletes during concurrent flush-all - Key: LUCENE-3348 URL: https://issues.apache.org/jira/browse/LUCENE-3348 Project: Lucene - Java Issue Type: Bug Components: core/index Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 3.4, 4.0 Attachments: LUCENE-3348.patch, LUCENE-3348.patch Yonik uncovered this with the TestRealTimeGet test: if a flush-all is underway, it is possible for an incoming update to pick a DWPT that is stale, ie, not yet pulled/marked for flushing, yet the DW has cutover to a new deletes queue. If this happens, and the deleted term was also updated in one of the non-stale DWPTs, then the wrong document is deleted and the test fails by detecting the wrong value. There's a 2nd failure mode that I haven't figured out yet, whereby 2 docs are returned when searching by id (there should only ever be 1 doc since the test uses updateDocument which is atomic wrt commit/reopen). Yonik verified the test passes pre-DWPT, so my guess is (but I have yet to verify) this test also passes on 3.x. I'll backport the test to 3.x to be sure. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [JENKINS] Lucene-Solr-tests-only-3.x - Build # 9905 - Failure
Caused by: java.io.IOException: Cannot create directory: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/solr/example/multicore/core1/data/index at org.apache.lucene.store.SimpleFSLock.obtain(SimpleFSLockFactory.java:121) at org.apache.lucene.store.Lock.obtain(Lock.java:72) at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:1116) It feels like we've been experiencing a lot of failures lately in the example jetty tests. Although it may be a coincidence, it feels like it coincided with the solr build rewrite (and IIRC there were some changes made to the example test framework at that time?) -Yonik http://www.lucidimagination.com On Mon, Aug 1, 2011 at 12:24 AM, Apache Jenkins Server jenk...@builds.apache.org wrote: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x/9905/ - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: JCC CompileError -- incorrect generic parameter detection
Hi Lukasz, On Sun, 31 Jul 2011, Łukasz Jancewicz wrote: On Fri, Jul 29, 2011 at 17:09, Andi Vajda va...@apache.org wrote: For example, is there a piece of gentyref code that I could 'borrow' (with attribution of course) and include that in the JCC sources to fix this particular problem ? If you look at this file: http://code.google.com/p/gentyref/source/browse/src/main/java/com/googlecode/gentyref/GenericTypeReflector.java you'll see that it's completely independent from the rest of gentyref library, so I guess that, technically, you could just copy paste it to JCC. Not really, no. It depends on the whole thing but that's no big deal. The licensing is also compatible since it's Apache 2.0 licensed (like you said originally but I missed that, sorry). So it's probably ok to use/include it. I don't know much about Apache licensing, so I can't tell you if it's legal/appropriate to do so. But the technical possibility obviously exists. The advantage of including the library as a whole is that any future Java changes (Java 7, etc.) and bugs can be potentially taken care of by developers of gentyref. This code could be written in Java (and wrapped by JCC for itself) That's what I did in my patch. I included gentyref.jar in the classpath and generated JCC wrappers for it. So I did a custom build of JCC with that gentyref class wrapped and it does fix the problem you encountered but it then no longer compiles Lucene :-( I get this detailed error message from gentyref: jcc.cpp.JavaError: com.googlecode.gentyref.UnresolvedTypeVariableException: An exact type is requested, but the type contains a type variable that cannot be resolved. Variable: A from public org.apache.lucene.util.Attribute org.apache.lucene.util.AttributeSource.addAttribute(java.lang.Class) Hint: This is usually caused by trying to get an exact type when a generic method who's type parameters are not given is involved.
Hacking it a bit, I catch the error and use the original reflection code when gentyref fails, to see how far I get, and I get a bit further but I hit more problems with too-specific types being resolved (like array of bool into [B). I could probably fix this too but I'm not yet convinced that gentyref is actually needed to solve the original problem. It feels like gentyref, cool as it is, is actually doing too much. Clearly, I see the bug you reported but I'm not sure where it is yet. Is it in the java.lang.reflect code or is it in jcc itself ? For example, the same problem happens if you just define DirectoryEntry as: public interface DirectoryEntry extends Entry, Iterable { } But not when I define it thus: import java.util.Iterator; public interface DirectoryEntry extends Entry, Iterable { Iterator iterator(); } Or thus: import java.util.Iterator; public interface DirectoryEntry extends Entry, Iterable<Entry> { Iterator<Entry> iterator(); } It looks like the absence of an iterator() method definition triggers this. Maybe all I need to do is make the iterator method code generation a bit smarter, like not generate it if it's inherited from above anyway ? Or see if it's inherited and its return type is overridden by the extends ? I'm not quite sure yet what to do about this bug... Andi..
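The reflection behavior under discussion can be probed directly. The sketch below is a hypothetical reconstruction (Entry and DirectoryEntry are stand-ins, not JCC or Lucene types) showing what java.lang.reflect reports for iterator() when it is declared explicitly versus only inherited:

```java
import java.lang.reflect.Method;
import java.util.Iterator;

// Stand-in interfaces mirroring the DirectoryEntry example above.
interface Entry { }

interface DirectoryEntry extends Entry, Iterable<Entry> {
    Iterator<Entry> iterator();
}

public class GenericProbe {
    public static void main(String[] args) throws Exception {
        // Explicitly declared: the return type is a concrete
        // parameterized type (Iterator<Entry>).
        Method declared = DirectoryEntry.class.getMethod("iterator");
        System.out.println(declared.getGenericReturnType());

        // Inherited only: Iterable's iterator() still carries the
        // unresolved type variable T -- the kind of type that the
        // wrapper generation trips over.
        Method inherited = Iterable.class.getMethod("iterator");
        System.out.println(inherited.getGenericReturnType());
    }
}
```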
[jira] [Commented] (LUCENE-3348) IndexWriter applies wrong deletes during concurrent flush-all
[ https://issues.apache.org/jira/browse/LUCENE-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13073588#comment-13073588 ] Michael McCandless commented on LUCENE-3348: Here's what I run with the while1 tester in luceneutil: {{TestStressNRT -iters 3 -verbose -seed -6208047570437556381:-3138230871915238634}} I think what's special about the seed is maxBufferedDocs is 3, so we are doing tons of segment flushing. I dumbed back the test somewhat (turned off merging entirely, only 1 reader thread, up to 5 writer threads), and it still fails. IndexWriter applies wrong deletes during concurrent flush-all - Key: LUCENE-3348 URL: https://issues.apache.org/jira/browse/LUCENE-3348 Project: Lucene - Java Issue Type: Bug Components: core/index Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 3.4, 4.0 Attachments: LUCENE-3348.patch, LUCENE-3348.patch, LUCENE-3348.patch Yonik uncovered this with the TestRealTimeGet test: if a flush-all is underway, it is possible for an incoming update to pick a DWPT that is stale, ie, not yet pulled/marked for flushing, yet the DW has cutover to a new deletes queue. If this happens, and the deleted term was also updated in one of the non-stale DWPTs, then the wrong document is deleted and the test fails by detecting the wrong value. There's a 2nd failure mode that I haven't figured out yet, whereby 2 docs are returned when searching by id (there should only ever be 1 doc since the test uses updateDocument which is atomic wrt commit/reopen). Yonik verified the test passes pre-DWPT, so my guess is (but I have yet to verify) this test also passes on 3.x. I'll backport the test to 3.x to be sure. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3030) Block tree terms dict index
[ https://issues.apache.org/jira/browse/LUCENE-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-3030: --- Attachment: LUCENE-3030.patch Checkpointing my current state here -- the big change is I added a Terms.intersect(CompiledAutomaton) method, which returns a TermsEnum, but there's something wrong with it still -- it seems to give the right results but makes LEV2 FuzzyQ slower. Block tree terms dict index - Key: LUCENE-3030 URL: https://issues.apache.org/jira/browse/LUCENE-3030 Project: Lucene - Java Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 4.0 Attachments: LUCENE-3030.patch, LUCENE-3030.patch, LUCENE-3030.patch, LUCENE-3030.patch Our default terms index today breaks terms into blocks of fixed size (ie, every 32 terms is a new block), and then we build an index on top of that (holding the start term for each block). But, it should be better to instead break terms according to how they share prefixes. This results in variable sized blocks, but means within each block we maximize the shared prefix and minimize the resulting terms index. It should also be a speedup for terms dict intensive queries because the terms index becomes a true prefix trie, and can be used to fast-fail on term lookup (ie returning NOT_FOUND without having to seek/scan a terms block). Having a true prefix trie should also enable much faster intersection with automaton (but this will be a new issue). I've made an initial impl for this (called BlockTreeTermsWriter/Reader). It's still a work in progress... lots of nocommits, and hairy code, but tests pass (at least once!). I made two new codecs, temporarily called StandardTree, PulsingTree, that are just like their counterparts but use this new terms dict. I added a new exactOnly boolean to TermsEnum.seek.
If that's true and the term is NOT_FOUND, we will (quickly) return NOT_FOUND and the enum is unpositioned (ie you should not call next(), docs(), etc.). In this approach the index and dict are tightly connected, so it does not support a pluggable index impl like BlockTermsWriter/Reader. Blocks are stored on certain nodes of the prefix trie, and can contain both terms and pointers to sub-blocks (ie, if the block is not a leaf block). So there are two trees, tied to one another -- the index trie, and the blocks. Only certain nodes in the trie map to a block in the block tree. I think this algorithm is similar to burst tries (http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.18.3499), except it allows terms to be stored on inner blocks (not just leaf blocks). This is important for Lucene because an [accidental] adversary could produce a terms dict with way too many blocks (way too much RAM used by the terms index). Still, with my current patch, an adversary can produce too-big blocks... which we may need to fix, by letting the terms index not be a true prefix trie on its leaf edges. Exactly how the blocks are picked can be factored out as its own policy (but I haven't done that yet). Then, burst trie is one policy, my current approach is another, etc. The policy can be tuned to the terms' expected distribution, eg if it's a primary key field and you only use base 10 for each character then you want block sizes of size 10. This can make a sizable difference on lookup cost. I modified the FST Builder to allow for a plugin that freezes the tail (changed suffix) of each added term, because I use this to find the blocks. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
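The core intuition of prefix-based blocking can be sketched in a few lines. This is a toy (nothing like BlockTreeTermsWriter's actual FST-driven algorithm): instead of cutting a sorted term list every N terms, it groups terms that share a leading prefix, yielding variable-sized blocks and a smaller index over them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy sketch of prefix-based blocking: group a sorted term list by a shared
// leading prefix, producing variable-sized blocks rather than fixed 32-term
// blocks. The real implementation derives blocks from the FST, not like this.
public class PrefixBlocks {
    static Map<String, List<String>> blocks(List<String> sortedTerms, int prefixLen) {
        Map<String, List<String>> byPrefix = new LinkedHashMap<>();
        for (String term : sortedTerms) {
            String key = term.substring(0, Math.min(prefixLen, term.length()));
            byPrefix.computeIfAbsent(key, k -> new ArrayList<>()).add(term);
        }
        return byPrefix;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("apple", "apply", "april", "banana", "band");
        // Blocks of different sizes, keyed by the shared 2-char prefix;
        // the keys alone form the (much smaller) terms index.
        System.out.println(blocks(terms, 2));
    }
}
```

Looking up a term then only needs to scan the one block whose prefix matches, and a miss on the prefix can fail fast without touching any block.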
RE: [JENKINS] Lucene-Solr-tests-only-3.x - Build # 9905 - Failure
I agree, these events do feel coincidental. Not sure what constitutes the example test framework, but I did move the ExternalPaths utility class into the solr test-framework, because it's used from both the Solr core tests and the Solrj tests, which don't depend (anymore) on the Solr core tests. (branch_3x:7/13/2011:r1146191) I also had to change ExternalPaths.determineSourceHome() to handle the situation where there is no solr/conf/ dir in the classpath, e.g. the situation for Solrj common tests (which are now housed with the Solrj internal module) -- these tests do not have solr/core/src/test-files/solr/conf/ in their classpath. (branch_3x:7/22/2011:r1149691) That all said, it's not clear to me how these changes could have affected directory creation? Steve -Original Message- From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley Sent: Monday, August 01, 2011 11:49 AM To: dev@lucene.apache.org Subject: Re: [JENKINS] Lucene-Solr-tests-only-3.x - Build # 9905 - Failure Caused by: java.io.IOException: Cannot create directory: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only- 3.x/checkout/solr/example/multicore/core1/data/index at org.apache.lucene.store.SimpleFSLock.obtain(SimpleFSLockFactory.java:121) at org.apache.lucene.store.Lock.obtain(Lock.java:72) at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1116) It feels like we've been experiencing a lot of failures lately in the example jetty tests. Although it may be a coincidence, it feels like it coincided with the solr build rewrite (and IIRC there were some changes made to the example test framework at that time?) -Yonik http://www.lucidimagination.com On Mon, Aug 1, 2011 at 12:24 AM, Apache Jenkins Server jenk...@builds.apache.org wrote: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x/9905/ - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-1223) Query Filter fq with OR operator
[ https://issues.apache.org/jira/browse/SOLR-1223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13073611#comment-13073611 ] Shawn Heisey commented on SOLR-1223: bq. I'd rather see a new filterQuery type like ofq than being stuck with the current options. Nested filterQueries including variables would obviously be the most flexible solution, but imho having two different filter types would add enough benefit in the meantime. I see that someone else had the same idea a long time before I did. I just brought this up on the solr-user list a few days ago, but I couldn't think of a good parameter name. The parameter name I came up with (fqu, filter query union) is not as good as ofq. I like Brian and Frederik's idea. Query Filter fq with OR operator Key: SOLR-1223 URL: https://issues.apache.org/jira/browse/SOLR-1223 Project: Solr Issue Type: New Feature Components: search Reporter: Brian Pearson Priority: Minor See this [issue|http://www.nabble.com/Query-Filter-fq-with-OR-operator-td23895837.html] for some background.Today, all of the Query filters specified with the fq parameter are AND'd together. This issue is about allowing a set of filters to be OR'd together (in addition to having another set of filters that are AND'd).The OR'd filters would of course be applied before any scoring is done. The advantage of this feature is that you will be able to break up complex filters into simple, more cacheable filters, which should improve performance. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
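The caching win described in the issue can be illustrated with a toy bitset model (this is not Solr's filter implementation, just the idea): each simple filter stays an independently cacheable bitset, and the OR is computed cheaply at query time instead of caching one complex combined filter.

```java
import java.util.BitSet;

// Toy sketch: two simple, cacheable filters represented as doc-id bitsets,
// OR'd together at request time before any scoring happens.
public class OrFilters {
    public static void main(String[] args) {
        BitSet inStock = new BitSet();
        inStock.set(0); inStock.set(2); inStock.set(5);

        BitSet onSale = new BitSet();
        onSale.set(2); onSale.set(3);

        // Union of the two cached filters: docs matching either one.
        BitSet ored = (BitSet) inStock.clone();
        ored.or(onSale);
        System.out.println(ored);
    }
}
```

The same two bitsets can be reused across requests and combined with AND for the existing fq semantics, which is why splitting a complex filter into simple ones improves cache hit rates.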
[jira] [Resolved] (SOLR-2684) ConcurrentModificationException from BinaryResponseWriter
[ https://issues.apache.org/jira/browse/SOLR-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man resolved SOLR-2684. Resolution: Fixed Assignee: Hoss Man Committed revision 1152885. CHANGES.txt attrib added to SOLR-1566 since this was a bug in unreleased code. Thanks Arul! ConcurrentModificationException from BinaryResponseWriter - Key: SOLR-2684 URL: https://issues.apache.org/jira/browse/SOLR-2684 Project: Solr Issue Type: Bug Components: clients - java Affects Versions: 4.0 Reporter: Arul Kalaipandian Assignee: Hoss Man Priority: Critical Labels: ConcurrentModificationException Fix For: 4.0 Attachments: SOLR-2684.patch, SOLR-2684.patch ConcurrentModificationException thrown from BinaryResponseWriter while writing SolrDocument to the response. SEVERE: java.util.ConcurrentModificationException at java.util.LinkedHashMap$LinkedHashIterator.nextEntry(LinkedHashMap.java:373) at java.util.LinkedHashMap$KeyIterator.next(LinkedHashMap.java:384) at org.apache.solr.response.BinaryResponseWriter$Resolver.resolve(BinaryResponseWriter.java:98) at org.apache.solr.common.util.JavaBinCodec.writeKnownType(JavaBinCodec.java:242) at org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:139) at org.apache.solr.common.util.JavaBinCodec.writeArray(JavaBinCodec.java:377) at org.apache.solr.common.util.JavaBinCodec.writeSolrDocumentList(JavaBinCodec.java:340) at org.apache.solr.common.util.JavaBinCodec.writeKnownType(JavaBinCodec.java:226) at org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:139) at org.apache.solr.common.util.JavaBinCodec.writeNamedList(JavaBinCodec.java:134) at org.apache.solr.common.util.JavaBinCodec.writeKnownType(JavaBinCodec.java:222) at org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:139) at org.apache.solr.common.util.JavaBinCodec.marshal(JavaBinCodec.java:87) at org.apache.solr.response.BinaryResponseWriter.write(BinaryResponseWriter.java:49) at 
org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:333) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:261) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.Server.handle(Server.java:326) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [JENKINS] Lucene-Solr-tests-only-3.x - Build # 9905 - Failure
On trunk currently, it looks like the following data directories (outside of /build) are used by the tests: ./core/src/test-files/solr/data ./example/multicore/core0/data ./example/multicore/core1/data Not sure if this was the case in the past or not... I'll verify. -Yonik http://www.lucidimagination.com On Mon, Aug 1, 2011 at 12:15 PM, Steven A Rowe sar...@syr.edu wrote: I agree, these events do feel coincidental. Not sure what constitutes the example test framework, but I did move the ExternalPaths utility class into the solr test-framework, because it's used from both the Solr core tests and the Solrj tests, which don't depend (anymore) on the Solr core tests. (branch_3x:7/13/2011:r1146191) I also had to change ExternalPaths.determineSourceHome() to handle the situation where there is no solr/conf/ dir in the classpath, e.g. the situation for Solrj common tests (which are now housed with the Solrj internal module) -- these tests do not have solr/core/src/test-files/solr/conf/ in their classpath. (branch_3x:7/22/2011:r1149691) That all said, it's not clear to me how these changes could have affected directory creation? Steve -Original Message- From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley Sent: Monday, August 01, 2011 11:49 AM To: dev@lucene.apache.org Subject: Re: [JENKINS] Lucene-Solr-tests-only-3.x - Build # 9905 - Failure Caused by: java.io.IOException: Cannot create directory: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only- 3.x/checkout/solr/example/multicore/core1/data/index at org.apache.lucene.store.SimpleFSLock.obtain(SimpleFSLockFactory.java:121) at org.apache.lucene.store.Lock.obtain(Lock.java:72) at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1116) It feels like we've been experiencing a lot of failures lately in the example jetty tests. 
Although it may be a coincidence, it feels like it coincided with the solr build rewrite (and IIRC there were some changes made to the example test framework at that time?) -Yonik http://www.lucidimagination.com On Mon, Aug 1, 2011 at 12:24 AM, Apache Jenkins Server jenk...@builds.apache.org wrote: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x/9905/ - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-2308) Separately specify a field's type
[ https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nikola Tankovic updated LUCENE-2308: Attachment: LUCENE-2308-19.patch Removed old oal.Document, except in documentation. Tests pass! Separately specify a field's type - Key: LUCENE-2308 URL: https://issues.apache.org/jira/browse/LUCENE-2308 Project: Lucene - Java Issue Type: Improvement Components: core/index Reporter: Michael McCandless Assignee: Michael McCandless Labels: gsoc2011, lucene-gsoc-11, mentor Fix For: 4.0 Attachments: LUCENE-2308-10.patch, LUCENE-2308-11.patch, LUCENE-2308-12.patch, LUCENE-2308-13.patch, LUCENE-2308-14.patch, LUCENE-2308-15.patch, LUCENE-2308-16.patch, LUCENE-2308-17.patch, LUCENE-2308-18.patch, LUCENE-2308-19.patch, LUCENE-2308-2.patch, LUCENE-2308-3.patch, LUCENE-2308-4.patch, LUCENE-2308-5.patch, LUCENE-2308-6.patch, LUCENE-2308-7.patch, LUCENE-2308-8.patch, LUCENE-2308-9.patch, LUCENE-2308-ltc.patch, LUCENE-2308.patch, LUCENE-2308.patch This came up from dicussions on IRC. I'm summarizing here... Today when you make a Field to add to a document you can set things index or not, stored or not, analyzed or not, details like omitTfAP, omitNorms, index term vectors (separately controlling offsets/positions), etc. I think we should factor these out into a new class (FieldType?). Then you could re-use this FieldType instance across multiple fields. The Field instance would still hold the actual value. We could then do per-field analyzers by adding a setAnalyzer on the FieldType, instead of the separate PerFieldAnalzyerWrapper (likewise for per-field codecs (with flex), where we now have PerFieldCodecWrapper). This would NOT be a schema! It's just refactoring what we already specify today. EG it's not serialized into the index. This has been discussed before, and I know Michael Busch opened a more ambitious (I think?) issue. I think this is a good first baby step. 
We could consider a hierarchy of FieldType (NumericFieldType, etc.) but maybe hold off on that for starters... -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
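The refactoring the issue proposes can be sketched as follows. This is a hypothetical illustration only (the names and fields are illustrative, not the API that was actually committed for 4.0): the per-field indexing options move off Field into a FieldType object that many Field instances share.

```java
// Hypothetical sketch of the proposed split: a reusable FieldType holds the
// indexing options, while each Field holds only its name and value.
public class FieldTypeSketch {
    static class FieldType {
        boolean indexed, stored, tokenized, omitNorms;
    }

    static class Field {
        final String name;
        final String value;
        final FieldType type;

        Field(String name, String value, FieldType type) {
            this.name = name;
            this.value = value;
            this.type = type;
        }
    }

    public static void main(String[] args) {
        FieldType keyword = new FieldType();
        keyword.indexed = true;
        keyword.stored = true;
        keyword.tokenized = false;

        // One FieldType instance shared by many field values, instead of
        // repeating indexed/stored/analyzed flags on every Field.
        Field id = new Field("id", "doc-1", keyword);
        Field sku = new Field("sku", "A-42", keyword);
        System.out.println(id.type == sku.type); // prints true
    }
}
```

Per-field analyzers or codecs could then hang off the shared FieldType, replacing wrappers like PerFieldAnalyzerWrapper, as the issue suggests.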
[jira] [Resolved] (LUCENE-3338) Flexible query parser does not support open ranges and range queries with mixed inclusive and exclusive ranges
[ https://issues.apache.org/jira/browse/LUCENE-3338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adriano Crestani resolved LUCENE-3338. -- Resolution: Fixed Assignee: Adriano Crestani (was: Uwe Schindler) Flexible query parser does not support open ranges and range queries with mixed inclusive and exclusive ranges -- Key: LUCENE-3338 URL: https://issues.apache.org/jira/browse/LUCENE-3338 Project: Lucene - Java Issue Type: Bug Components: modules/queryparser Affects Versions: 3.3 Reporter: Vinicius Barros Assignee: Adriano Crestani Fix For: 4.0 Attachments: LUCENE_3338_and_3343_2011_07_30.patch, week9-merged-nosurround.patch, week9-merged-nosurround_with_failing_junit.patch, week9-merged.patch, week9.patch Flexible query parser does not support open ranges and range queries with mixed inclusive and exclusive ranges. These two problems were found while developing LUCENE-1768. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3343) Comparison operators <,<=,>,>= and = support as RangeQuery syntax in QueryParser
[ https://issues.apache.org/jira/browse/LUCENE-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13073663#comment-13073663 ] Adriano Crestani commented on LUCENE-3343: -- The code for 4.0 was just committed to the repository (rev 1152892) Comparison operators <,<=,>,>= and = support as RangeQuery syntax in QueryParser Key: LUCENE-3343 URL: https://issues.apache.org/jira/browse/LUCENE-3343 Project: Lucene - Java Issue Type: New Feature Components: modules/queryparser Reporter: Olivier Favre Assignee: Adriano Crestani Priority: Minor Labels: parser, query Fix For: 3.4, 4.0 Attachments: NumCompQueryParser-3x.patch, NumCompQueryParser.patch Original Estimate: 96h Remaining Estimate: 96h To offer better interoperability with other search engines and to provide an easier and more straight forward syntax, the operators <, <=, >, >= and = should be available to express an open range query. They should at least work for numeric queries. '=' can be made a synonym for ':'. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
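Each comparison operator maps naturally onto an open-ended range query. The following toy rewriter (an illustration, not the queryparser module's implementation) shows the mapping, using the bracket syntax where `[`/`]` are inclusive and `{`/`}` are exclusive bounds:

```java
// Toy sketch: rewrite a comparison operator into classic range-query syntax
// with one open end. Not the actual flexible query parser code.
public class CompToRange {
    static String rewrite(String field, String op, String value) {
        switch (op) {
            case ">":  return field + ":{" + value + " TO *]";  // exclusive lower bound
            case ">=": return field + ":[" + value + " TO *]";  // inclusive lower bound
            case "<":  return field + ":[* TO " + value + "}";  // exclusive upper bound
            case "<=": return field + ":[* TO " + value + "]";  // inclusive upper bound
            case "=":  return field + ":" + value;              // synonym for ':'
            default:   throw new IllegalArgumentException("unknown operator: " + op);
        }
    }

    public static void main(String[] args) {
        System.out.println(rewrite("price", ">=", "10")); // price:[10 TO *]
        System.out.println(rewrite("price", "<", "10"));  // price:[* TO 10}
    }
}
```

Note the mixed inclusive/exclusive brackets this produces are exactly the syntax whose support was resolved in LUCENE-3338.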
Re: [JENKINS] Lucene-Solr-tests-only-3.x - Build # 9905 - Failure
On Mon, Aug 1, 2011 at 2:49 PM, Yonik Seeley yo...@lucidimagination.com wrote: On trunk currently, it looks like the following data directories (outside of /build) are used by the tests: ./core/src/test-files/solr/data ./example/multicore/core0/data ./example/multicore/core1/data Not sure if this was the case in the past or not... I'll verify. Yep, older versions of Solr seem to have the same behavior. -Yonik http://www.lucidimagination.com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3348) IndexWriter applies wrong deletes during concurrent flush-all
[ https://issues.apache.org/jira/browse/LUCENE-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13073678#comment-13073678 ] Simon Willnauer commented on LUCENE-3348: - mike I can not reproduce this failure.. what exactly is failing there? maybe you can put the output in a text file and attach it? Regarding the latest patch, I think we can call DWFlushControl#addFlushableState() from DWFlushControl#markForFullFlush() and use a global list to collect the DWPT for the full flush. I think we should move the getAndLock call into DWFlushControl something like DWFlushControl#obtainAndLock(), this would allow us to make the check and the DWFlushControl#addFlushableState() method private to DWFC. Further we can also simplify the deleteQueue check a little since we already obtained a ThreadState we don't need to unlock the state again after calling addFlushableState(), something like this:
{code}
ThreadState obtainAndLock() {
  final ThreadState perThread = perThreadPool.getAndLock(Thread.currentThread(), documentsWriter);
  if (perThread.isActive() && perThread.perThread.deleteQueue != documentsWriter.deleteQueue) {
    // There is a flush-all in process and this DWPT is
    // now stale -- enroll it for flush and try for
    // another DWPT:
    addFlushableState(perThread);
  }
  return perThread;
}
{code}
Eventually we are spending too much time in full flush since we lock all ThreadStates at least once while some indexing threads might have already helped out with swapping out DWPT instances. I think we can collect already swapped out ThreadStates during a full flush and only check the ones that have not been processed?
IndexWriter applies wrong deletes during concurrent flush-all - Key: LUCENE-3348 URL: https://issues.apache.org/jira/browse/LUCENE-3348 Project: Lucene - Java Issue Type: Bug Components: core/index Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 3.4, 4.0 Attachments: LUCENE-3348.patch, LUCENE-3348.patch, LUCENE-3348.patch Yonik uncovered this with the TestRealTimeGet test: if a flush-all is underway, it is possible for an incoming update to pick a DWPT that is stale, ie, not yet pulled/marked for flushing, yet the DW has cutover to a new deletes queue. If this happens, and the deleted term was also updated in one of the non-stale DWPTs, then the wrong document is deleted and the test fails by detecting the wrong value. There's a 2nd failure mode that I haven't figured out yet, whereby 2 docs are returned when searching by id (there should only ever be 1 doc since the test uses updateDocument which is atomic wrt commit/reopen). Yonik verified the test passes pre-DWPT, so my guess is (but I have yet to verify) this test also passes on 3.x. I'll backport the test to 3.x to be sure. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-tests-only-3.x - Build # 9924 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x/9924/ No tests ran. Build Log (for compile errors): [...truncated 12464 lines...]
[jira] [Commented] (LUCENE-3030) Block tree terms dict index
[ https://issues.apache.org/jira/browse/LUCENE-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13073800#comment-13073800 ] Michael McCandless commented on LUCENE-3030: I created a branch https://svn.apache.org/repos/asf/lucene/dev/branches/blocktree_3030 for iterating on this. Block tree terms dict index - Key: LUCENE-3030 URL: https://issues.apache.org/jira/browse/LUCENE-3030 Project: Lucene - Java Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 4.0 Attachments: LUCENE-3030.patch, LUCENE-3030.patch, LUCENE-3030.patch, LUCENE-3030.patch Our default terms index today breaks terms into blocks of fixed size (ie, every 32 terms is a new block), and then we build an index on top of that (holding the start term for each block). But, it should be better to instead break terms according to how they share prefixes. This results in variable sized blocks, but means within each block we maximize the shared prefix and minimize the resulting terms index. It should also be a speedup for terms dict intensive queries because the terms index becomes a true prefix trie, and can be used to fast-fail on term lookup (ie returning NOT_FOUND without having to seek/scan a terms block). Having a true prefix trie should also enable much faster intersection with automaton (but this will be a new issue). I've made an initial impl for this (called BlockTreeTermsWriter/Reader). It's still a work in progress... lots of nocommits, and hairy code, but tests pass (at least once!). I made two new codecs, temporarily called StandardTree, PulsingTree, that are just like their counterparts but use this new terms dict. I added a new exactOnly boolean to TermsEnum.seek. If that's true and the term is NOT_FOUND, we will (quickly) return NOT_FOUND and the enum is unpositioned (ie you should not call next(), docs(), etc.). 
In this approach the index and dict are tightly connected, so it does not support a pluggable index impl like BlockTermsWriter/Reader. Blocks are stored on certain nodes of the prefix trie, and can contain both terms and pointers to sub-blocks (ie, if the block is not a leaf block). So there are two trees, tied to one another -- the index trie, and the blocks. Only certain nodes in the trie map to a block in the block tree. I think this algorithm is similar to burst tries (http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.18.3499), except it allows terms to be stored on inner blocks (not just leaf blocks). This is important for Lucene because an [accidental] adversary could produce a terms dict with way too many blocks (way too much RAM used by the terms index). Still, with my current patch, an adversary can produce too-big blocks... which we may need to fix, by letting the terms index not be a true prefix trie on its leaf edges. Exactly how the blocks are picked can be factored out as its own policy (but I haven't done that yet). Then, burst trie is one policy, my current approach is another, etc. The policy can be tuned to the terms' expected distribution, eg if it's a primary key field and you only use base 10 for each character then you want block sizes of size 10. This can make a sizable difference on lookup cost. I modified the FST Builder to allow for a plugin that freezes the tail (changed suffix) of each added term, because I use this to find the blocks.
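The core idea above -- blocks of variable size chosen by shared prefix, instead of a fixed block of every 32 terms -- can be illustrated with a toy sketch. This is not BlockTreeTermsWriter; the PrefixBlocks class and its single-level, fixed-length prefix grouping are hypothetical simplifications of the trie-based policy described in the issue.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy sketch: group an already-sorted term list into variable-sized
// blocks keyed by a shared prefix. Real block tree policies descend a
// prefix trie and may nest blocks; this one level is just for intuition.
public class PrefixBlocks {
    // Returns prefix -> terms sharing that prefix, in sorted order.
    public static Map<String, List<String>> blocksByPrefix(List<String> sortedTerms, int prefixLen) {
        Map<String, List<String>> blocks = new LinkedHashMap<>();
        for (String term : sortedTerms) {
            String prefix = term.substring(0, Math.min(prefixLen, term.length()));
            blocks.computeIfAbsent(prefix, k -> new ArrayList<>()).add(term);
        }
        return blocks;
    }
}
```

For the primary-key example in the issue text, a base-10 key field grouped on one digit would naturally yield blocks of size 10, which is the distribution-tuning point Mike makes.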
[jira] [Updated] (LUCENE-3348) IndexWriter applies wrong deletes during concurrent flush-all
[ https://issues.apache.org/jira/browse/LUCENE-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-3348: --- Attachment: fail.txt.bz2 Full output from a failure.
[jira] [Commented] (LUCENE-3348) IndexWriter applies wrong deletes during concurrent flush-all
[ https://issues.apache.org/jira/browse/LUCENE-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13073804#comment-13073804 ] Michael McCandless commented on LUCENE-3348: OK I attached output of a failure -- it's 400K lines. Search for the AssertionError, where id:26 couldn't find a doc nor tombstone.
[jira] [Commented] (LUCENE-3354) Extend FieldCache architecture to multiple Values
[ https://issues.apache.org/jira/browse/LUCENE-3354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13073828#comment-13073828 ] Hoss Man commented on LUCENE-3354: -- bq. This would also remove the insanity issues. FWIW: the WeakHashMap isn't the sole source of insanity - that can also come about from inconsistent usage for a single field (ie: asking for string and int caches for the same field) Extend FieldCache architecture to multiple Values - Key: LUCENE-3354 URL: https://issues.apache.org/jira/browse/LUCENE-3354 Project: Lucene - Java Issue Type: Improvement Reporter: Bill Bell I would consider this a bug. It appears lots of people are working around this limitation, why don't we just change the underlying data structures to natively support multiValued fields in the FieldCache architecture? Then functions() will work properly, and we can do things like easily geodist() on a multiValued field. Thoughts?
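Hoss's second source of insanity can be shown abstractly: a cache keyed by (field, value type) ends up holding two independent entries for the same field when callers request it as different types. The FieldCacheSketch class below is illustrative only, not Lucene's FieldCache API.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch (not Lucene's FieldCache): entries are keyed by
// field name plus requested value type, so asking for both a String
// cache and an int cache on the same field doubles the memory held --
// the "inconsistent usage" insanity described in the comment.
public class FieldCacheSketch {
    private final Map<String, Object> cache = new HashMap<>();

    public Object get(String field, Class<?> type) {
        return cache.computeIfAbsent(field + "/" + type.getSimpleName(),
            k -> new Object()); // stand-in for an uninverted value array
    }

    public int entryCount() {
        return cache.size();
    }
}
```

Repeated requests with the same type reuse one entry; a second type for the same field silently creates a second, fully redundant one.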
[jira] [Updated] (LUCENE-2979) Simplify configuration API of contrib Query Parser
[ https://issues.apache.org/jira/browse/LUCENE-2979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phillipe Ramalho updated LUCENE-2979: - Attachment: LUCENE-2979_phillipe_ramalho_4_trunk.patch LUCENE-2979_phillipe_ramalho_4_3x.patch Here is a patch that backports the new configuration API to 3.x. I did exactly as I described in my proposal and it seems to be working as expected. I updated the documentation as well (I hope I covered everything; can you double-check that, Adriano?). I also created a simple example of how to use the new API in package.html and added it to both 3.x and trunk. Please let me know if everything looks good and whether I broke any API. Simplify configuration API of contrib Query Parser -- Key: LUCENE-2979 URL: https://issues.apache.org/jira/browse/LUCENE-2979 Project: Lucene - Java Issue Type: Improvement Components: modules/other Affects Versions: 2.9, 3.0 Reporter: Adriano Crestani Assignee: Adriano Crestani Labels: api-change, gsoc, gsoc2011, lucene-gsoc-11, mentor Fix For: 3.4, 4.0 Attachments: LUCENE-2979_phillipe_ramalho_2.patch, LUCENE-2979_phillipe_ramalho_3.patch, LUCENE-2979_phillipe_ramalho_3.patch, LUCENE-2979_phillipe_ramalho_4_3x.patch, LUCENE-2979_phillipe_ramalho_4_trunk.patch, LUCENE-2979_phillipe_reamalho.patch The current configuration API is very complicated and inherits the concept used by the Attribute API to store token information in token streams. However, the requirements for the two (QP config and token stream) are not the same, so they shouldn't be using the same mechanism. I propose to simplify the QP config and make it less scary for people intending to use the contrib QP. The task is not difficult; it will just require a lot of code change and figuring out the best way to do it. That's why it's a good candidate for a GSoC project. I would like to hear good proposals about how to make the API more friendly and less scary :) -- This message is automatically generated by JIRA.
[jira] [Commented] (LUCENE-2979) Simplify configuration API of contrib Query Parser
[ https://issues.apache.org/jira/browse/LUCENE-2979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13075974#comment-13075974 ] Phillipe Ramalho commented on LUCENE-2979: -- Hi Uwe, Is there anything to be fixed in 3352? I see it's a new-feature JIRA. Am I missing something? Currently, I am only working on migrating the old API to the new one and making no changes to how the configuration is used. So nothing here changes (at least it should not) how ParametricQueryNodeProcessor works.
[jira] [Commented] (LUCENE-2308) Separately specify a field's type
[ https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13075978#comment-13075978 ] Michael McCandless commented on LUCENE-2308: Patch looks good Nikola; I'll commit to the branch! I think the next step is to remove the oal.document package and any related classes (eg, DocumentStoredFieldVisitor), and then do a massive rename of doc/document2 back to doc/document? Separately specify a field's type - Key: LUCENE-2308 URL: https://issues.apache.org/jira/browse/LUCENE-2308 Project: Lucene - Java Issue Type: Improvement Components: core/index Reporter: Michael McCandless Assignee: Michael McCandless Labels: gsoc2011, lucene-gsoc-11, mentor Fix For: 4.0 Attachments: LUCENE-2308-10.patch, LUCENE-2308-11.patch, LUCENE-2308-12.patch, LUCENE-2308-13.patch, LUCENE-2308-14.patch, LUCENE-2308-15.patch, LUCENE-2308-16.patch, LUCENE-2308-17.patch, LUCENE-2308-18.patch, LUCENE-2308-19.patch, LUCENE-2308-2.patch, LUCENE-2308-3.patch, LUCENE-2308-4.patch, LUCENE-2308-5.patch, LUCENE-2308-6.patch, LUCENE-2308-7.patch, LUCENE-2308-8.patch, LUCENE-2308-9.patch, LUCENE-2308-ltc.patch, LUCENE-2308.patch, LUCENE-2308.patch This came up from discussions on IRC. I'm summarizing here... Today when you make a Field to add to a document you can set things index or not, stored or not, analyzed or not, details like omitTfAP, omitNorms, index term vectors (separately controlling offsets/positions), etc. I think we should factor these out into a new class (FieldType?). Then you could re-use this FieldType instance across multiple fields. The Field instance would still hold the actual value. We could then do per-field analyzers by adding a setAnalyzer on the FieldType, instead of the separate PerFieldAnalyzerWrapper (likewise for per-field codecs (with flex), where we now have PerFieldCodecWrapper). This would NOT be a schema! It's just refactoring what we already specify today. EG it's not serialized into the index.
This has been discussed before, and I know Michael Busch opened a more ambitious (I think?) issue. I think this is a good first baby step. We could consider a hierarchy of FieldType (NumericFieldType, etc.) but maybe hold off on that for starters... -- This message is automatically generated by JIRA.
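The proposed split can be sketched minimally: the per-field flags (indexed, stored, tokenized) live on a reusable FieldType, while Field holds only the name and value. The class and field names below follow the issue text, but the exact API shown is a hypothetical sketch, not the committed Lucene 4.0 API.

```java
// Hypothetical sketch of the FieldType/Field split discussed in
// LUCENE-2308: one FieldType instance is shared across many Fields.
public class FieldTypeSketch {
    public static class FieldType {
        public boolean indexed;
        public boolean stored;
        public boolean tokenized;
    }

    public static class Field {
        public final String name;
        public final String value;
        public final FieldType type; // shared, reusable type descriptor

        public Field(String name, String value, FieldType type) {
            this.name = name;
            this.value = value;
            this.type = type;
        }
    }
}
```

The point of the refactor is visible here: two fields built from the same FieldType share one configuration object, instead of each Field carrying its own copy of the flags.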
[jira] [Commented] (SOLR-2565) Prevent IW#close and cut over to IW#commit
[ https://issues.apache.org/jira/browse/SOLR-2565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13075980#comment-13075980 ] Jason Rutherglen commented on SOLR-2565: This issue says committed in the comments, however its status is: Unresolved? Prevent IW#close and cut over to IW#commit -- Key: SOLR-2565 URL: https://issues.apache.org/jira/browse/SOLR-2565 Project: Solr Issue Type: Improvement Components: update Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Mark Miller Fix For: 4.0 Attachments: SOLR-2565.patch Spinoff from SOLR-2193. We already have a branch to work on this issue here: https://svn.apache.org/repos/asf/lucene/dev/branches/solr2193 The main goal here is to prevent Solr from closing the IW and use IW#commit instead. AFAIK the main issues here are: the update handler needs an overhaul. A few goals I think we might want to look at: 1. Expose the SolrIndexWriter in the API or add the proper abstractions to get done what we now do with special casing. 2. Stop closing the IndexWriter and start using commit (still lazy IW init though). 3. Drop iwAccess, iwCommit locks and sync mostly at the Lucene level. 4. Address the current issues we face because multiple original/'reloaded' cores can have a different IndexWriter on the same index. Eventually this is a preparation for NRT support in Solr, which I will create a followup issue for.
[jira] [Commented] (SOLR-2565) Prevent IW#close and cut over to IW#commit
[ https://issues.apache.org/jira/browse/SOLR-2565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13075993#comment-13075993 ] Mark Miller commented on SOLR-2565: --- Yeah, sorry - it's open as a reminder for me to write that changes note (or at least evaluate whether something should be done) and do the wiki documentation. I'll try to do that tomorrow if I can and get this closed.
[jira] [Commented] (LUCENE-2308) Separately specify a field's type
[ https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13076033#comment-13076033 ] Nikola Tankovic commented on LUCENE-2308: - Yes, exactly! That is my next step, shouldn't take long.
[JENKINS] Lucene-Solr-tests-only-3.x - Build # 9930 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x/9930/ 1 tests failed.

REGRESSION: org.apache.solr.client.solrj.embedded.MergeIndexesEmbeddedTest.testMergeIndexesByCoreName

Error Message: org.apache.solr.client.solrj.SolrServerException: org.apache.lucene.store.LockReleaseFailedException: failed to delete /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/solr/example/multicore/core1/data/index/org.apache.solr.core.RefCntRamDirectory@46b4be3d lockFactory=org.apache.lucene.store.simplefslockfact...@235f4a7f-write.lock

Stack Trace:
org.apache.solr.client.solrj.SolrServerException: org.apache.solr.client.solrj.SolrServerException: org.apache.lucene.store.LockReleaseFailedException: failed to delete /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/solr/example/multicore/core1/data/index/org.apache.solr.core.RefCntRamDirectory@46b4be3d lockFactory=org.apache.lucene.store.simplefslockfact...@235f4a7f-write.lock
  at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:153)
  at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
  at org.apache.solr.client.solrj.MergeIndexesExampleTestBase.setupCores(MergeIndexesExampleTestBase.java:90)
  at org.apache.solr.client.solrj.MergeIndexesExampleTestBase.testMergeIndexesByCoreName(MergeIndexesExampleTestBase.java:145)
  at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1335)
  at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1240)
Caused by: org.apache.solr.client.solrj.SolrServerException: org.apache.lucene.store.LockReleaseFailedException: failed to delete /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/solr/example/multicore/core1/data/index/org.apache.solr.core.RefCntRamDirectory@46b4be3d lockFactory=org.apache.lucene.store.simplefslockfact...@235f4a7f-write.lock
  at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:142)
Caused by: org.apache.lucene.store.LockReleaseFailedException: failed to delete /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/solr/example/multicore/core1/data/index/org.apache.solr.core.RefCntRamDirectory@46b4be3d lockFactory=org.apache.lucene.store.simplefslockfact...@235f4a7f-write.lock
  at org.apache.lucene.store.SimpleFSLock.release(SimpleFSLockFactory.java:133)
  at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1885)
  at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1815)
  at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1779)
  at org.apache.solr.update.SolrIndexWriter.close(SolrIndexWriter.java:143)
  at org.apache.solr.update.DirectUpdateHandler2.closeWriter(DirectUpdateHandler2.java:183)
  at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:416)
  at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:85)
  at org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:107)
  at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:71)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
  at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:140)

Build Log (for compile errors): [...truncated 14167 lines...]
[jira] [Created] (LUCENE-3355) Incorrect behaviour of MultiFieldQueryNodeProcessor when default operator is 'AND'
Incorrect behaviour of MultiFieldQueryNodeProcessor when default operator is 'AND' -- Key: LUCENE-3355 URL: https://issues.apache.org/jira/browse/LUCENE-3355 Project: Lucene - Java Issue Type: Bug Components: modules/queryparser Affects Versions: 3.3 Reporter: Trejkaz StandardQueryNodeProcessorPipeline runs MultiFieldQueryNodeProcessor before GroupQueryNodeProcessor. MultiFieldQueryNodeProcessor, if it encounters a node with no field, will do this: {code} return new GroupQueryNode(new BooleanQueryNode(children)); {code} GroupQueryNodeProcessor comes along later on, sees that no operator is specified, so it applies the default operator, which, if set to 'AND', results in: {code} +properties:text +text:text {code} Which I don't think matches the intent of the multi-field processor.
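For contrast, the arguably intended behaviour is to OR the per-field copies inside one group, so the default AND operator applies only between the user's clauses, not between the expanded copies of a single term. The toy sketch below renders that expansion as a query string; MultiFieldExpand is a hypothetical illustration, since the real query parser works on query node trees rather than strings.

```java
// Toy sketch of the intended multi-field expansion: a fieldless term
// expands to an OR across the configured fields, grouped so that an
// outer default-AND operator cannot split the expansion apart.
public class MultiFieldExpand {
    public static String expand(String term, String[] fields) {
        StringBuilder sb = new StringBuilder("(");
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                sb.append(" OR ");
            }
            sb.append(fields[i]).append(':').append(term);
        }
        return sb.append(')').toString();
    }
}
```

With fields [properties, text] this yields (properties:text OR text:text), matching a document containing the term in either field, rather than the +properties:text +text:text conjunction the bug produces.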
[jira] [Commented] (LUCENE-3335) jrebug causes porter stemmer to sigsegv
[ https://issues.apache.org/jira/browse/LUCENE-3335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13076053#comment-13076053 ] Shay Banon commented on LUCENE-3335: @Uwe I actually forgot about this, and did not think it was because of the porter stemmer at the time, especially since I did try to reproduce it and never managed to (I thought it was a coincidence that it crashed there). From my experience, you get very little help from Sun/Oracle when using unorthodox flags like aggressive opts without a proper reproduction. Well, you get very little help there even when you do provide a reproduction... (see this issue that I opened, for example: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7066129). I am the reason behind the Lucene 1.9.1 release with the major bug in buffering introduced in 1.9 way back in the days; do you really think I would not make contact if I thought there really was a problem associated with Lucene? jrebug causes porter stemmer to sigsegv --- Key: LUCENE-3335 URL: https://issues.apache.org/jira/browse/LUCENE-3335 Project: Lucene - Java Issue Type: Bug Affects Versions: 1.9, 1.9.1, 2.0.0, 2.1, 2.2, 2.3, 2.3.1, 2.3.2, 2.4, 2.4.1, 2.9, 2.9.1, 2.9.2, 2.9.3, 2.9.4, 3.0, 3.0.1, 3.0.2, 3.0.3, 3.1, 3.2, 3.3, 3.4, 4.0 Environment: - JDK 7 Preview Release, GA (may also affect update _1, targeted fix is JDK 1.7.0_2) - JDK 1.6.0_20+ with -XX:+OptimizeStringConcat or -XX:+AggressiveOpts Reporter: Robert Muir Assignee: Robert Muir Labels: Java7 Attachments: LUCENE-3335.patch, LUCENE-3335_slow.patch, patch-0uwe.patch Happens easily on java7: ant test -Dtestcase=TestPorterStemFilter -Dtests.iter=100 It might happen on 1.6.0_u26 too; a user reported something that looks like the same bug already: http://www.lucidimagination.com/search/document/3beaa082c4d2fdd4/porterstemfilter_kills_jvm -- This message is automatically generated by JIRA.