[jira] [Closed] (SOLR-712) Support for relative path XML Entity Includes
[ https://issues.apache.org/jira/browse/SOLR-712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amit Nithian closed SOLR-712. Resolution: Fixed. Fix Version/s: (was: 4.1) 4.0. This looks to have been resolved by the various XML parser changes. Awesome!
Support for relative path XML Entity Includes - Key: SOLR-712 URL: https://issues.apache.org/jira/browse/SOLR-712 Project: Solr Issue Type: Improvement Affects Versions: 1.2, 1.3 Reporter: Amit Nithian Priority: Minor Fix For: 4.0 Attachments: XMLEntityInclude.tgz Original Estimate: 1h Remaining Estimate: 1h
This patch modifies Config.java and IndexSchema.java to support XML entity includes with relative paths. When an InputStream is parsed without a SystemID (i.e. a base path) being provided to the DOM parser, all entity includes are resolved relative to the base project directory rather than the directory hosting the document. This patch simply passes in the configuration directory as the systemID, making entity includes relative to the home of solrconfig.xml and schema.xml. IndexSchema.java was modified to ensure objects do NOT process the xml:base attribute. Newer Xerces-J parsers allow for the removal of this attribute in the DOM (by setting the appropriate feature); however, the DOM parser used by Java 5 doesn't support this feature. For example: without this patch, if my Solr app was running on C:\solr, then any entity includes would have to be relative to C:\solr regardless of the location of solrconfig.xml and schema.xml. With this patch, includes are relative to the conf directory of solr.home (e.g. ../../my_base_schema.xml would be located two directories above conf). Please submit improvements or comments on this patch.
-- This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
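The systemID idea described in SOLR-712 can be sketched with the standard JAXP classes. This is only an illustration, not the attached patch: the class name, the temporary conf directory, and the file common_fields.xml are made up for the example. It parses a file from an InputStream and sets the system ID on the InputSource so that a relative SYSTEM entity resolves next to the document.

```java
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class RelativeEntityIncludeSketch {

    /** Parses a config file from a stream while telling the parser where the file lives. */
    static Document parseWithSystemId(Path configFile) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        try (InputStream in = Files.newInputStream(configFile)) {
            InputSource source = new InputSource(in);
            // A bare stream carries no base URI. Setting the system ID gives the
            // parser a base against which relative SYSTEM identifiers are resolved.
            source.setSystemId(configFile.toUri().toString());
            return builder.parse(source);
        }
    }

    /** Builds a throwaway conf directory (hypothetical layout) and reads an included field name. */
    static String demo() {
        try {
            Path conf = Files.createTempDirectory("conf");
            Files.write(conf.resolve("common_fields.xml"),
                    "<field name=\"id\"/>".getBytes(StandardCharsets.UTF_8));
            String schema = "<?xml version=\"1.0\"?>\n"
                    + "<!DOCTYPE schema [ <!ENTITY common SYSTEM \"common_fields.xml\"> ]>\n"
                    + "<schema>&common;</schema>\n";
            Path schemaFile = conf.resolve("schema.xml");
            Files.write(schemaFile, schema.getBytes(StandardCharsets.UTF_8));
            Document doc = parseWithSystemId(schemaFile);
            // The <field> element only exists in the DOM if the entity was resolved.
            return doc.getElementsByTagName("field").item(0)
                    .getAttributes().getNamedItem("name").getNodeValue();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The sketch relies on the JDK parser's default settings, which allow external entities; a hardened parser configuration may refuse to load them.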
[jira] [Updated] (SOLR-3304) Add Solr support for the new Lucene spatial module
[ https://issues.apache.org/jira/browse/SOLR-3304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated SOLR-3304: Attachment: SOLR-3304_Solr_fields_for_Lucene_spatial_module.patch
Thanks for finding and fixing that bug, Andy. Your fix wasn't quite right, though, since the getStrategy() method you refactored synchronized on a parameter (pointless) instead of the field. I fixed this. This new patch makes that and various other changes:
* synchronized with the latest source tree (e.g. Spatial4j 0.3)
** This means distances are now degrees based (0-180 for circle radius), not kilometers
* removed ignoreIncompatibleGeometry option (see LUCENE-4173)
* Use the input string as the stored value that is returned. So if you give lat,lon then that's what you get back, in whatever number of decimal places you chose.
* added prefixGridScanLevel performance tuning option to SpatialRecursivePrefixTreeFieldType (simply exposed it from the strategy)
* keep distErrPct as a fraction (no change)
It would be nice to have a kilometer unit option, but that isn't easily done until Spatial4j's shape reader gets to be more flexible. That can wait.
That needScore local-param hack (see SOLR-2883) is unfortunate, as Solr can't get a Filter from a field type. I'm tempted to change the default to 'false', as leaving it at 'true' triggers large RAM requirements and slowdowns for SpatialRecursivePrefixTreeFieldType. This could be an opportunity to specify what the score should be, come to think of it. Instead of needScore=false, maybe score=none (default) or score=distance or score=recipDistance or something like that.
The TwoDoubles strategy needs more attention and tests in Lucene spatial, but I don't want that to hold up this patch. Shall I remove the adapter, or let it get committed but not advertise it until it's more worthy?
Add Solr support for the new Lucene spatial module -- Key: SOLR-3304 URL: https://issues.apache.org/jira/browse/SOLR-3304 Project: Solr Issue Type: New Feature Affects Versions: 4.0-ALPHA Reporter: Bill Bell Assignee: David Smiley Labels: spatial Attachments: SOLR-3304_Solr_fields_for_Lucene_spatial_module (fieldName in Strategy) - indexableFields.patch, SOLR-3304_Solr_fields_for_Lucene_spatial_module (fieldName in Strategy).patch, SOLR-3304_Solr_fields_for_Lucene_spatial_module.patch, SOLR-3304_Solr_fields_for_Lucene_spatial_module.patch, SOLR-3304_Solr_fields_for_Lucene_spatial_module.patch, SOLR-3304-strategy-getter-fixed.patch Get the Solr spatial module integrated with the lucene spatial module.
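The locking remark in the comment above (synchronizing on a parameter versus on the field) can be illustrated with a small stand-alone sketch. Nothing here is taken from the patch: the class, the Strategy stand-in, and the cache layout are hypothetical, and only the name getStrategy() echoes the comment.

```java
import java.util.HashMap;
import java.util.Map;

final class StrategyCacheSketch {

    /** Hypothetical stand-in for a per-field spatial strategy. */
    static final class Strategy {
        final String fieldName;
        Strategy(String fieldName) { this.fieldName = fieldName; }
    }

    // Shared state: every caller sees this same map instance, so it works as a lock.
    private final Map<String, Strategy> strategies = new HashMap<>();

    Strategy getStrategy(String fieldName) {
        // Synchronizing on the 'fieldName' parameter would not protect the map:
        // each caller can pass a different String object, so two threads could
        // hold "their" lock at the same time and both mutate the map.
        synchronized (strategies) {
            Strategy strategy = strategies.get(fieldName);
            if (strategy == null) {
                strategy = new Strategy(fieldName);
                strategies.put(fieldName, strategy);
            }
            return strategy;
        }
    }

    public static void main(String[] args) {
        StrategyCacheSketch cache = new StrategyCacheSketch();
        // Two distinct String objects with equal content map to one cached strategy.
        System.out.println(cache.getStrategy(new String("geo")) == cache.getStrategy(new String("geo")));
    }
}
```

A ConcurrentHashMap with computeIfAbsent would be another way to get the same guarantee; the explicit lock is used here only because it mirrors the wording of the comment.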
[jira] [Commented] (SOLR-712) Support for relative path XML Entity Includes
[ https://issues.apache.org/jira/browse/SOLR-712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13452795#comment-13452795 ] Uwe Schindler commented on SOLR-712: Hi, I did not know about this issue. This has been solved since Solr 3.1 (issue SOLR-1656).
[jira] [Updated] (SOLR-3819) Facet count not working when tagging excluding filters for range facets with group.facet true
[ https://issues.apache.org/jira/browse/SOLR-3819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ricardo Merizalde updated SOLR-3819: Summary: Facet count not working when tagging excluding filters for range facets with group.facet true (was: Facet count not working when tagging excluding filters for range facets with group.facet is true)
Facet count not working when tagging excluding filters for range facets with group.facet true --- Key: SOLR-3819 URL: https://issues.apache.org/jira/browse/SOLR-3819 Project: Solr Issue Type: Bug Components: search Affects Versions: 4.0-BETA Environment: 12.0.0 Darwin Kernel Version 12.0. Reporter: Ricardo Merizalde
I'm creating a range facet and I want to support multiple selection for it. However, when I set group.facet to true, the tags/exclusions for filters stop working. In other words, I only get the facet values for the filtered documents.
The following link works: http://localhost:8983/solr/catalogPreview/select?q=*:*&facet=true&wt=xml&rows=0&facet.range={!ex%3DsalePrice}salePrice&f.salePrice.facet.range.gap=75&f.salePrice.facet.range.start=100&f.salePrice.facet.range.end=600&group=true&group.field=productId&f.salePrice.facet.mincount=1&fq={!tag=salePrice}salePrice:[100%20TO%20175]&group.facet=false
The following doesn't: http://localhost:8983/solr/catalogPreview/select?q=*:*&facet=true&wt=xml&rows=0&facet.range={!ex%3DsalePrice}salePrice&f.salePrice.facet.range.gap=75&f.salePrice.facet.range.start=100&f.salePrice.facet.range.end=600&group=true&group.field=productId&f.salePrice.facet.mincount=1&fq={!tag=salePrice}salePrice:[100%20TO%20175]&group.facet=true
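For anyone trying to reproduce SOLR-3819, the two requests differ only in group.facet. The sketch below assembles the query string from the parameters in the report; the class and method names are made up for illustration, and the values are fully URL-encoded by java.net.URLEncoder, so the output looks slightly different from the hand-written links while carrying the same parameters.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class RangeFacetRequestSketch {

    /** Builds the /select query string from the report, toggling only group.facet. */
    static String buildQuery(boolean groupFacet) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "*:*");
        params.put("facet", "true");
        params.put("wt", "xml");
        params.put("rows", "0");
        // {!ex=salePrice} excludes the filter tagged "salePrice" when counting this facet.
        params.put("facet.range", "{!ex=salePrice}salePrice");
        params.put("f.salePrice.facet.range.gap", "75");
        params.put("f.salePrice.facet.range.start", "100");
        params.put("f.salePrice.facet.range.end", "600");
        params.put("group", "true");
        params.put("group.field", "productId");
        params.put("f.salePrice.facet.mincount", "1");
        // {!tag=salePrice} tags the filter so the facet above can exclude it.
        params.put("fq", "{!tag=salePrice}salePrice:[100 TO 175]");
        params.put("group.facet", Boolean.toString(groupFacet));

        StringJoiner query = new StringJoiner("&", "/select?", "");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(encode(e.getKey()) + "=" + encode(e.getValue()));
        }
        return query.toString();
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        System.out.println(buildQuery(false));
        System.out.println(buildQuery(true));
    }
}
```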
patch attached, what next?!
Hi, I finished attaching the patch to: https://issues.apache.org/jira/browse/SOLR-3574
The status of the Jira issue is:
- *Status:* In Progress
- *Priority:* Major
- *Resolution:* Unresolved
*Is there something else I should do (change some status/resolution, and to what?) before someone inspects the patch?* I cannot see a log work option, so I can't change the remaining time of the Jira issue. But this might not be so important.
Cheers, Despot
[jira] [Commented] (SOLR-3820) Solr Admin Query form is missing some edismax request parameters
[ https://issues.apache.org/jira/browse/SOLR-3820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13452844#comment-13452844 ] Jan Høydahl commented on SOLR-3820: Good catch
Solr Admin Query form is missing some edismax request parameters Key: SOLR-3820 URL: https://issues.apache.org/jira/browse/SOLR-3820 Project: Solr Issue Type: Bug Components: web gui Affects Versions: 4.0-BETA Reporter: Jack Krupansky Fix For: 4.0
The following edismax parameters are missing from the Solr Admin Query form:
uf - User Fields
pf2 - bigram phrase boost fields
pf3 - trigram phrase boost fields
ps2 - phrase slop for bigram phrases
ps3 - phrase slop for trigram phrases
boost - multiplicative boost function
stopwords - remove stopwords from mandatory matching component (true/false, defaults to true)
lowercaseOperators - Enable lower-case "and" and "or" as operators (true/false, defaults to true)
The ability to set field name aliases is also missing: f.myalias.qf=realfield.
[jira] [Updated] (LUCENE-4355) improve AtomicReader sugar apis
[ https://issues.apache.org/jira/browse/LUCENE-4355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-4355: Attachment: LUCENE-4355.patch
Updated patch: sugar fixed for docsEnum/dpEnum as proposed. Wasn't as bad as I thought :)
improve AtomicReader sugar apis --- Key: LUCENE-4355 URL: https://issues.apache.org/jira/browse/LUCENE-4355 Project: Lucene - Core Issue Type: Task Reporter: Robert Muir Attachments: LUCENE-4355.patch, LUCENE-4355.patch
I thought about this after looking at LUCENE-4353: AtomicReader has some sugar APIs that sit on top of the flex APIs (Fields, Terms, ...). But these might be a little trappy/confusing compared to 3.x.
# I don't think we need AtomicReader.termDocsEnum(Bits, ...) and .termPositionsEnum(Bits, ...). I also don't think we need variants that take flags here. We should simplify these to be less trappy. I think we only need (String, BytesRef) here.
# This means you need to use the flex APIs for more expert usage, but we make this a bit too hard since we only let you get a Terms (which you must null check, then call .iterator() on, then seekExact, ...). I think it could help if we balanced this out by adding some sugar like AtomicReader.termsEnum? 3.x had a method that let you get a 'positioned termsenum'.
[jira] [Updated] (LUCENE-4355) improve AtomicReader sugar apis
[ https://issues.apache.org/jira/browse/LUCENE-4355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-4355: Fix Version/s: 4.0, 5.0 Assignee: Robert Muir
[jira] [Commented] (LUCENE-4196) Turn asserts in I/O related code into hard checks
[ https://issues.apache.org/jira/browse/LUCENE-4196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13452871#comment-13452871 ] Uwe Schindler commented on LUCENE-4196: Hi Robert, I wanted to go through the codec code to check this myself. I just had no time to do it. Things like the CompoundFileReader not using hard checks are one reason why I want to go through it a second time. What's the issue with keeping this issue open as a todo task?
Turn asserts in I/O related code into hard checks - Key: LUCENE-4196 URL: https://issues.apache.org/jira/browse/LUCENE-4196 Project: Lucene - Core Issue Type: Task Components: core/index Affects Versions: 4.0-ALPHA Reporter: Uwe Schindler Fix For: 4.0 Attachments: LUCENE-4196.patch
In lots of codecs we only assert that e.g. some things inside files are correctly in bounds, which leads to security problems (OK, not as bad as C-style buffer overflows); e.g. allocating a large array after reading a VInt from a file header and then hitting OOM is a security issue. So we have to check all those contracts for files as hard checks, especially as a simple check in most cases doesn't cost anything (and it costs no more than the assert itself, as the assert also takes CPU power, because it needs a check one time on a static final class field). Of course we cannot check values we read when reading postings, but the simple checks that any postings file has a correct header and something like a positive number of elements, or number of elements < file size, ..., that a bit-field only contains valid bits in StoredFieldsReader, or non-duplicate filenames (CFS) are very important. We had those checks in 3.x, but in 4.0, Mike changed all of those to asserts during the flex development (in my opinion with no real reason).
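The difference between an assert and a hard check described in LUCENE-4196 can be sketched without any Lucene classes. The file layout below (a 4-byte element count followed by 4-byte values) is invented for the example and uses a fixed-width int rather than a VInt; the point is only that the bound is verified with an exception, which also runs when assertions are disabled, before any array is allocated.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

final class HardCheckSketch {

    /** Reads a hypothetical "count + values" file, validating the header before allocating. */
    static int[] readValues(byte[] file) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(file))) {
            int count = in.readInt();
            long remainingBytes = file.length - 4L;
            // Hard check: unlike 'assert count >= 0', this runs even without -ea,
            // so a corrupt header cannot trigger a huge allocation and an OOM.
            if (count < 0 || count * 4L > remainingBytes) {
                throw new IOException("corrupt header: count=" + count
                        + " but only " + remainingBytes + " bytes follow");
            }
            int[] values = new int[count];
            for (int i = 0; i < count; i++) {
                values[i] = in.readInt();
            }
            return values;
        }
    }

    /** Convenience for callers that only want to know whether the file is accepted. */
    static boolean isRejected(byte[] file) {
        try {
            readValues(file);
            return false;
        } catch (IOException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        byte[] good = {0, 0, 0, 2, 0, 0, 0, 7, 0, 0, 0, 9};
        byte[] corrupt = {0x7f, (byte) 0xff, (byte) 0xff, (byte) 0xff};
        System.out.println(isRejected(good));
        System.out.println(isRejected(corrupt));
    }
}
```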
[jira] [Commented] (LUCENE-4369) StringFields name is unintuitive and not helpful
[ https://issues.apache.org/jira/browse/LUCENE-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13452874#comment-13452874 ] Robert Muir commented on LUCENE-4369: Chris: well, there is a lot more to convey than the old Field.Index.NOT_ANALYZED:
# text is treated as if it went through KeywordAnalyzer
# term frequencies and positions are omitted
# length normalization and index-time boosts are disabled
The idea of MatchOnly is to describe that the field is really only useful for matching, not searching. The other two things this Field does with respect to scoring and index options become important when someone adds multiple instances under the same name: I think it's important to convey that it's still only 'matching' and they won't have real scoring here. The problem I see with StringField as a name is that it doesn't hint at any of this. The current name can lead you to believe you should use it because you happen to have your content as a Java String.
StringFields name is unintuitive and not helpful Key: LUCENE-4369 URL: https://issues.apache.org/jira/browse/LUCENE-4369 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir
There's a huge difference between TextField and StringField: StringField screws up scoring and bypasses your Analyzer. (See the java-user thread "Custom Analyzer Not Called When Indexing" as an example.) The name we use here is vital, otherwise people will get bad results. I think we should rename StringField to MatchOnlyField.
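To make the first point of the list above concrete, here is a library-free sketch of the two behaviours. It does not use Lucene at all: the class and both methods are hypothetical and only imitate, very roughly, what an analyzed text field and a keyword-style field hand to the index.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

final class ExactVersusAnalyzedSketch {

    /** Rough imitation of an analyzed field: lowercase the value and split it into word terms. */
    static List<String> analyzedTerms(String value) {
        return Arrays.asList(value.toLowerCase(Locale.ROOT).split("\\s+"));
    }

    /** Rough imitation of a keyword-style field: the whole value is one unmodified term. */
    static List<String> exactTerms(String value) {
        return Collections.singletonList(value);
    }

    public static void main(String[] args) {
        String title = "Quick Brown Fox";
        // A query term "quick" finds the analyzed form but not the exact form.
        System.out.println(analyzedTerms(title).contains("quick"));
        System.out.println(exactTerms(title).contains("quick"));
        // Only the complete, case-sensitive value matches the exact form.
        System.out.println(exactTerms(title).contains("Quick Brown Fox"));
    }
}
```

This is why content indexed in such a field only matches the exact original value, which is the confusion the renaming discussion is trying to address.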
[jira] [Commented] (LUCENE-4196) Turn asserts in I/O related code into hard checks
[ https://issues.apache.org/jira/browse/LUCENE-4196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13452876#comment-13452876 ] Robert Muir commented on LUCENE-4196: There is no issue except the fix version field: I'm just trying to get things with fixVersion=4.0 contained and assigned to people who are actually planning on working the issues in the next few days, or moved out of the release. If there is really more work that's necessary before 4.0 and someone is planning on working on it, then I think it should have the fixVersion. But if it's just a future item that would be nice, then it should be moved out.
[jira] [Commented] (LUCENE-4196) Turn asserts in I/O related code into hard checks
[ https://issues.apache.org/jira/browse/LUCENE-4196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13452877#comment-13452877 ] Uwe Schindler commented on LUCENE-4196: Just remove the fix version altogether.
[jira] [Updated] (LUCENE-4196) Turn asserts in I/O related code into hard checks
[ https://issues.apache.org/jira/browse/LUCENE-4196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-4196: Fix Version/s: (was: 4.0)
[jira] [Commented] (LUCENE-4369) StringFields name is unintuitive and not helpful
[ https://issues.apache.org/jira/browse/LUCENE-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13452884#comment-13452884 ] Chris Male commented on LUCENE-4369: As I say, I totally support renaming this field to something. I think calling it anything else will help with distinguishing it from TextField so I'm +1 for MatchOnly. Perhaps that'll encourage people to read the docs about it not being analyzed.
[jira] [Updated] (LUCENE-4369) StringFields name is unintuitive and not helpful
[ https://issues.apache.org/jira/browse/LUCENE-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-4369: Attachment: LUCENE-4369.patch
Patch: just from an Eclipse rename of 'StringField -> MatchOnlyField' and 'LuceneTestCase.newStringField -> LuceneTestCase.newMatchOnlyField'.
[jira] [Commented] (LUCENE-4369) StringFields name is unintuitive and not helpful
[ https://issues.apache.org/jira/browse/LUCENE-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13452900#comment-13452900 ] Mark Harwood commented on LUCENE-4369: SingleTermField? Not sure matching vs searching is a commonly understood differentiation.
[jira] [Commented] (LUCENE-4369) StringFields name is unintuitive and not helpful
[ https://issues.apache.org/jira/browse/LUCENE-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13452908#comment-13452908 ] Robert Muir commented on LUCENE-4369: Mark: I don't have strong feelings one way or the other. We don't need to rush it. I think it's a fairly contained change, and we don't even have to deal with this for 4.0 if we aren't happy: we can deprecate StringField and just have it extend XXXField in a future 4.x release too. But I think the name StringField is not really good at all, so it's good to get all the ideas out here.
[jira] [Commented] (LUCENE-4369) StringFields name is unintuitive and not helpful
[ https://issues.apache.org/jira/browse/LUCENE-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13452914#comment-13452914 ] Mark Harwood commented on LUCENE-4369: Agreed on the need for a change - names are important. I have a problem with using "match" on its own because the word is often associated with partial matching, e.g. "best match" or "fuzzy match". A quick google suggests "match" has more connotations with fuzziness than exactness: there are 162m results for "best match" vs only 45m results for "exact match". So how about ExactMatchField?
[jira] [Updated] (LUCENE-4258) Incremental Field Updates through Stacked Segments
[ https://issues.apache.org/jira/browse/LUCENE-4258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sivan Yogev updated LUCENE-4258: Attachment: LUCENE-4258-inner-changes.patch LUCENE-4258-API-changes.patch IncrementalFieldUpdates.odp
Adding a design proposal presentation, and two patches following the proposal concepts. The first patch includes the proposed API changes (it does not compile), and the other one contains inner changes for those interested in the implementation details. The second patch contains a new test named TestFieldsUpdates which currently fails.
Incremental Field Updates through Stacked Segments -- Key: LUCENE-4258 URL: https://issues.apache.org/jira/browse/LUCENE-4258 Project: Lucene - Core Issue Type: Improvement Components: core/index Reporter: Sivan Yogev Attachments: IncrementalFieldUpdates.odp, LUCENE-4258-API-changes.patch, LUCENE-4258-inner-changes.patch Original Estimate: 2,520h Remaining Estimate: 2,520h
Shai and I would like to start working on the proposal for Incremental Field Updates outlined here (http://markmail.org/message/zhrdxxpfk6qvdaex).
[jira] [Commented] (LUCENE-4369) StringFields name is unintuitive and not helpful
[ https://issues.apache.org/jira/browse/LUCENE-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13452920#comment-13452920 ] Shai Erera commented on LUCENE-4369: bq. So how about ExactMatchField? +1 for that. I was actually going to propose MatchExactField, but I don't mind the order of the words. Also, since a way to search for these terms/fields using the regular query syntax would be through a PerFieldAnalyzerWrapper and assigning KeywordAnalyzer to that field (are there other ways), we can also call it KeywordField. I don't like MatchOnlyField .. i.e. TextField also matches *only* the words that are indexed in that field. StringFields name is unintuitive and not helpful Key: LUCENE-4369 URL: https://issues.apache.org/jira/browse/LUCENE-4369 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-4369.patch There's a huge difference between TextField and StringField, StringField screws up scoring and bypasses your Analyzer. (see java-user thread Custom Analyzer Not Called When Indexing as an example.) The name we use here is vital, otherwise people will get bad results. I think we should rename StringField to MatchOnlyField. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4369) StringFields name is unintuitive and not helpful
[ https://issues.apache.org/jira/browse/LUCENE-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13452922#comment-13452922 ] Robert Muir commented on LUCENE-4369: - I like ExactMatchField too. I thought about Keyword too, but my concern is that this would get confused with 'search keywords' such as the type used in META section of html documents. We could argue about the best field type for that :) but I don't think this is it. StringFields name is unintuitive and not helpful Key: LUCENE-4369 URL: https://issues.apache.org/jira/browse/LUCENE-4369 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-4369.patch There's a huge difference between TextField and StringField, StringField screws up scoring and bypasses your Analyzer. (see java-user thread Custom Analyzer Not Called When Indexing as an example.) The name we use here is vital, otherwise people will get bad results. I think we should rename StringField to MatchOnlyField. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4369) StringFields name is unintuitive and not helpful
[ https://issues.apache.org/jira/browse/LUCENE-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13452924#comment-13452924 ] Chris Male commented on LUCENE-4369: I like ExactMatchField, good suggestion. StringFields name is unintuitive and not helpful Key: LUCENE-4369 URL: https://issues.apache.org/jira/browse/LUCENE-4369 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-4369.patch There's a huge difference between TextField and StringField, StringField screws up scoring and bypasses your Analyzer. (see java-user thread Custom Analyzer Not Called When Indexing as an example.) The name we use here is vital, otherwise people will get bad results. I think we should rename StringField to MatchOnlyField. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4258) Incremental Field Updates through Stacked Segments
[ https://issues.apache.org/jira/browse/LUCENE-4258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13452928#comment-13452928 ] Sivan Yogev commented on LUCENE-4258: - Forgot to mention that the implementation patch is still missing many components... Incremental Field Updates through Stacked Segments -- Key: LUCENE-4258 URL: https://issues.apache.org/jira/browse/LUCENE-4258 Project: Lucene - Core Issue Type: Improvement Components: core/index Reporter: Sivan Yogev Attachments: IncrementalFieldUpdates.odp, LUCENE-4258-API-changes.patch, LUCENE-4258-inner-changes.patch Original Estimate: 2,520h Remaining Estimate: 2,520h Shai and I would like to start working on the proposal to Incremental Field Updates outlined here (http://markmail.org/message/zhrdxxpfk6qvdaex).
[jira] [Commented] (LUCENE-4369) StringFields name is unintuitive and not helpful
[ https://issues.apache.org/jira/browse/LUCENE-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13452933#comment-13452933 ] Uwe Schindler commented on LUCENE-4369: --- ExactMatchField sounds OK, but I don't really like it. On the other hand, we already had the Field.KEYWORD(...) static factory in Lucene 1.x (maybe also early 2.x), and that was always fine to me. The term Keyword is only misleading to me because of my German library background (Schlagwörter in German), so I would like to have a good term that tells the user this is a field that's taken as-is. In general I don't really like the names KeywordTokenizer or KeywordAnalyzer either, but those have been established for a long time. Coming from that name, KeywordTokenizer -> KeywordField might be a good idea (like NumericTokenStream -> NumericField). The problem with ExactMatch field is: If it is also stored, the name is misleading again, so KeywordField is better. If we would 100% differentiate between stored and indexed fields while indexing (requiring that the field is also added 2 times, one time as indexed and one time as stored), I would be fine with MatchOnlyField and StoredStringField. StringFields name is unintuitive and not helpful Key: LUCENE-4369 URL: https://issues.apache.org/jira/browse/LUCENE-4369 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-4369.patch There's a huge difference between TextField and StringField, StringField screws up scoring and bypasses your Analyzer. (see java-user thread Custom Analyzer Not Called When Indexing as an example.) The name we use here is vital, otherwise people will get bad results. I think we should rename StringField to MatchOnlyField.
[jira] [Commented] (LUCENE-4369) StringFields name is unintuitive and not helpful
[ https://issues.apache.org/jira/browse/LUCENE-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13452941#comment-13452941 ] Uwe Schindler commented on LUCENE-4369: --- Here the good old Lucene 1.9.1 API: http://memex.dsic.upv.es/pbs/Practicas/Lucene/api-1.9.1/org/apache/lucene/document/Field.html (see Field.Keyword, Field.Text, Field.Unstored) StringFields name is unintuitive and not helpful Key: LUCENE-4369 URL: https://issues.apache.org/jira/browse/LUCENE-4369 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-4369.patch There's a huge difference between TextField and StringField, StringField screws up scoring and bypasses your Analyzer. (see java-user thread Custom Analyzer Not Called When Indexing as an example.) The name we use here is vital, otherwise people will get bad results. I think we should rename StringField to MatchOnlyField. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-4369) StringFields name is unintuitive and not helpful
[ https://issues.apache.org/jira/browse/LUCENE-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13452945#comment-13452945 ] Uwe Schindler edited comment on LUCENE-4369 at 9/11/12 10:48 PM: - bq. We don't need to rush it, I think its fairly contained to change, we don't even have to deal with this for 4.0 if we aren't happy: we can deprecate StringField just have it extend XXXField in a future 4.x release too. I am against this, we should change this before Lucene 4.0. We have seen already on user list that many people understand it wrong, so for me this issue is a Blocker for 4.0. StringFields name is unintuitive and not helpful Key: LUCENE-4369 URL: https://issues.apache.org/jira/browse/LUCENE-4369 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-4369.patch There's a huge difference between TextField and StringField, StringField screws up scoring and bypasses your Analyzer. (see java-user thread Custom Analyzer Not Called When Indexing as an example.) The name we use here is vital, otherwise people will get bad results. I think we should rename StringField to MatchOnlyField.
[jira] [Commented] (LUCENE-4369) StringFields name is unintuitive and not helpful
[ https://issues.apache.org/jira/browse/LUCENE-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13452947#comment-13452947 ] Robert Muir commented on LUCENE-4369: - {quote} The problem with ExactMatch field is: If it is also stored, the name is misleading again, so KeywordField is better. {quote} I don't understand how storing is related; storing always works the same way. {quote} If we would 100% differentiate between stored and indexed fields while indexing (requiring that the field is also added 2 times, one time as indexed and one time as stored), I would be fine with MatchOnlyField and StoredStringField. {quote} In my opinion the only thing worse we could do to our .document API than StringField would be to require the user to add the field twice. StringFields name is unintuitive and not helpful Key: LUCENE-4369 URL: https://issues.apache.org/jira/browse/LUCENE-4369 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-4369.patch There's a huge difference between TextField and StringField, StringField screws up scoring and bypasses your Analyzer. (see java-user thread Custom Analyzer Not Called When Indexing as an example.) The name we use here is vital, otherwise people will get bad results. I think we should rename StringField to MatchOnlyField.
[jira] [Commented] (LUCENE-4369) StringFields name is unintuitive and not helpful
[ https://issues.apache.org/jira/browse/LUCENE-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13452956#comment-13452956 ] Uwe Schindler commented on LUCENE-4369: --- The names ExactMatchField or MatchOnlyField both have the problem, that they only refer to the indexing side. I would be fine with that name if the field were unstored by default, so that you have to turn on storing explicitly. If it is automatically stored, people will complain that their index contains too much useless garbage, because they expected an ExactMatchField to be used only for matching, so storing is wrong. I would prefer: UntokenizedField or UntokenizedStringField StringFields name is unintuitive and not helpful Key: LUCENE-4369 URL: https://issues.apache.org/jira/browse/LUCENE-4369 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-4369.patch There's a huge difference between TextField and StringField, StringField screws up scoring and bypasses your Analyzer. (see java-user thread Custom Analyzer Not Called When Indexing as an example.) The name we use here is vital, otherwise people will get bad results. I think we should rename StringField to MatchOnlyField.
[jira] [Commented] (LUCENE-4369) StringFields name is unintuitive and not helpful
[ https://issues.apache.org/jira/browse/LUCENE-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13452957#comment-13452957 ] Robert Muir commented on LUCENE-4369: - {quote} I am against this, we should change this before Lucene 4.0. We have seen already on user list that many people understand it wrong, so for me this issue is a Blocker for 4.0. {quote} I disagree with this. I've watched NOT_ANALYZED pop up on the user list for older releases time after time. It's frustrating, but this problem is nothing new and was not introduced with 4.0. I opened this issue because I thought it was useful feedback from someone testing the Lucene 4.0 BETA, and it's really trivial to fix once we settle on a name. I don't think we should try to block releases when nobody can even agree on a good name yet. We should instead focus on picking a good name: we can implement this for 4.1 or 5.0 or whatever. StringFields name is unintuitive and not helpful Key: LUCENE-4369 URL: https://issues.apache.org/jira/browse/LUCENE-4369 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-4369.patch There's a huge difference between TextField and StringField, StringField screws up scoring and bypasses your Analyzer. (see java-user thread Custom Analyzer Not Called When Indexing as an example.) The name we use here is vital, otherwise people will get bad results. I think we should rename StringField to MatchOnlyField.
[jira] [Created] (SOLR-3824) Velocity: Error messages from search not displayed
Jan Høydahl created SOLR-3824: - Summary: Velocity: Error messages from search not displayed Key: SOLR-3824 URL: https://issues.apache.org/jira/browse/SOLR-3824 Project: Solr Issue Type: Bug Components: Response Writers Reporter: Jan Høydahl Fix For: 4.1, 5.0 Error messages are not displayed in Solritas GUI. Example: In SolrCloud mode I have two shards, but shut down shard B. Then there is an error message: {code} <lst name="error"> <str name="msg">no servers hosting shard:</str> <int name="code">503</int> </lst> {code} However this is not displayed by Velocity template, it shows an empty search result.
[jira] [Commented] (LUCENE-4369) StringFields name is unintuitive and not helpful
[ https://issues.apache.org/jira/browse/LUCENE-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13452960#comment-13452960 ] Robert Muir commented on LUCENE-4369: - {quote} The names ExactMatchField or MatchOnlyField both have the problem, that they only refer to the indexing side. {quote} I dont know, I actually like ExactMatchField the best because it specifies exactly what I want it to specify. MatchOnly is not as good because you can actually do things like sort (the javadocs mention this as one reason you would use this field type), but ExactMatch just refers to the search behavior, which is what I am really concerned about. It doesn't imply you cannot store it, it just tells you how the search behavior behaves. StringFields name is unintuitive and not helpful Key: LUCENE-4369 URL: https://issues.apache.org/jira/browse/LUCENE-4369 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-4369.patch There's a huge difference between TextField and StringField, StringField screws up scoring and bypasses your Analyzer. (see java-user thread Custom Analyzer Not Called When Indexing as an example.) The name we use here is vital, otherwise people will get bad results. I think we should rename StringField to MatchOnlyField. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4345) Create a Classification module
[ https://issues.apache.org/jira/browse/LUCENE-4345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13452980#comment-13452980 ] Tommaso Teofili commented on LUCENE-4345: - Thanks Lance for your useful insights, I'll definitely have a look :) . bq. If you use index data which is already analyzed with the same analyzer as your test (unseen) documents, you can use a lot more documents as input. More is better. As the training data increases, signal drives out noise. I agree, we could leverage this for sure. bq. Once you add the ability to store load models, training speed becomes less important. Regarding storing and loading models, the base intuition (at least my intuition :P) in the case of Lucene is that the index itself plays that role. Create a Classification module -- Key: LUCENE-4345 URL: https://issues.apache.org/jira/browse/LUCENE-4345 Project: Lucene - Core Issue Type: New Feature Reporter: Tommaso Teofili Assignee: Tommaso Teofili Priority: Minor Attachments: LUCENE-4345_2.patch, LUCENE-4345.patch, SOLR-3700_2.patch, SOLR-3700.patch Lucene/Solr can host huge sets of documents containing lots of information in fields so that these can be used as training examples (w/ features) in order to very quickly create classifiers algorithms to use on new documents and / or to provide an additional service. So the idea is to create a contrib module (called 'classification') to host a ClassificationComponent that will use already seen data (the indexed documents / fields) to classify new documents / text fragments. The first version will contain a (simplistic) Lucene based Naive Bayes classifier but more implementations should be added in the future. -- This message is automatically generated by JIRA. 
[jira] [Commented] (LUCENE-4345) Create a Classification module
[ https://issues.apache.org/jira/browse/LUCENE-4345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13452987#comment-13452987 ] Tommaso Teofili commented on LUCENE-4345: - by the way, if no one objects I plan to commit this shortly so that we can improve things directly by patching the trunk. Create a Classification module -- Key: LUCENE-4345 URL: https://issues.apache.org/jira/browse/LUCENE-4345 Project: Lucene - Core Issue Type: New Feature Reporter: Tommaso Teofili Assignee: Tommaso Teofili Priority: Minor Attachments: LUCENE-4345_2.patch, LUCENE-4345.patch, SOLR-3700_2.patch, SOLR-3700.patch Lucene/Solr can host huge sets of documents containing lots of information in fields so that these can be used as training examples (w/ features) in order to very quickly create classifiers algorithms to use on new documents and / or to provide an additional service. So the idea is to create a contrib module (called 'classification') to host a ClassificationComponent that will use already seen data (the indexed documents / fields) to classify new documents / text fragments. The first version will contain a (simplistic) Lucene based Naive Bayes classifier but more implementations should be added in the future. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
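For readers who have not seen the patch, the idea in the issue description (classifying new text from per-class term statistics gathered from already-seen documents) can be illustrated with a minimal sketch of a multinomial Naive Bayes classifier in plain Java. This is not the code from LUCENE-4345.patch and it does not use the Lucene API: the class NaiveBayesSketch, its methods, the in-memory maps standing in for index statistics, the whitespace tokenizer, and the add-one smoothing are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical toy classifier; in-memory maps stand in for the statistics
// that an index of labelled documents could supply.
final class NaiveBayesSketch {
    private final Map<String, Map<String, Integer>> termCounts = new HashMap<>(); // label -> term -> count
    private final Map<String, Integer> docCounts = new HashMap<>();               // label -> number of documents
    private final Map<String, Integer> totalTerms = new HashMap<>();              // label -> number of term occurrences
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    // "Indexing" a labelled document: update the per-class statistics.
    void addDocument(String label, String text) {
        docCounts.merge(label, 1, Integer::sum);
        totalDocs++;
        Map<String, Integer> counts = termCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String term : tokenize(text)) {
            counts.merge(term, 1, Integer::sum);
            totalTerms.merge(label, 1, Integer::sum);
            vocabulary.add(term);
        }
    }

    // Returns the label with the highest log-probability, or null if nothing was added.
    String classify(String text) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docCounts.keySet()) {
            // log prior: share of training documents carrying this label
            double score = Math.log((double) docCounts.get(label) / totalDocs);
            Map<String, Integer> counts = termCounts.get(label);
            int total = totalTerms.getOrDefault(label, 0);
            for (String term : tokenize(text)) {
                int count = counts.getOrDefault(term, 0);
                // add-one (Laplace) smoothing so unseen terms do not zero out the score
                score += Math.log((count + 1.0) / (total + vocabulary.size()));
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }

    // Assumed analysis step: lowercase and split on whitespace.
    private static String[] tokenize(String text) {
        return text.toLowerCase(Locale.ROOT).trim().split("\\s+");
    }

    public static void main(String[] args) {
        NaiveBayesSketch nb = new NaiveBayesSketch();
        nb.addDocument("sports", "football match goal");
        nb.addDocument("tech", "lucene index search");
        System.out.println(nb.classify("search index"));
    }
}
```

In this sketch nothing is trained or stored separately: classification reads the same counts that were accumulated while documents were added, which is one way to read the remark above that the index itself can play the role of the model.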
[jira] [Created] (SOLR-3825) Log document IDs when they are retrieved
Scott Stults created SOLR-3825: -- Summary: Log document IDs when they are retrieved Key: SOLR-3825 URL: https://issues.apache.org/jira/browse/SOLR-3825 Project: Solr Issue Type: Improvement Components: SearchComponents - other Reporter: Scott Stults Priority: Trivial During relevancy tuning it's important to know exactly which documents the client has seen. Right now the only way to get that list is to splice into the HTTP traffic. Preferably the IDs could be logged along with the query. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4369) StringFields name is unintuitive and not helpful
[ https://issues.apache.org/jira/browse/LUCENE-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13452998#comment-13452998 ] Jack Krupansky commented on LUCENE-4369: I would suggest RawTextField. Or, ExactTextField. Or, UnanalyzedTextField. I mean, text is text to an average user. Generally, people should use TextField for text, but use StringField when they need the exact, raw text as is and without being tokenized or otherwise changed. KeywordTokenizer is confusing since it really is NoTokenizer or ExactTextTokenizer or RawTextTokenizer. Is there currently a wiki page that describes the distinction between match and search? I would not expect an average user to know the distinction. StringFields name is unintuitive and not helpful Key: LUCENE-4369 URL: https://issues.apache.org/jira/browse/LUCENE-4369 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-4369.patch There's a huge difference between TextField and StringField, StringField screws up scoring and bypasses your Analyzer. (see java-user thread Custom Analyzer Not Called When Indexing as an example.) The name we use here is vital, otherwise people will get bad results. I think we should rename StringField to MatchOnlyField. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
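To make the behaviour behind the naming debate concrete, here is a minimal sketch in plain Java of the difference between an analyzed field and one taken as-is. It deliberately does not use the Lucene API: the class FieldIndexingSketch and the methods analyzedTerms, rawTerms, and matches are hypothetical names, and the toy analysis (lowercase, split on non-alphanumeric characters) is only an assumption standing in for whatever Analyzer an application configures.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical toy model, not Lucene code.
final class FieldIndexingSketch {

    // Models a TextField-like field: the value goes through analysis
    // and is indexed as several terms.
    static List<String> analyzedTerms(String value) {
        List<String> terms = new ArrayList<>();
        for (String token : value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }

    // Models a StringField-like field: analysis is bypassed and the
    // whole value is indexed as one single term.
    static List<String> rawTerms(String value) {
        return Collections.singletonList(value);
    }

    // A term query matches only if the exact term was indexed.
    static boolean matches(List<String> indexedTerms, String queryTerm) {
        return indexedTerms.contains(queryTerm);
    }

    public static void main(String[] args) {
        String value = "New York";
        System.out.println(matches(analyzedTerms(value), "york"));   // analyzed field, single-word query
        System.out.println(matches(rawTerms(value), "york"));        // raw field, single-word query
        System.out.println(matches(rawTerms(value), "New York"));    // raw field, exact value
    }
}
```

Under these assumptions the raw field matches only a query for the exact original value, including case and whitespace, which is the behaviour the proposed names (ExactMatchField, RawTextField, UntokenizedField) are trying to signal.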
[jira] [Updated] (SOLR-3825) Log document IDs when they are retrieved
[ https://issues.apache.org/jira/browse/SOLR-3825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Stults updated SOLR-3825: --- Attachment: SOLR-3825.patch Log document IDs when they are retrieved Key: SOLR-3825 URL: https://issues.apache.org/jira/browse/SOLR-3825 Project: Solr Issue Type: Improvement Components: SearchComponents - other Reporter: Scott Stults Priority: Trivial Attachments: SOLR-3825.patch During relevancy tuning it's important to know exactly which documents the client has seen. Right now the only way to get that list is to splice into the HTTP traffic. Preferably the IDs could be logged along with the query. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-3824) Velocity: Error messages from search not displayed
[ https://issues.apache.org/jira/browse/SOLR-3824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl updated SOLR-3824: -- Attachment: SOLR-3824.patch First patch, showing any error section in a big red box. To test, try e.g. {noformat} http://localhost:8983/solr/collection1/browse?defType=lucene&q=%22a {noformat} Velocity: Error messages from search not displayed -- Key: SOLR-3824 URL: https://issues.apache.org/jira/browse/SOLR-3824 Project: Solr Issue Type: Bug Components: Response Writers Reporter: Jan Høydahl Fix For: 4.1, 5.0 Attachments: SOLR-3824.patch Error messages are not displayed in Solritas GUI. Example: In SolrCloud mode I have two shards, but shut down shard B. Then there is an error message: {code} <lst name="error"> <str name="msg">no servers hosting shard:</str> <int name="code">503</int> </lst> {code} However this is not displayed by Velocity template, it shows an empty search result.
[jira] [Updated] (SOLR-3824) Velocity: Error messages from search not displayed
[ https://issues.apache.org/jira/browse/SOLR-3824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl updated SOLR-3824: -- Fix Version/s: (was: 4.1) 4.0 It would be nice to include this in 4.0 as well, since the likelihood of errors in a sharded environment is higher. Velocity: Error messages from search not displayed -- Key: SOLR-3824 URL: https://issues.apache.org/jira/browse/SOLR-3824 Project: Solr Issue Type: Bug Components: Response Writers Reporter: Jan Høydahl Fix For: 4.0, 5.0 Attachments: SOLR-3824.patch Error messages are not displayed in Solritas GUI. Example: In SolrCloud mode I have two shards, but shut down shard B. Then there is an error message: {code} <lst name="error"> <str name="msg">no servers hosting shard:</str> <int name="code">503</int> </lst> {code} However this is not displayed by Velocity template, it shows an empty search result.
[jira] [Resolved] (SOLR-3824) Velocity: Error messages from search not displayed
[ https://issues.apache.org/jira/browse/SOLR-3824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl resolved SOLR-3824. --- Resolution: Fixed Committed to trunk r1383405 and branch_4x r1383412 Velocity: Error messages from search not displayed -- Key: SOLR-3824 URL: https://issues.apache.org/jira/browse/SOLR-3824 Project: Solr Issue Type: Bug Components: Response Writers Reporter: Jan Høydahl Fix For: 4.0, 5.0 Attachments: SOLR-3824.patch Error messages are not displayed in Solritas GUI. Example: In SolrCloud mode I have two shards, but shut down shard B. Then there is an error message: {code} <lst name="error"> <str name="msg">no servers hosting shard:</str> <int name="code">503</int> </lst> {code} However this is not displayed by Velocity template, it shows an empty search result.
[jira] [Commented] (SOLR-3243) eDismax and non-fielded range query
[ https://issues.apache.org/jira/browse/SOLR-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13453021#comment-13453021 ] Jan Høydahl commented on SOLR-3243: --- Bill Bell, would you care to test this and comment. I think there is still a loophole for a bare * query - it gets expanded across all fields as well and is less efficient than a MatchAllDocsQuery, which is more likely to be the intent when issuing a *. Perhaps we can incorporate that in this issue as well? eDismax and non-fielded range query --- Key: SOLR-3243 URL: https://issues.apache.org/jira/browse/SOLR-3243 Project: Solr Issue Type: Bug Components: query parsers Affects Versions: 3.1, 3.2, 3.3, 3.4, 3.5 Reporter: Jan Høydahl Assignee: Jan Høydahl Priority: Critical Fix For: 5.0 Attachments: SOLR-3243.patch Reported by Bill Bell in SOLR-3085: If you enter a non-fielded open-ended range in the search box, like [* TO *], eDismax will expand it to all fields: {noformat} +DisjunctionMaxQuery((content:[* TO *]^2.0 | id:[* TO *]^50.0 | author:[* TO *]^15.0 | meta:[* TO *]^10.0 | name:[* TO *]^20.0)) {noformat} This does not make sense, and a side effect is that range queries for strings are very expensive, open-ended even more, and you can totally crash the search server by hammering something like ([* TO *] OR [* TO *] OR [* TO *]) a few times... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
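The expansion described in the issue can be sketched in a few lines of plain Java. This is not Solr's eDismax code and says nothing about how SOLR-3243.patch fixes the problem; the class EdismaxExpansionSketch and its expand method are hypothetical names, and the sketch only reproduces the shape of the parsed-query string quoted above, assuming each unfielded clause is copied once per query field with that field's boost.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of per-field expansion; not Solr code.
final class EdismaxExpansionSketch {

    // Copies one unfielded clause onto every query field with its boost and
    // joins the per-field clauses the way the quoted debug output shows them.
    static String expand(String clause, Map<String, Double> queryFields) {
        List<String> perField = new ArrayList<>();
        for (Map.Entry<String, Double> qf : queryFields.entrySet()) {
            perField.add(qf.getKey() + ":" + clause + "^" + qf.getValue());
        }
        return "+DisjunctionMaxQuery((" + String.join(" | ", perField) + "))";
    }

    public static void main(String[] args) {
        // Field names and boosts taken from the example in the issue description.
        Map<String, Double> qf = new LinkedHashMap<>();
        qf.put("content", 2.0);
        qf.put("id", 50.0);
        qf.put("author", 15.0);
        qf.put("meta", 10.0);
        qf.put("name", 20.0);
        System.out.println(expand("[* TO *]", qf));
    }
}
```

With five query fields, every unfielded [* TO *] clause becomes five open-ended range queries, so a request such as ([* TO *] OR [* TO *] OR [* TO *]) would produce fifteen of them under this model, which is why the report treats it as a cost problem.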
[jira] [Commented] (LUCENE-4369) StringFields name is unintuitive and not helpful
[ https://issues.apache.org/jira/browse/LUCENE-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453022#comment-13453022 ]

Uwe Schindler commented on LUCENE-4369:

Thanks Jack, that is exactly my opinion too; we just need good names. I like yours, too. Raw is a good term, too. The MatchOnly or ExactMatch terms are in my opinion not very good, sorry.

StringFields name is unintuitive and not helpful
Key: LUCENE-4369
URL: https://issues.apache.org/jira/browse/LUCENE-4369
Project: Lucene - Core
Issue Type: Bug
Reporter: Robert Muir
Attachments: LUCENE-4369.patch

There's a huge difference between TextField and StringField: StringField screws up scoring and bypasses your Analyzer. (See the java-user thread Custom Analyzer Not Called When Indexing as an example.) The name we use here is vital, otherwise people will get bad results. I think we should rename StringField to MatchOnlyField.
[jira] [Commented] (LUCENE-4369) StringFields name is unintuitive and not helpful
[ https://issues.apache.org/jira/browse/LUCENE-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453036#comment-13453036 ]

Robert Muir commented on LUCENE-4369:

{quote} Raw is a good term, too. {quote}

+1, let's think about that.
[jira] [Commented] (LUCENE-4258) Incremental Field Updates through Stacked Segments
[ https://issues.apache.org/jira/browse/LUCENE-4258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453050#comment-13453050 ]

Adrien Grand commented on LUCENE-4258:

On slide 4 one of the enumerated operations is field deletion, but I am not sure how to do it with the proposed API on slide 5?

It is just a thought, but your work plan only mentions Lucene fields. Wouldn't it be easier to start working with DocValues? I guess it would help us get started with document updates and would already solve most use-cases (I'm especially thinking of scoring factors).

Incremental Field Updates through Stacked Segments
Key: LUCENE-4258
URL: https://issues.apache.org/jira/browse/LUCENE-4258
Project: Lucene - Core
Issue Type: Improvement
Components: core/index
Reporter: Sivan Yogev
Attachments: IncrementalFieldUpdates.odp, LUCENE-4258-API-changes.patch, LUCENE-4258-inner-changes.patch
Original Estimate: 2,520h
Remaining Estimate: 2,520h

Shai and I would like to start working on the proposal to Incremental Field Updates outlined here (http://markmail.org/message/zhrdxxpfk6qvdaex).
[jira] [Commented] (LUCENE-4369) StringFields name is unintuitive and not helpful
[ https://issues.apache.org/jira/browse/LUCENE-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453066#comment-13453066 ]

Steven Rowe commented on LUCENE-4369:

AuNaturelTextField
Collator-based facet sorting in Solr
Claudio Ranieri and I briefly discussed collator-based sorting for facets in the thread Problem with accented words sorting on the solr-user mailing list. Here's the idea:

Solr faceting supports sorting by either count or index order. Claudio and I both need the order to be collator-based. My understanding of the issue is that it is not currently possible.

Collator-based document sorting in Solr uses CollationKeys as field values. This does not work with faceting on fields with multiple values as there is no mapping from the key to the human readable value.

ICU sort keys are always null (00) terminated and when two keys are compared, the comparison stops as soon as null is reached(?) http://userguide.icu-project.org/collation/architecture

If we concatenate the keys with the original values:
<key><00><original value><offset of original value>
we get an entity where the ordering is still correct upon comparison and where the original value can be extracted by using the offset from the last int (or maybe short, to spare 2 bytes) in the BytesRef.

If the idea is sound, I'll open a JIRA issue. Unfortunately I do not have time right now for hacking on it.
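The proposed layout can be sketched roughly as follows. This is only an illustration under stated assumptions: the sort key bytes are supplied by the caller (in practice they would come from an ICU collator) and are assumed to contain no 0x00 byte; the class `KeyedFacetTerm` and its methods are hypothetical names, and the 2-byte big-endian offset is the "short" variant mentioned above.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of the <key><00><original value><offset> layout. */
final class KeyedFacetTerm {

    /** Builds key + 0x00 + UTF-8 original value + 2-byte offset of the original value. */
    static byte[] encode(byte[] sortKey, String original) {
        byte[] value = original.getBytes(StandardCharsets.UTF_8);
        int offset = sortKey.length + 1; // the value starts right after the terminator
        if (offset > 0xFFFF) {
            throw new IllegalArgumentException("sort key too long for a 2-byte offset");
        }
        byte[] term = new byte[offset + value.length + 2];
        System.arraycopy(sortKey, 0, term, 0, sortKey.length);
        term[sortKey.length] = 0; // terminator
        System.arraycopy(value, 0, term, offset, value.length);
        term[term.length - 2] = (byte) (offset >>> 8);
        term[term.length - 1] = (byte) offset;
        return term;
    }

    /** Reads the trailing offset and extracts the human readable value. */
    static String decode(byte[] term) {
        int offset = ((term[term.length - 2] & 0xFF) << 8) | (term[term.length - 1] & 0xFF);
        return new String(term, offset, term.length - 2 - offset, StandardCharsets.UTF_8);
    }

    /** Unsigned byte-wise comparison, i.e. the order in which binary terms sort. */
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }
}
```

Because the terminator is the smallest possible byte, a key that is a prefix of another key still sorts first, so the combined terms keep the collator's order as long as the keys themselves never contain 0x00.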
[jira] [Commented] (LUCENE-4355) improve AtomicReader sugar apis
[ https://issues.apache.org/jira/browse/LUCENE-4355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453070#comment-13453070 ]

Michael McCandless commented on LUCENE-4355:

+1, looks great!

improve AtomicReader sugar apis
Key: LUCENE-4355
URL: https://issues.apache.org/jira/browse/LUCENE-4355
Project: Lucene - Core
Issue Type: Task
Reporter: Robert Muir
Assignee: Robert Muir
Fix For: 5.0, 4.0
Attachments: LUCENE-4355.patch, LUCENE-4355.patch

I thought about this after looking @ LUCENE-4353: AtomicReader has some sugar APIs that are over top of the flex apis (Fields, Terms, ...). But these might be a little trappy/confusing compared to 3.x.
# I don't think we need AtomicReader.termDocsEnum(Bits, ...) and .termPositionsEnum(Bits, ...). I also don't think we need variants that take flags here. We should simplify these to be less trappy. I think we only need (String, BytesRef) here.
# This means you need to use the flex apis for more expert usage: but we make this a bit too hard since we only let you get a Terms (which you must null check, then call .iterator() on, then seekExact, ...). I think it could help if we balanced this out by adding some sugar like AtomicReader.termsEnum? 3.x had a method that let you get a 'positioned termsenum'.
Re: Collator-based facet sorting in Solr
On Tue, Sep 11, 2012 at 10:43 AM, Toke Eskildsen t...@statsbiblioteket.dk wrote:

> ICU sort keys are always null (00) terminated and when two keys are compared, the comparison stops as soon as null is reached(?) http://userguide.icu-project.org/collation/architecture
> If we concatenate the keys with the original values: <key><00><original value><offset of original value> we get an entity where the ordering is still correct upon comparison and where the original value can be extracted by using the offset from the last int (or maybe short, to spare 2 bytes) in the BytesRef.

I think the idea is sound, but I don't think we need the offset? I'm fairly positive ICU collation keys explicitly avoid 0 bytes except for the null terminator. So the original value can be extracted after the fact just by looking for the terminator... such a thing could even be done client-side, and I don't think we need the offset for speed either, because it's something you would do before final display.

We need to verify that what I'm saying about avoiding 0 bytes is true; I'll look into it.

Of course such an option is only useful for the new ICUCollationAnalyzer (Solr's ICUCollationField uses that) because the older deprecated filters are encoded in a different way: I think we should leave those alone.

-- lucidworks.com
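If the no-zero-bytes property holds (it is still unverified at this point in the thread), extraction without an offset could look roughly like the sketch below. The class `TerminatorSplit` is a hypothetical name, and the term layout is assumed to be key bytes, one 0x00 terminator, then the UTF-8 original value.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: recover the original value by scanning for the key terminator. */
final class TerminatorSplit {

    /**
     * Returns everything after the first 0x00 byte, decoded as UTF-8.
     * Assumes the collation key itself never contains a 0x00 byte.
     */
    static String originalValue(byte[] term) {
        for (int i = 0; i < term.length; i++) {
            if (term[i] == 0) {
                return new String(term, i + 1, term.length - i - 1, StandardCharsets.UTF_8);
            }
        }
        throw new IllegalArgumentException("no 0x00 terminator found");
    }
}
```

Scanning from the front means a 0x00 inside the original value would not confuse the split; only a 0x00 inside the key would.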
Re: Collator-based facet sorting in Solr
Just a concern where things could act a little funky: today, for example, if I set strength=primary, then it's going to fold Test and test to the same unique term, but under this scheme you would have <bytes>Test and <bytes>test as two terms.

This could be undesirable in the typical case where you just want case-insensitive facets: but we don't provide any way to preprocess the text to avoid this.

Really a lot of this is because factory-based analysis chains have no way to specify the AttributeFactory, e.g. I guess if we really wanted to fix this right we would need to pass in the AttributeFactory to TokenizerFactory's create() method.

But for now from Solr it would be a little hacky, e.g. someone is gonna have to fold the case client-side or whatever if they don't want these problems.

-- lucidworks.com
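The folding concern can be illustrated with the JDK's own `java.text.Collator` standing in for ICU (an assumption made only to keep the sketch dependency-free; the ICU key bytes would differ). `PrimaryStrengthDemo` and its methods are hypothetical names.

```java
import java.text.Collator;
import java.util.Locale;

/** Hypothetical illustration of primary-strength folding versus key-plus-original terms. */
final class PrimaryStrengthDemo {

    /** A primary-strength collator ignores case and accent differences. */
    static Collator primary() {
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        collator.setStrength(Collator.PRIMARY);
        return collator;
    }

    /** True when both strings would collapse into the same collation-key term today. */
    static boolean sameKey(String a, String b) {
        return primary().compare(a, b) == 0;
    }

    /**
     * Under the key-plus-original scheme a term is only shared when the key
     * AND the appended original value are identical.
     */
    static boolean sameCombinedTerm(String a, String b) {
        return sameKey(a, b) && a.equals(b);
    }
}
```

Here `sameKey("Test", "test")` holds while `sameCombinedTerm("Test", "test")` does not, which is exactly the split into two facet terms described above; folding case before encoding would be one client-side workaround.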
[jira] [Resolved] (LUCENE-3891) Documents loaded at search time (IndexReader.document) should be a different class from the index-time Document
[ https://issues.apache.org/jira/browse/LUCENE-3891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-3891.
Resolution: Duplicate
Fix Version/s: (was: 4.1) 5.0
Fixed in LUCENE-3312.

Documents loaded at search time (IndexReader.document) should be a different class from the index-time Document
Key: LUCENE-3891
URL: https://issues.apache.org/jira/browse/LUCENE-3891
Project: Lucene - Core
Issue Type: Improvement
Reporter: Michael McCandless
Labels: gsoc2012, lucene-gsoc-12
Fix For: 5.0

The fact that the Document you can load at search time is the same Document class you had indexed is horribly trappy in Lucene, because, the loaded document necessarily loses information like field boost, whether a field was tokenized, etc. (See LUCENE-3854 for a recent example). We should fix this, statically, so that it's an entirely different class at search time vs index time.
[jira] [Resolved] (LUCENE-2915) make CoreCodecProvider convenience class so apps can easily pick per-field codecs
[ https://issues.apache.org/jira/browse/LUCENE-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-2915.
Resolution: Fixed
Fix Version/s: (was: 4.1) 4.0
PerFieldPostingsFormat solved this.

make CoreCodecProvider convenience class so apps can easily pick per-field codecs
Key: LUCENE-2915
URL: https://issues.apache.org/jira/browse/LUCENE-2915
Project: Lucene - Core
Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
Fix For: 4.0
Attachments: LUCENE-2915.patch

We already have DefaultCodecProvider, which simply registers all core codecs and uses Standard for all fields, but it's package private. We should make this public, and name it CoreCodecProvider.
[jira] [Resolved] (LUCENE-2807) Improve test debuggability through ant
[ https://issues.apache.org/jira/browse/LUCENE-2807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-2807.
Resolution: Fixed
Fix Version/s: (was: 4.1) 4.0
Dawid already fixed these issues (thanks!).

Improve test debuggability through ant
Key: LUCENE-2807
URL: https://issues.apache.org/jira/browse/LUCENE-2807
Project: Lucene - Core
Issue Type: Improvement
Components: general/test
Reporter: Michael McCandless
Fix For: 4.0

Some small improvements would go a long ways... When trying to debug an intermittent fail, I usually run w/ -Dtests.verbose=true and w/ many iters. But because the formatter buffers this can hit OOME, so maybe we make an unbuffered formatter. Also, it'd be nice if we could have the formatter discard output for a given iter if there was no failure, and I think the iters should stop as soon as a failure is hit. Maybe somehow we make a new tests.mode that would switch on these behaviours? Unbuffered formatter is also vital when debugging a deadlock...
[jira] [Assigned] (SOLR-3825) Log document IDs when they are retrieved
[ https://issues.apache.org/jira/browse/SOLR-3825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll reassigned SOLR-3825:
Assignee: Grant Ingersoll

Log document IDs when they are retrieved
Key: SOLR-3825
URL: https://issues.apache.org/jira/browse/SOLR-3825
Project: Solr
Issue Type: Improvement
Components: SearchComponents - other
Reporter: Scott Stults
Assignee: Grant Ingersoll
Priority: Trivial
Attachments: SOLR-3825.patch

During relevancy tuning it's important to know exactly which documents the client has seen. Right now the only way to get that list is to splice into the HTTP traffic. Preferably the IDs could be logged along with the query.
[jira] [Commented] (LUCENE-4197) Small improvements to Lucene Spatial Module for v4
[ https://issues.apache.org/jira/browse/LUCENE-4197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453133#comment-13453133 ]

David Smiley commented on LUCENE-4197:

committed removal of PrefixCellsTokenizer

Small improvements to Lucene Spatial Module for v4
Key: LUCENE-4197
URL: https://issues.apache.org/jira/browse/LUCENE-4197
Project: Lucene - Core
Issue Type: Improvement
Components: modules/spatial
Reporter: David Smiley
Fix For: 4.0
Attachments: LUCENE-4197_rename_CachedDistanceValueSource.patch, LUCENE-4197_SpatialArgs_doesn_t_need_overloaded_toString()_with_a_ctx_param_.patch, SpatialArgs-_remove_unused_min_and_max_params.patch

This issue is to capture small changes to the Lucene spatial module that don't deserve their own issue.
[jira] [Resolved] (SOLR-3823) Parentheses in a boost query cause errors
[ https://issues.apache.org/jira/browse/SOLR-3823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erick Erickson resolved SOLR-3823.
Resolution: Invalid

The error is because of the colon character, it has meaning in a query and must be escaped. See: http://lucene.apache.org/core/3_6_1/queryparsersyntax.html. So I'll close this as invalid, if you disagree please let us know.

BTW, it's better to raise this kind of question on the user's list rather than open a JIRA, at least until you're sure it's really a bug.

Parentheses in a boost query cause errors
Key: SOLR-3823
URL: https://issues.apache.org/jira/browse/SOLR-3823
Project: Solr
Issue Type: Bug
Components: query parsers
Affects Versions: 4.0-BETA
Environment: Mac, jdk 1.6, Chrome
Reporter: Mathos Marcer

When using a boost query (bq) that contains a parentheses (like this example from the Relevancy Cookbook section of the wiki):
? defType = dismax & q = foo bar & bq = (*:* -xxx)^999
You get the following error:
org.apache.lucene.queryparser.classic.ParseException: Cannot parse '-xxx)': Encountered ) ) at line 1, column 12. Was expecting one of: <EOF> <AND> ... <OR> ... <NOT> ... + ... - ... <BAREOPER> ... ( ... * ... ^ ... <QUOTED> ... <TERM> ... <FUZZY_SLOP> ... <PREFIXTERM> ... <WILDTERM> ... <REGEXPTERM> ... [ ... { ... <NUMBER> ...
[jira] [Commented] (LUCENE-4355) improve AtomicReader sugar apis
[ https://issues.apache.org/jira/browse/LUCENE-4355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453137#comment-13453137 ]

Robert Muir commented on LUCENE-4355:

Thanks Mike: I'll give it some time in case anyone else wants to review, but I'd like to commit this in a day or two.
[jira] [Resolved] (LUCENE-2723) Speed up Lucene's low level bulk postings read API
[ https://issues.apache.org/jira/browse/LUCENE-2723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-2723.
Resolution: Won't Fix

We nuked the low level bulk postings API... and BlockPostingsFormat now does bulk reads under the hood and gives great performance ...

Speed up Lucene's low level bulk postings read API
Key: LUCENE-2723
URL: https://issues.apache.org/jira/browse/LUCENE-2723
Project: Lucene - Core
Issue Type: Improvement
Components: core/index
Reporter: Michael McCandless
Assignee: Michael McCandless
Fix For: 4.1
Attachments: LUCENE-2723-BulkEnumWrapper.patch, LUCENE-2723_bulkvint.patch, LUCENE-2723_facetPerSeg.patch, LUCENE-2723_facetPerSeg.patch, LUCENE-2723_openEnum.patch, LUCENE-2723.patch, LUCENE-2723.patch, LUCENE-2723.patch, LUCENE-2723.patch, LUCENE-2723.patch, LUCENE-2723.patch, LUCENE-2723_termscorer.patch, LUCENE-2723-termscorer.patch, LUCENE-2723-termscorer.patch, LUCENE-2723-termscorer.patch, LUCENE-2723_wastedint.patch

Spinoff from LUCENE-1410. The flex DocsEnum has a simple bulk-read API that reads the next chunk of docs/freqs. But it's a poor fit for intblock codecs like FOR/PFOR (from LUCENE-1410). This is not unlike sucking coffee through those tiny plastic coffee stirrers they hand out on airplanes that, surprisingly, also happen to function as a straw. As a result we see no perf gain from using FOR/PFOR.

I had hacked up a fix for this, described in my blog post at http://chbits.blogspot.com/2010/08/lucene-performance-with-pfordelta-codec.html

I'm opening this issue to get that work to a committable point. So... I've worked out a new bulk-read API to address this performance bottleneck. It has some big changes over the current bulk-read API:
* You can now also bulk-read positions (but not payloads), but, I have yet to cutover positional queries.
* The buffer contains doc deltas, not absolute values, for docIDs and positions (freqs are absolute).
* Deleted docs are not filtered out.
* The doc freq buffers need not be aligned. For fixed intblock codecs (FOR/PFOR) they will be, but for varint codecs (Simple9/16, Group varint, etc.) they won't be.

It's still a work in progress...
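A consumer of such a buffer would have to turn deltas back into absolute docIDs and apply deletions itself. The sketch below shows that step only; `DocDeltaBuffer`, its method names, and the convention of passing the previous absolute docID in are assumptions for illustration, not the API from the patches (which was later dropped).

```java
import java.util.BitSet;

/** Hypothetical consumer-side helper for a buffer of doc deltas. */
final class DocDeltaBuffer {

    /** Converts the first count doc deltas into absolute docIDs, continuing from previousDocID. */
    static int[] toAbsolute(int[] deltas, int count, int previousDocID) {
        int[] docIDs = new int[count];
        int doc = previousDocID;
        for (int i = 0; i < count; i++) {
            doc += deltas[i]; // each entry is the gap to the preceding document
            docIDs[i] = doc;
        }
        return docIDs;
    }

    /** The bulk read does not filter deleted docs, so the caller checks live docs itself. */
    static int countLive(int[] docIDs, BitSet liveDocs) {
        int live = 0;
        for (int doc : docIDs) {
            if (liveDocs.get(doc)) {
                live++;
            }
        }
        return live;
    }
}
```

Keeping the buffer in delta form lets block codecs hand over their decoded ints untouched, at the cost of a running sum on the consumer side.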
[jira] [Resolved] (LUCENE-2301) search for fix all TODO 4.0 comments before releasing 4.0
[ https://issues.apache.org/jira/browse/LUCENE-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-2301.
Resolution: Fixed
Fix Version/s: (was: 4.1) 4.0 5.0

I can't find any more TODO 4.0s ... lots of generic TODOs though :)

search for fix all TODO 4.0 comments before releasing 4.0
Key: LUCENE-2301
URL: https://issues.apache.org/jira/browse/LUCENE-2301
Project: Lucene - Core
Issue Type: Improvement
Components: core/other
Reporter: Michael McCandless
Priority: Minor
Fix For: 5.0, 4.0

Let's try to use the specific string?:
{code}
TODO 4.0
{code}
to mark any place where we must do something for 4.0?
[jira] [Commented] (LUCENE-4369) StringFields name is unintuitive and not helpful
[ https://issues.apache.org/jira/browse/LUCENE-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453151#comment-13453151 ]

Erick Erickson commented on LUCENE-4369:

Anything with Raw is good. The problem with Keyword or Untokenized or Unanalyzed in the name is that it rather assumes that the user is familiar with what those terms mean in Lucene. If they're experienced enough to understand _that_, they're less likely to fall into this error in the first place.

We could do something that removes it from consideration unless people dig. I understand it's a general field, but how about something like Identifier (I'm not too keen on that name actually). I'm reaching for something that is naturally thought of as a type suitable for uniqueKey fields but requires one to dig a bit before using it for other fields.

OK, an idea out of left field: why do we have a string as a type anyway? Does it make any sense to just remove it and have people use KeywordTokenizer when they want this behavior? I'm ready for _this_ idea to be shot down in flames <G>

I suppose in the Solr world, we could just remove the string type from schema.xml and provide an example fieldType that was only KeywordTokenized and avoid a world of confusion for many users.
[jira] [Commented] (LUCENE-4369) StringFields name is unintuitive and not helpful
[ https://issues.apache.org/jira/browse/LUCENE-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453159#comment-13453159 ]

Robert Muir commented on LUCENE-4369:

{quote} OK, an idea out of left field, why do we have a string as a type anyway? Does it make any sense to just remove it and have people use KeywordTokenizer when they want this behavior? I'm ready for this idea to be shot down in flames <G> {quote}

I've said the same thing before, but I figure I won't get consensus for that. I'm happy to just get the name to be anything but String for now :)

It's still screwed up that there are things like setBoost() at all on StringField when it omits norms etc, and screwed up that it bypasses the Analyzer (the classic NOT_ANALYZED problem), but fixing the name would at least help.
[jira] [Updated] (LUCENE-2163) Remove synchronized from DirReader.reopen/clone
[ https://issues.apache.org/jira/browse/LUCENE-2163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-2163:
Attachment: LUCENE-2163.patch

We've made awesome progress removing sync'd, now that SR is read-only ... but I found a few remaining sync'd in StandardDirectoryReader that I *think* are not necessary? Eg doClose is already protected by IR.close (only one thread will decRef to RC=0). And for doOpenIfChanged/noWriter... why do they need to be sync'd? If it's solely to prevent strange exceptions when one thread is closing while another is reopening ... I don't think we need to do that (it's best effort, and I think likely you'd get ACE anyway since we'd try to incRef an already-closed SR)?

But then again I suppose the sync'd are not really hurting anything (it won't block searches since nothing else is sync'd...). Still it's nice to remove them if we can, in case something on the search path does become sync'd at some point ...

Remove synchronized from DirReader.reopen/clone
Key: LUCENE-2163
URL: https://issues.apache.org/jira/browse/LUCENE-2163
Project: Lucene - Core
Issue Type: Improvement
Components: core/index
Reporter: Michael McCandless
Priority: Minor
Fix For: 4.1
Attachments: LUCENE-2163.patch

Spinoff from LUCENE-2161, where the fact that DirReader.reopen is sync'd was dangerous in the context of NRT (could block all searches against that reader when CMS was throttling). So, with LUCENE-2161, we're removing the synchronization when it's an NRT reader that you're reopening.

But... why should we sync even for a normal reopen? There are various sync'd methods on IndexReader/DirReader (we are reducing that, with LUCENE-2161 and also LUCENE-2156), but, in general it doesn't seem like normal reopen really needs to be sync'd. Performing a reopen shouldn't incur any chance of blocking a search...
[jira] [Resolved] (LUCENE-2143) Understand why NRT performance is affected by flush frequency
[ https://issues.apache.org/jira/browse/LUCENE-2143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-2143.
Resolution: Not A Problem

This is a hotspot issue ... not much we can do about it.

Understand why NRT performance is affected by flush frequency
Key: LUCENE-2143
URL: https://issues.apache.org/jira/browse/LUCENE-2143
Project: Lucene - Core
Issue Type: Bug
Components: core/index
Reporter: Michael McCandless
Assignee: Michael McCandless
Fix For: 4.1
Attachments: SearchTest.java

In LUCENE-2061 (perf tests for NRT), I test NRT performance by first getting a baseline QPS with only searching, using enough threads to saturate. Then, I pick an indexing rate (I used 100 docs/sec), and index docs at that rate, and I also reopen a NRT reader at different frequencies (10/sec, 1/sec, every 5 seconds, etc.), and then again test QPS (saturated).

I think this is a good approach for testing NRT: apps can see, as a function of freshness and at a fixed indexing rate, what the cost is to QPS. You'd expect as index rate goes up, and freshness goes up, QPS will go down.

But I found something very strange: the low frequency reopen rates often caused a highish hit to QPS. When I forced IW to flush every 100 docs (= once per second), the performance was generally much better. I actually would've expected the reverse: flushing in batch ought to use fewer resources.

One theory is something odd about my test env (based on OpenSolaris), so I'd like to retest on a more mainstream OS. I'm opening this issue to get to the bottom of it...
[jira] [Resolved] (LUCENE-2093) Use query-private scope instead of shared Term-TermInfo cache
[ https://issues.apache.org/jira/browse/LUCENE-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-2093.
Resolution: Fixed
Fix Version/s: (was: 4.1) 4.0 5.0

We've improved queries so they now save their own term state during rewrite and re-use it during matching.

Use query-private scope instead of shared Term-TermInfo cache
Key: LUCENE-2093
URL: https://issues.apache.org/jira/browse/LUCENE-2093
Project: Lucene - Core
Issue Type: Improvement
Components: core/search
Reporter: Michael McCandless
Priority: Minor
Fix For: 5.0, 4.0

Spinoff of LUCENE-2075. We currently use a shared terms cache so multiple resolves of the same term within execution of a single query save CPU. But this ties up a good amount of long term RAM...

So, it might be better to instead create a query private scope, where places in Lucene like the terms dict could store and retrieve results. The scope would be private to each running query, and would be GCable as soon as the query completes. Then we'd have a perfect within-query hit rate...
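The query-private scope described in LUCENE-2093 can be sketched roughly as follows. This is only an illustration under assumptions, not Lucene's code: `TermsDictStub`, `QueryScope` and the `long`-valued term state are hypothetical names invented for the example. The one idea taken from the issue is that cached lookups live in an object owned by a single running query, so they become garbage as soon as that query completes.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a terms dictionary; it counts how many real lookups happen. */
class TermsDictStub {
    int lookups = 0;

    /** Pretend this is an expensive seek; the returned value is made up for the demo. */
    long resolve(String term) {
        lookups++;
        return term.hashCode();
    }
}

/** Hypothetical per-query scope: created when a query starts, unreachable once it ends. */
class QueryScope {
    private final Map<String, Long> termState = new HashMap<>();
    private final TermsDictStub dict;

    QueryScope(TermsDictStub dict) {
        this.dict = dict;
    }

    /** Resolves a term at most once per query; repeats are served from the private map. */
    long resolve(String term) {
        Long cached = termState.get(term);
        if (cached == null) {
            cached = dict.resolve(term);
            termState.put(term, cached);
        }
        return cached;
    }
}

class QueryScopeDemo {
    /** Runs one "query" that resolves the same term three times; returns dictionary lookups. */
    static int lookupsForOneQuery() {
        TermsDictStub dict = new TermsDictStub();
        QueryScope scope = new QueryScope(dict);
        scope.resolve("lucene");
        scope.resolve("lucene");
        scope.resolve("lucene");
        return dict.lookups;
    }

    public static void main(String[] args) {
        System.out.println("dictionary lookups: " + lookupsForOneQuery());
    }
}
```

Because the map is a field of the scope object rather than a shared cache, no long-lived RAM is held once the query's scope is dropped, which is the trade-off the issue is after.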
[jira] [Commented] (LUCENE-4369) StringField's name is unintuitive and not helpful
[ https://issues.apache.org/jira/browse/LUCENE-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13453174#comment-13453174 ]

Michael McCandless commented on LUCENE-4369:

I think it's useful to have a dedicated sugar field for things like primary keys, URLs, enumerated fields (country, state, zip code), entitlements fields (ACLs), tags, etc., and when users do this directly today I suspect they often forget to disable norms and index with docs-only.

But I agree the name is trappy now. +1 for ExactTextField. I don't really like raw: it sounds too ... low level. Like it's not even gonna be indexed or something.

StringField's name is unintuitive and not helpful
Key: LUCENE-4369
URL: https://issues.apache.org/jira/browse/LUCENE-4369
Project: Lucene - Core
Issue Type: Bug
Reporter: Robert Muir
Attachments: LUCENE-4369.patch

There's a huge difference between TextField and StringField: StringField screws up scoring and bypasses your Analyzer. (See the java-user thread "Custom Analyzer Not Called When Indexing" as an example.) The name we use here is vital, otherwise people will get bad results. I think we should rename StringField to MatchOnlyField.
[jira] [Commented] (LUCENE-2163) Remove synchronized from DirReader.reopen/clone
[ https://issues.apache.org/jira/browse/LUCENE-2163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13453180#comment-13453180 ]

Uwe Schindler commented on LUCENE-2163:

If doClose is protected by close() it is a no-op, so it does not matter if there is a sync or not :-) The other ones I already wanted to remove while refactoring, I just missed doing it. I think this issue is a relic from earlier times... I would just commit that removal. If you sync on reopen, you must sync everything.

Remove synchronized from DirReader.reopen/clone
Key: LUCENE-2163
URL: https://issues.apache.org/jira/browse/LUCENE-2163
Project: Lucene - Core
Issue Type: Improvement
Components: core/index
Reporter: Michael McCandless
Priority: Minor
Fix For: 4.1
Attachments: LUCENE-2163.patch

Spinoff from LUCENE-2161, where the fact that DirReader.reopen is sync'd was dangerous in the context of NRT (could block all searches against that reader when CMS was throttling). So, with LUCENE-2161, we're removing the synchronization when it's an NRT reader that you're reopening.

But... why should we sync even for a normal reopen? There are various sync'd methods on IndexReader/DirReader (we are reducing that, with LUCENE-2161 and also LUCENE-2156), but in general it doesn't seem like normal reopen really needs to be sync'd. Performing a reopen shouldn't incur any chance of blocking a search...
[jira] [Commented] (LUCENE-2684) it's not possible to access sub-query's freq information if BooleanScorer is use
[ https://issues.apache.org/jira/browse/LUCENE-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13453199#comment-13453199 ] Robert Muir commented on LUCENE-2684: - {quote} It is actually possible to force this, today, by having your collector return false from acceptDocsOutOfOrder... {quote} Well you are using a custom collector anyway if you are doing this, so can't we just add a sentence to that method's javadocs indicating that you should return false if you want to use the scorer navigation apis? it's not possible to access sub-query's freq information if BooleanScorer is use Key: LUCENE-2684 URL: https://issues.apache.org/jira/browse/LUCENE-2684 Project: Lucene - Core Issue Type: Bug Components: core/search Reporter: Michael McCandless Fix For: 4.1 LUCENE-2590 added an advanced feature, allowing an app to gather all sub-scorers for any Query. This is powerful because then, during collection, the app can get some details about how each sub-query participated in the overall match for the given document. However, I think this is completely broken if the BooleanQuery uses BooleanScorer, because that scorer is not doc-at-once. Instead, it batch processes chunks of 2048 sequential docIDs per scorer. This is a big performance gain, but it means that the sub scorers will all be positioned to the end of the 2048 doc chunk while the docs that matched within that chunk are collected. I don't think we can easily fix this... likely the fix is to make it easy(ier) to force BQ to use BooleanScorer2 (which is doc-at-once)? It is actually possible to force this, today, by having your collector return false from acceptDocsOutOfOrder... -- This message is automatically generated by JIRA. 
[jira] [Commented] (LUCENE-4345) Create a Classification module
[ https://issues.apache.org/jira/browse/LUCENE-4345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13453208#comment-13453208 ] Robert Muir commented on LUCENE-4345: - Can we remove the ClassificationException? It only seems to box IOException... we can just throw IOException directly instead? Create a Classification module -- Key: LUCENE-4345 URL: https://issues.apache.org/jira/browse/LUCENE-4345 Project: Lucene - Core Issue Type: New Feature Reporter: Tommaso Teofili Assignee: Tommaso Teofili Priority: Minor Attachments: LUCENE-4345_2.patch, LUCENE-4345.patch, SOLR-3700_2.patch, SOLR-3700.patch Lucene/Solr can host huge sets of documents containing lots of information in fields so that these can be used as training examples (w/ features) in order to very quickly create classifiers algorithms to use on new documents and / or to provide an additional service. So the idea is to create a contrib module (called 'classification') to host a ClassificationComponent that will use already seen data (the indexed documents / fields) to classify new documents / text fragments. The first version will contain a (simplistic) Lucene based Naive Bayes classifier but more implementations should be added in the future. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-4373) BBoxStrategy should support query shapes of any type
David Smiley created LUCENE-4373:

Summary: BBoxStrategy should support query shapes of any type
Key: LUCENE-4373
URL: https://issues.apache.org/jira/browse/LUCENE-4373
Project: Lucene - Core
Issue Type: Improvement
Components: modules/spatial
Reporter: David Smiley
Priority: Minor

It's great that BBoxStrategy has sophisticated shape area similarity based on bounding box, but I think that doesn't have to preclude having a non-rectangular query shape. The bbox-to-bbox query already implemented is probably pretty fast, as it can work by numeric range queries, but I'd like this to be the first stage, with the second being a FieldCache-based comparison to the query shape if it's not a rectangle.
[jira] [Commented] (LUCENE-2684) it's not possible to access sub-query's freq information if BooleanScorer is use
[ https://issues.apache.org/jira/browse/LUCENE-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13453218#comment-13453218 ] Uwe Schindler commented on LUCENE-2684: --- I think this issue is fixed already? VisitSubScorers works in 3.6.2 (if it gets released, Robert backported) and in 4.0 its working, too? As you need a custom collector anyway to make use of Scorer.getChildren(), we should maybe make BS1 throw UOE on getChildren() in 4.0 (explaining that you need inOrder) and visitSubScorers in 3.6.2? it's not possible to access sub-query's freq information if BooleanScorer is use Key: LUCENE-2684 URL: https://issues.apache.org/jira/browse/LUCENE-2684 Project: Lucene - Core Issue Type: Bug Components: core/search Reporter: Michael McCandless Fix For: 4.1 LUCENE-2590 added an advanced feature, allowing an app to gather all sub-scorers for any Query. This is powerful because then, during collection, the app can get some details about how each sub-query participated in the overall match for the given document. However, I think this is completely broken if the BooleanQuery uses BooleanScorer, because that scorer is not doc-at-once. Instead, it batch processes chunks of 2048 sequential docIDs per scorer. This is a big performance gain, but it means that the sub scorers will all be positioned to the end of the 2048 doc chunk while the docs that matched within that chunk are collected. I don't think we can easily fix this... likely the fix is to make it easy(ier) to force BQ to use BooleanScorer2 (which is doc-at-once)? It is actually possible to force this, today, by having your collector return false from acceptDocsOutOfOrder... -- This message is automatically generated by JIRA. 
[jira] [Commented] (LUCENE-2684) it's not possible to access sub-query's freq information if BooleanScorer is use
[ https://issues.apache.org/jira/browse/LUCENE-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13453223#comment-13453223 ] Robert Muir commented on LUCENE-2684: - {quote} As you need a custom collector anyway to make use of Scorer.getChildren(), we should maybe make BS1 throw UOE on getChildren() in 4.0 (explaining that you need inOrder) and visitSubScorers in 3.6.2? {quote} +1, i think for freq() and getChildren() we should throw UOE with text like this. But we can also do the javadocs too. Then i think there would be a lot less surprises. it's not possible to access sub-query's freq information if BooleanScorer is use Key: LUCENE-2684 URL: https://issues.apache.org/jira/browse/LUCENE-2684 Project: Lucene - Core Issue Type: Bug Components: core/search Reporter: Michael McCandless Fix For: 4.1 LUCENE-2590 added an advanced feature, allowing an app to gather all sub-scorers for any Query. This is powerful because then, during collection, the app can get some details about how each sub-query participated in the overall match for the given document. However, I think this is completely broken if the BooleanQuery uses BooleanScorer, because that scorer is not doc-at-once. Instead, it batch processes chunks of 2048 sequential docIDs per scorer. This is a big performance gain, but it means that the sub scorers will all be positioned to the end of the 2048 doc chunk while the docs that matched within that chunk are collected. I don't think we can easily fix this... likely the fix is to make it easy(ier) to force BQ to use BooleanScorer2 (which is doc-at-once)? It is actually possible to force this, today, by having your collector return false from acceptDocsOutOfOrder... -- This message is automatically generated by JIRA. 
[jira] [Created] (LUCENE-4374) Spatial- rename vector.TwoDoublesStrategy to vector.PointVectorStrategy
David Smiley created LUCENE-4374: Summary: Spatial- rename vector.TwoDoublesStrategy to vector.PointVectorStrategy Key: LUCENE-4374 URL: https://issues.apache.org/jira/browse/LUCENE-4374 Project: Lucene - Core Issue Type: Improvement Components: modules/spatial Reporter: David Smiley Fix For: 4.0 TwoDoubles isn't necessarily appropriate since it could be two floats, once it is enhanced to make that configurable. I like PointVector because it's clear it indexes points. Eventually I could imagine a CircleVectorStrategy in the same package. This does suggest BBoxStrategy should be RectVectorStrategy in the vector package. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-3826) Allow unit test classes to specify core name during setup
Amit Nithian created SOLR-3826: -- Summary: Allow unit test classes to specify core name during setup Key: SOLR-3826 URL: https://issues.apache.org/jira/browse/SOLR-3826 Project: Solr Issue Type: Improvement Components: Build Affects Versions: 4.0-BETA Reporter: Amit Nithian Priority: Minor Fix For: 4.0 When creating a unit test extending SolrTestCaseJ4, the corename is forced to collection1 which can be problematic if you want to do unit tests relying on schema/solrconfig specific to a core. Rather than hard-coding to collection1 allow this to be specified in the initCore method. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-3826) Allow unit test classes to specify core name during setup
[ https://issues.apache.org/jira/browse/SOLR-3826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amit Nithian updated SOLR-3826: --- Attachment: SOLR-3826.patch Simple patch demonstrating what I am talking about. Hopefully not too problematic :-) Allow unit test classes to specify core name during setup - Key: SOLR-3826 URL: https://issues.apache.org/jira/browse/SOLR-3826 Project: Solr Issue Type: Improvement Components: Build Affects Versions: 4.0-BETA Reporter: Amit Nithian Priority: Minor Fix For: 4.0 Attachments: SOLR-3826.patch When creating a unit test extending SolrTestCaseJ4, the corename is forced to collection1 which can be problematic if you want to do unit tests relying on schema/solrconfig specific to a core. Rather than hard-coding to collection1 allow this to be specified in the initCore method. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2684) it's not possible to access sub-query's freq information if BooleanScorer is use
[ https://issues.apache.org/jira/browse/LUCENE-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13453238#comment-13453238 ] Michael McCandless commented on LUCENE-2684: +1 But we should word it as a workaround ... ie, it's sort of strange that returning false from this unrelated method means suddenly scorer.freq() works: that's really an implementation detail. EG someday we could make BS1 score docs in order (it is possible, just not sure it'd be performant), and then this workaround no longer works. it's not possible to access sub-query's freq information if BooleanScorer is use Key: LUCENE-2684 URL: https://issues.apache.org/jira/browse/LUCENE-2684 Project: Lucene - Core Issue Type: Bug Components: core/search Reporter: Michael McCandless Fix For: 4.1 LUCENE-2590 added an advanced feature, allowing an app to gather all sub-scorers for any Query. This is powerful because then, during collection, the app can get some details about how each sub-query participated in the overall match for the given document. However, I think this is completely broken if the BooleanQuery uses BooleanScorer, because that scorer is not doc-at-once. Instead, it batch processes chunks of 2048 sequential docIDs per scorer. This is a big performance gain, but it means that the sub scorers will all be positioned to the end of the 2048 doc chunk while the docs that matched within that chunk are collected. I don't think we can easily fix this... likely the fix is to make it easy(ier) to force BQ to use BooleanScorer2 (which is doc-at-once)? It is actually possible to force this, today, by having your collector return false from acceptDocsOutOfOrder... -- This message is automatically generated by JIRA. 
[jira] [Commented] (LUCENE-2684) it's not possible to access sub-query's freq information if BooleanScorer is use
[ https://issues.apache.org/jira/browse/LUCENE-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13453245#comment-13453245 ]

Uwe Schindler commented on LUCENE-2684:

It does not only affect score(). In my case it was retrieving the subquery score...

bq. EG someday we could make BS1 score docs in order (it is possible, just not sure it'd be performant), and then this workaround no longer works.

But with in-order scoring we use correctly positioned scorers in all cases, otherwise it is a bug (like the DisjunctionSumScorer bug in 3.6 and 4.0 we fixed recently). So returning false works around the issue currently, but it would not hurt if somebody returned false even though our new BS1 could handle in-order scoring. On the other hand, if BS1 scored in order but did not position sub-scorers correctly, that would clearly be a bug!

it's not possible to access sub-query's freq information if BooleanScorer is use
Key: LUCENE-2684
URL: https://issues.apache.org/jira/browse/LUCENE-2684
Project: Lucene - Core
Issue Type: Bug
Components: core/search
Reporter: Michael McCandless
Fix For: 4.1

LUCENE-2590 added an advanced feature, allowing an app to gather all sub-scorers for any Query. This is powerful because then, during collection, the app can get some details about how each sub-query participated in the overall match for the given document.

However, I think this is completely broken if the BooleanQuery uses BooleanScorer, because that scorer is not doc-at-once. Instead, it batch processes chunks of 2048 sequential docIDs per scorer. This is a big performance gain, but it means that the sub scorers will all be positioned to the end of the 2048 doc chunk while the docs that matched within that chunk are collected.

I don't think we can easily fix this... likely the fix is to make it easy(ier) to force BQ to use BooleanScorer2 (which is doc-at-once)? It is actually possible to force this, today, by having your collector return false from acceptDocsOutOfOrder...
[jira] [Commented] (LUCENE-2684) it's not possible to access sub-query's freq information if BooleanScorer is use
[ https://issues.apache.org/jira/browse/LUCENE-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13453246#comment-13453246 ] Robert Muir commented on LUCENE-2684: - {quote} But we should word it as a workaround ... ie, it's sort of strange that returning false from this unrelated method means suddenly scorer.freq() works: that's really an implementation detail. EG someday we could make BS1 score docs in order (it is possible, just not sure it'd be performant), and then this workaround no longer works. {quote} I don't agree: the strangeness is the two booleans toplevelScorer and scoreDocsInOrder. If we wanted to do this in the future, we could just rename scoreDocsInOrder to needsNavigation. Or we could just fold both the booleans into 'BS1 is ok' ... are they used anywhere else? :) it's not possible to access sub-query's freq information if BooleanScorer is use Key: LUCENE-2684 URL: https://issues.apache.org/jira/browse/LUCENE-2684 Project: Lucene - Core Issue Type: Bug Components: core/search Reporter: Michael McCandless Fix For: 4.1 LUCENE-2590 added an advanced feature, allowing an app to gather all sub-scorers for any Query. This is powerful because then, during collection, the app can get some details about how each sub-query participated in the overall match for the given document. However, I think this is completely broken if the BooleanQuery uses BooleanScorer, because that scorer is not doc-at-once. Instead, it batch processes chunks of 2048 sequential docIDs per scorer. This is a big performance gain, but it means that the sub scorers will all be positioned to the end of the 2048 doc chunk while the docs that matched within that chunk are collected. I don't think we can easily fix this... likely the fix is to make it easy(ier) to force BQ to use BooleanScorer2 (which is doc-at-once)? It is actually possible to force this, today, by having your collector return false from acceptDocsOutOfOrder... 
[jira] [Comment Edited] (LUCENE-2684) it's not possible to access sub-query's freq information if BooleanScorer is use
[ https://issues.apache.org/jira/browse/LUCENE-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13453245#comment-13453245 ] Uwe Schindler edited comment on LUCENE-2684 at 9/12/12 5:20 AM: It does not only affect freq(). In my case it was retrieving the subquery score... bq. EG someday we could make BS1 score docs in order (it is possible, just not sure it'd be performant), and then this workaround no longer works. But with in-order scoring we are in all cases use correctly positioned scorers, otherwise it is a bug (like the DisjunctionSumScorer bug in 3.6 and 4.0 we fixed recently). So returning false works around the issue currently, but it would not hurt if somebody would return false, although our new BS1 can handle in order. But on the other hand, if BS1 would score in order, but not position sub-scorers correctly it is clearly a bug! was (Author: thetaphi): It does not only affect score(). In my case it was retrieving the subquery score... bq. EG someday we could make BS1 score docs in order (it is possible, just not sure it'd be performant), and then this workaround no longer works. But with in-order scoring we are in all cases use correctly positioned scorers, otherwise it is a bug (like the DisjunctionSumScorer bug in 3.6 and 4.0 we fixed recently). So returning false works around the issue currently, but it would not hurt if somebody would return false, although our new BS1 can handle in order. But on the other hand, if BS1 would score in order, but not position sub-scorers correctly it is clearly a bug! it's not possible to access sub-query's freq information if BooleanScorer is use Key: LUCENE-2684 URL: https://issues.apache.org/jira/browse/LUCENE-2684 Project: Lucene - Core Issue Type: Bug Components: core/search Reporter: Michael McCandless Fix For: 4.1 LUCENE-2590 added an advanced feature, allowing an app to gather all sub-scorers for any Query. 
This is powerful because then, during collection, the app can get some details about how each sub-query participated in the overall match for the given document. However, I think this is completely broken if the BooleanQuery uses BooleanScorer, because that scorer is not doc-at-once. Instead, it batch processes chunks of 2048 sequential docIDs per scorer. This is a big performance gain, but it means that the sub scorers will all be positioned to the end of the 2048 doc chunk while the docs that matched within that chunk are collected. I don't think we can easily fix this... likely the fix is to make it easy(ier) to force BQ to use BooleanScorer2 (which is doc-at-once)? It is actually possible to force this, today, by having your collector return false from acceptDocsOutOfOrder... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
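The positioning problem described in LUCENE-2684 can be reproduced with a toy model. The sketch below is not Lucene's BooleanScorer; `SubScorerStub`, `ChunkedScoringDemo` and the two-phase loop are hypothetical simplifications. Only the window size of 2048 and the "score a whole chunk, then collect" behaviour come from the issue text. It records, for each collected doc, where the first sub-scorer is actually positioned at collect time.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sub-scorer over a sorted docID list; not Lucene's Scorer class. */
class SubScorerStub {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;
    private final int[] docs;
    private int idx = -1;

    SubScorerStub(int... docs) {
        this.docs = docs;
    }

    /** Current position: -1 before the first nextDoc(), NO_MORE_DOCS when exhausted. */
    int docID() {
        if (idx < 0) return -1;
        return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
    }

    int nextDoc() {
        idx++;
        return docID();
    }
}

class ChunkedScoringDemo {
    static final int CHUNK = 2048; // window size mentioned in the issue

    /**
     * Disjunction over the subs, processed one window at a time.
     * Returns pairs {collectedDoc, positionOfFirstSubAtCollectTime}.
     */
    static List<int[]> collectInChunks(SubScorerStub[] subs, int maxDoc) {
        List<int[]> seen = new ArrayList<>();
        for (SubScorerStub s : subs) s.nextDoc(); // position each sub on its first doc
        for (int start = 0; start < maxDoc; start += CHUNK) {
            int end = start + CHUNK;
            boolean[] matched = new boolean[CHUNK];
            // Phase 1: drain every sub through the whole window.
            for (SubScorerStub s : subs) {
                while (s.docID() < end) {
                    matched[s.docID() - start] = true;
                    s.nextDoc();
                }
            }
            // Phase 2: only now are the hits of this window handed to the collector.
            for (int i = 0; i < CHUNK; i++) {
                if (matched[i]) seen.add(new int[] {start + i, subs[0].docID()});
            }
        }
        return seen;
    }

    /** Fixed input used by main() and easy to check by hand. */
    static List<int[]> demo() {
        SubScorerStub[] subs = {new SubScorerStub(3, 10, 5000), new SubScorerStub(10, 2500)};
        return collectInChunks(subs, 6000);
    }

    public static void main(String[] args) {
        for (int[] hit : demo()) {
            System.out.println("collected doc " + hit[0] + ", first sub-scorer is on doc " + hit[1]);
        }
    }
}
```

In this model doc 3 is collected while the first sub-scorer has already moved past the window, so asking that sub-scorer for freq() or score() during collect would describe some other document. A doc-at-once scorer would collect each doc while all subs are still positioned on it.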
[jira] [Updated] (SOLR-3628) SolrDocument uses user-provided collections unsafely
[ https://issues.apache.org/jira/browse/SOLR-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man updated SOLR-3628:
Attachment: SOLR-3628.patch

Updated patch to include a test that the field really is backed by the collection (since we now explicitly document it). Will commit as soon as the full test run finishes.

SolrDocument uses user-provided collections unsafely
Key: SOLR-3628
URL: https://issues.apache.org/jira/browse/SOLR-3628
Project: Solr
Issue Type: Bug
Components: clients - java
Affects Versions: 3.6, 4.0-ALPHA
Environment: Mac OS X 10.7.4, Java 6
Reporter: Tom Switzer
Assignee: Hoss Man
Fix For: 4.0
Attachments: SOLR-3628.patch, solrdoc-ro-list-bug-comp.patch, solrdoc-ro-list-bug.patch

Adding a read-only Collection as the value of a field (i.e. on SolrDocument or SolrInputField) will result in an UnsupportedOperationException later on when adding more values to that field. This happens because no defensive copy of the collection is made. Instead, if a collection is given first, then it becomes the backing collection for the field. This can cause problems if the collection is modified after the fact or if a read-only collection is given (e.g. Collections.unmodifiableList(...)).

It can be reproduced with:

    SolrDocument doc = new SolrDocument();
    doc.addField("v", Collections.unmodifiableList(new ArrayList<Object>()));
    doc.addField("v", "a");

I've created a patch that includes a fix and a test with, essentially, the above. The patch just ensures that SolrDocument and SolrInputField always use a Collection they created as the value, rather than relying on what was given to them.
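The failure mode in SOLR-3628 does not depend on Solr itself, so it can be shown with plain java.util classes. The sketch below assumes a hypothetical `FieldSketch` class that merely mimics the reported pattern (the first collection given becomes the backing collection); it is not SolrDocument or SolrInputField. The `defensiveCopy` flag illustrates the approach proposed in the issue description; note that the comment above suggests the committed patch documents the backing behaviour, so the final Solr behaviour may differ from this sketch.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;

/** Hypothetical field holder, NOT Solr's SolrInputField; it only mimics the reported pattern. */
class FieldSketch {
    private Collection<Object> values;
    private final boolean defensiveCopy;

    FieldSketch(boolean defensiveCopy) {
        this.defensiveCopy = defensiveCopy;
    }

    @SuppressWarnings("unchecked")
    void addValue(Object v) {
        if (values == null) {
            if (v instanceof Collection) {
                Collection<Object> given = (Collection<Object>) v;
                // Without the copy, the caller's collection becomes our backing store.
                values = defensiveCopy ? new ArrayList<Object>(given) : given;
            } else {
                values = new ArrayList<Object>();
                values.add(v);
            }
            return;
        }
        if (v instanceof Collection) {
            values.addAll((Collection<Object>) v);
        } else {
            values.add(v); // throws if 'values' is a read-only collection from the caller
        }
    }
}

class DefensiveCopyDemo {
    /** Returns true if adding a second value fails with UnsupportedOperationException. */
    static boolean secondAddFails(boolean defensiveCopy) {
        FieldSketch field = new FieldSketch(defensiveCopy);
        field.addValue(Collections.unmodifiableList(new ArrayList<Object>()));
        try {
            field.addValue("a");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("no copy, second add fails: " + secondAddFails(false));
        System.out.println("with copy, second add fails: " + secondAddFails(true));
    }
}
```

Copying costs one extra allocation per field but means later mutations by the caller cannot leak into the field, and read-only inputs stop being a trap.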
[jira] [Commented] (LUCENE-2684) it's not possible to access sub-query's freq information if BooleanScorer is use
[ https://issues.apache.org/jira/browse/LUCENE-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13453259#comment-13453259 ]

Uwe Schindler commented on LUCENE-2684:

An idea (separate issue!) would be: BS1 completely violates the scorer interface; the only method you can call is the one taking a Collector. In my opinion, BS1 should *not* implement the Scorer interface, that's the whole bug! It should maybe be some separate class like OutOfOrderDocIdReporter (name is just an example) that only implements collect(Collector). And the navigation API (advance, next) should be separated from score() and freq() - a simple Java interface Scorer.

So the current in-order scorer would be a simple DocIdSetIterator that additionally implements the Scorer interface (to provide score() and freq()), and current out-of-order scorers would implement only the OutOfOrderDocIdReporter API and pass an inlined Scorer interface (without advance and next) to the setScorer() method (like BucketScorer currently).

it's not possible to access sub-query's freq information if BooleanScorer is use
Key: LUCENE-2684
URL: https://issues.apache.org/jira/browse/LUCENE-2684
Project: Lucene - Core
Issue Type: Bug
Components: core/search
Reporter: Michael McCandless
Fix For: 4.1

LUCENE-2590 added an advanced feature, allowing an app to gather all sub-scorers for any Query. This is powerful because then, during collection, the app can get some details about how each sub-query participated in the overall match for the given document.

However, I think this is completely broken if the BooleanQuery uses BooleanScorer, because that scorer is not doc-at-once. Instead, it batch processes chunks of 2048 sequential docIDs per scorer. This is a big performance gain, but it means that the sub scorers will all be positioned to the end of the 2048 doc chunk while the docs that matched within that chunk are collected.

I don't think we can easily fix this... likely the fix is to make it easy(ier) to force BQ to use BooleanScorer2 (which is doc-at-once)? It is actually possible to force this, today, by having your collector return false from acceptDocsOutOfOrder...
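The interface split floated in Uwe Schindler's comment might look roughly like the sketch below. Everything here is hypothetical: `SimpleScorer`, `DocNavigator`, `SimpleCollector` and `FixedHitsReporter` are names invented for illustration, and only `OutOfOrderDocIdReporter` (explicitly "just an example" in the comment) and the setScorer()/collect() shape are taken from the discussion. None of this is Lucene's actual API.

```java
import java.util.ArrayList;
import java.util.List;

/** Score access only: no advance()/nextDoc(). Hypothetical, modelled on the comment. */
interface SimpleScorer {
    float score();
    int freq();
}

/** Navigation only; an in-order scorer would implement this plus SimpleScorer. */
interface DocNavigator {
    int nextDoc();
    int advance(int target);
}

/** Hypothetical collector callbacks. */
interface SimpleCollector {
    void setScorer(SimpleScorer scorer);
    void collect(int doc);
}

/** Name from the comment: pushes hits to a collector and offers no navigation at all. */
interface OutOfOrderDocIdReporter {
    void collect(SimpleCollector collector);
}

/** Toy reporter that emits a fixed list of hits in whatever order it was given. */
class FixedHitsReporter implements OutOfOrderDocIdReporter {
    private final int[] docs;
    private final float[] scores;
    private int current; // index of the hit currently being reported

    FixedHitsReporter(int[] docs, float[] scores) {
        this.docs = docs;
        this.scores = scores;
    }

    @Override
    public void collect(SimpleCollector collector) {
        // Inlined scorer without navigation methods; valid only for the hit being reported.
        collector.setScorer(new SimpleScorer() {
            public float score() { return scores[current]; }
            public int freq() { return 1; }
        });
        for (current = 0; current < docs.length; current++) {
            collector.collect(docs[current]);
        }
    }
}

class InterfaceSplitDemo {
    /** Collects "doc:score" strings in the order the reporter emits them. */
    static List<String> run() {
        final List<String> out = new ArrayList<>();
        new FixedHitsReporter(new int[] {7, 2, 5}, new float[] {1.5f, 0.5f, 2.0f})
            .collect(new SimpleCollector() {
                private SimpleScorer scorer;
                public void setScorer(SimpleScorer s) { scorer = s; }
                public void collect(int doc) { out.add(doc + ":" + scorer.score()); }
            });
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

With the types split this way, a collector handed an out-of-order reporter simply has no advance() or nextDoc() to call, so misuse becomes a compile-time error rather than a silently wrong result.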
[jira] [Commented] (LUCENE-2684) it's not possible to access sub-query's freq information if BooleanScorer is use
[ https://issues.apache.org/jira/browse/LUCENE-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13453261#comment-13453261 ] Robert Muir commented on LUCENE-2684: - Collectible... (not serious) it's not possible to access sub-query's freq information if BooleanScorer is use Key: LUCENE-2684 URL: https://issues.apache.org/jira/browse/LUCENE-2684 Project: Lucene - Core Issue Type: Bug Components: core/search Reporter: Michael McCandless Fix For: 4.1 LUCENE-2590 added an advanced feature, allowing an app to gather all sub-scorers for any Query. This is powerful because then, during collection, the app can get some details about how each sub-query participated in the overall match for the given document. However, I think this is completely broken if the BooleanQuery uses BooleanScorer, because that scorer is not doc-at-once. Instead, it batch processes chunks of 2048 sequential docIDs per scorer. This is a big performance gain, but it means that the sub scorers will all be positioned to the end of the 2048 doc chunk while the docs that matched within that chunk are collected. I don't think we can easily fix this... likely the fix is to make it easy(ier) to force BQ to use BooleanScorer2 (which is doc-at-once)? It is actually possible to force this, today, by having your collector return false from acceptDocsOutOfOrder... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2684) it's not possible to access sub-query's freq information if BooleanScorer is use
[ https://issues.apache.org/jira/browse/LUCENE-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453262#comment-13453262 ] Michael McCandless commented on LUCENE-2684: The problem is that scoresDocsInOrder doesn't really capture what's necessary here (yes, it works today, but not necessarily tomorrow). I agree, Uwe: if we add a Collector.needsNavigation() then even a fixed BS1 that sorted the docIDs before collection would not be usable, since the subs will not be on the doc during collect(). And I agree, Robert: the current booleans topLevelScorer and scoreDocsInOrder, and then a new needsNavigation, will make things rather confusing. Really I think topLevelScorer should be strongly typed: the intent is to declare whether you will call Scorer.score(Collector) or whether you will call .nextDoc()/.score() ... they really should be different classes. If we don't think any other future scorer would want to score docs NOT in order ... then maybe we should simply rename scoreDocsInOrder to needsNavigation? (Or scoreDocAtOnce, scoreDocAtATime, something else...). it's not possible to access sub-query's freq information if BooleanScorer is use Key: LUCENE-2684 URL: https://issues.apache.org/jira/browse/LUCENE-2684 Project: Lucene - Core Issue Type: Bug Components: core/search Reporter: Michael McCandless Fix For: 4.1 LUCENE-2590 added an advanced feature, allowing an app to gather all sub-scorers for any Query. This is powerful because then, during collection, the app can get some details about how each sub-query participated in the overall match for the given document. However, I think this is completely broken if the BooleanQuery uses BooleanScorer, because that scorer is not doc-at-once. Instead, it batch processes chunks of 2048 sequential docIDs per scorer.
This is a big performance gain, but it means that the sub scorers will all be positioned to the end of the 2048 doc chunk while the docs that matched within that chunk are collected. I don't think we can easily fix this... likely the fix is to make it easy(ier) to force BQ to use BooleanScorer2 (which is doc-at-once)? It is actually possible to force this, today, by having your collector return false from acceptDocsOutOfOrder... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2684) it's not possible to access sub-query's freq information if BooleanScorer is use
[ https://issues.apache.org/jira/browse/LUCENE-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453269#comment-13453269 ] Robert Muir commented on LUCENE-2684: - {quote} If we don't think any other future scorer would want to score docs NOT in order ... then maybe we should simply rename scoreDocsInOrder to needsNavigation? (Or scoreDocAtOnce, scoreDocAtATime, something else...). {quote} I actually just remembered: the query-time join, I think, does this too? But yeah, if we are going to have booleans, I would prefer something more along the lines of document-at-a-time, since it's standard IR terminology and less confusing than scoreDocsInOrder. it's not possible to access sub-query's freq information if BooleanScorer is use Key: LUCENE-2684 URL: https://issues.apache.org/jira/browse/LUCENE-2684 Project: Lucene - Core Issue Type: Bug Components: core/search Reporter: Michael McCandless Fix For: 4.1 LUCENE-2590 added an advanced feature, allowing an app to gather all sub-scorers for any Query. This is powerful because then, during collection, the app can get some details about how each sub-query participated in the overall match for the given document. However, I think this is completely broken if the BooleanQuery uses BooleanScorer, because that scorer is not doc-at-once. Instead, it batch processes chunks of 2048 sequential docIDs per scorer. This is a big performance gain, but it means that the sub scorers will all be positioned to the end of the 2048 doc chunk while the docs that matched within that chunk are collected. I don't think we can easily fix this... likely the fix is to make it easy(ier) to force BQ to use BooleanScorer2 (which is doc-at-once)? It is actually possible to force this, today, by having your collector return false from acceptDocsOutOfOrder...
[jira] [Commented] (LUCENE-2684) it's not possible to access sub-query's freq information if BooleanScorer is use
[ https://issues.apache.org/jira/browse/LUCENE-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453274#comment-13453274 ] Michael McCandless commented on LUCENE-2684: bq. BS1 completely violates the scorer interface, the only method you can call is the one taking a Collector. In my opinion, BS1 should not implement the Scorer interface, that's the whole bug! Well, let's remember that "must have doc-at-once scoring, for all subs too" is a very rare use-case. The vast majority of users just need a fast .score(Collector) interface. But yeah, I agree: it should be strongly typed, and BS1 should only implement the .score(Collector) interface. The ScoresDocAtOnce interface can easily implement the .score(Collector) interface (as Scorer does today...). it's not possible to access sub-query's freq information if BooleanScorer is use Key: LUCENE-2684 URL: https://issues.apache.org/jira/browse/LUCENE-2684 Project: Lucene - Core Issue Type: Bug Components: core/search Reporter: Michael McCandless Fix For: 4.1 LUCENE-2590 added an advanced feature, allowing an app to gather all sub-scorers for any Query. This is powerful because then, during collection, the app can get some details about how each sub-query participated in the overall match for the given document. However, I think this is completely broken if the BooleanQuery uses BooleanScorer, because that scorer is not doc-at-once. Instead, it batch processes chunks of 2048 sequential docIDs per scorer. This is a big performance gain, but it means that the sub scorers will all be positioned to the end of the 2048 doc chunk while the docs that matched within that chunk are collected. I don't think we can easily fix this... likely the fix is to make it easy(ier) to force BQ to use BooleanScorer2 (which is doc-at-once)? It is actually possible to force this, today, by having your collector return false from acceptDocsOutOfOrder...
[jira] [Commented] (SOLR-3825) Log document IDs when they are retrieved
[ https://issues.apache.org/jira/browse/SOLR-3825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453287#comment-13453287 ] Grant Ingersoll commented on SOLR-3825: --- A few comments on the patch:
# SolrMBeanTest fails with this patch due to the description and source being null.
# I don't think we want/need member variables for ids and idScores, as it won't be thread safe. I'd just loop the DocIterator once, building a StringBuilder and then calling addToLog on that StringBuilder. This will also avoid the need for clone().
# For the scores, let's just do an output of id:score, id:score, ... Using a Map won't be reliable, as we will want to maintain order in the log.
# For the log key, let's just call it the same thing, which should simplify parsing regardless of whether there are scores present or not, so the format would be: responseLog: id1[:score1],id2[:score2],... where [ ] is used to indicate it is optional.
# We should follow the normal SearchComponent pattern of being able to turn on/off the component via a request parameter. {code}if (!params.getBool(COMPONENT_NAME, false)) { return; }{code} This component should be OFF by default.
# In the ResponseLogComponentTest, do we need the createCore() stuff? See some of the other tests and how they use initCore.
Log document IDs when they are retrieved Key: SOLR-3825 URL: https://issues.apache.org/jira/browse/SOLR-3825 Project: Solr Issue Type: Improvement Components: SearchComponents - other Reporter: Scott Stults Assignee: Grant Ingersoll Priority: Trivial Attachments: SOLR-3825.patch During relevancy tuning it's important to know exactly which documents the client has seen. Right now the only way to get that list is to splice into the HTTP traffic. Preferably the IDs could be logged along with the query.
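The log format proposed in the review, id1[:score1],id2[:score2],..., can be sketched as a single pass that appends to a local StringBuilder. This is a standalone illustration, not the patch: `ResponseLogSketch` and `format` are hypothetical names, and plain arrays stand in for Solr's DocIterator so the example runs without Solr on the classpath.

```java
/** Standalone sketch of the proposed responseLog value; not the SOLR-3825 patch. */
final class ResponseLogSketch {
    /**
     * Builds "id1[:score1],id2[:score2],..." in document order.
     * Pass scores == null when scores were not requested.
     */
    static String format(String[] ids, float[] scores) {
        // Local builder only: no member state, so concurrent requests cannot interfere.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < ids.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(ids[i]);
            if (scores != null) {
                sb.append(':').append(scores[i]);
            }
        }
        return sb.toString();
    }
}
```

Because everything lives in local variables and order follows the input, this avoids both the thread-safety concern and the ordering concern raised about a Map.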
[jira] [Resolved] (SOLR-3628) SolrDocument uses user-provided collections unsafely
[ https://issues.apache.org/jira/browse/SOLR-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man resolved SOLR-3628. Resolution: Fixed Fix Version/s: 5.0 Committed revision 1383520. - trunk Committed revision 1383533. - 4x Thanks Tom! SolrDocument uses user-provided collections unsafely Key: SOLR-3628 URL: https://issues.apache.org/jira/browse/SOLR-3628 Project: Solr Issue Type: Bug Components: clients - java Affects Versions: 3.6, 4.0-ALPHA Environment: Mac OS X 10.7.4, Java 6 Reporter: Tom Switzer Assignee: Hoss Man Fix For: 4.0, 5.0 Attachments: SOLR-3628.patch, solrdoc-ro-list-bug-comp.patch, solrdoc-ro-list-bug.patch Adding a read-only Collection as the value of a field (i.e. on a SolrDocument or SolrInputField) will result in an UnsupportedOperationException later on when adding more values to that field. This happens because no defensive copy of the collection is made. Instead, if a collection is given first, then it becomes the backing collection for the field. This can cause problems if the collection is modified after the fact or if a read-only collection is given (e.g. Collections.unmodifiableList(...)). It can be reproduced with:
    SolrDocument doc = new SolrDocument();
    doc.addField("v", Collections.unmodifiableList(new ArrayList<Object>()));
    doc.addField("v", "a");
I've created a patch that includes a fix and a test with, essentially, the above. The patch just ensures that SolrDocument and SolrInputField always use a Collection they created as the value, rather than relying on what was given to them.
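The fix described above amounts to a defensive copy: the field always owns a collection it created itself and copies incoming elements into it. The sketch below shows that idea in isolation. `DefensiveCopySketch.Field` is a hypothetical stand-in, not SolrDocument or SolrInputField, and it is not the committed patch.

```java
import java.util.ArrayList;
import java.util.Collection;

/** Standalone sketch of the defensive-copy idea; not the SOLR-3628 patch. */
final class DefensiveCopySketch {
    /** Hypothetical multi-valued field that never adopts a caller's collection. */
    static final class Field {
        private Collection<Object> values;

        void add(Object v) {
            if (values == null) {
                values = new ArrayList<>(); // always a collection this class created
            }
            if (v instanceof Collection) {
                // Copy the elements instead of keeping a reference to the caller's
                // (possibly read-only, possibly later-modified) collection.
                values.addAll((Collection<?>) v);
            } else {
                values.add(v);
            }
        }

        Collection<Object> values() {
            return values;
        }
    }
}
```

With this shape, adding an unmodifiable list first and a plain value afterwards succeeds, because the second add writes to the field's own ArrayList.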
[jira] [Updated] (SOLR-3823) Parentheses in a boost query cause errors
[ https://issues.apache.org/jira/browse/SOLR-3823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated SOLR-3823: --- Description: When using a boost query (bq) that contains a parentheses (like this example from the Relevancy Cookbook section of the wiki): {noformat} ? defType = dismax q = foo bar bq = (*:* -xxx)^999 {noformat} You get the following error: org.apache.lucene.queryparser.classic.ParseException: Cannot parse '-xxx)': Encountered ) ) at line 1, column 12. Was expecting one of: EOF AND ... OR ... NOT ... + ... - ... BAREOPER ... ( ... * ... ^ ... QUOTED ... TERM ... FUZZY_SLOP ... PREFIXTERM ... WILDTERM ... REGEXPTERM ... [ ... { ... NUMBER ... was: When using a boost query (bq) that contains a parentheses (like this example from the Relevancy Cookbook section of the wiki): ? defType = dismax q = foo bar bq = (*:* -xxx)^999 You get the following error: org.apache.lucene.queryparser.classic.ParseException: Cannot parse '-xxx)': Encountered ) ) at line 1, column 12. Was expecting one of: EOF AND ... OR ... NOT ... + ... - ... BAREOPER ... ( ... * ... ^ ... QUOTED ... TERM ... FUZZY_SLOP ... PREFIXTERM ... WILDTERM ... REGEXPTERM ... [ ... { ... NUMBER ... 1) editing the issue description to include noformat tags -- i think Erick was getting confused by the \*:\* showing up as just : 2) i can't reproduce the described problem. When i tried using the solr example data, this request worked just fine... 
http://localhost:8983/solr/select?q=ipod&defType=dismax&bq=%28*:*%20-id:IW-02%29^999 Mathos: please follow up on the solr-user@lucene mailing list with more details about the problems you are having and your actual (specific) configs/queries. Parentheses in a boost query cause errors - Key: SOLR-3823 URL: https://issues.apache.org/jira/browse/SOLR-3823 Project: Solr Issue Type: Bug Components: query parsers Affects Versions: 4.0-BETA Environment: Mac, jdk 1.6, Chrome Reporter: Mathos Marcer When using a boost query (bq) that contains a parentheses (like this example from the Relevancy Cookbook section of the wiki): {noformat} ? defType = dismax q = foo bar bq = (*:* -xxx)^999 {noformat} You get the following error: org.apache.lucene.queryparser.classic.ParseException: Cannot parse '-xxx)': Encountered ) ) at line 1, column 12. Was expecting one of: EOF AND ... OR ... NOT ... + ... - ... BAREOPER ... ( ... * ... ^ ... QUOTED ... TERM ... FUZZY_SLOP ... PREFIXTERM ... WILDTERM ... REGEXPTERM ... [ ... { ... NUMBER ...
[jira] [Commented] (LUCENE-4371) consider refactoring slicer to indexinput.slice
[ https://issues.apache.org/jira/browse/LUCENE-4371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13453310#comment-13453310 ] Michael McCandless commented on LUCENE-4371: +1 I think having II implement slice is much cleaner than Directory having to implement createSlicer returning an IndexInputSlicer with only one method. consider refactoring slicer to indexinput.slice --- Key: LUCENE-4371 URL: https://issues.apache.org/jira/browse/LUCENE-4371 Project: Lucene - Core Issue Type: Task Reporter: Robert Muir Attachments: LUCENE-4371.patch From LUCENE-4364: {quote} In my opinion, we should maybe check, if we can remove the whole Slicer in all Indexinputs? Just make the slice(...) method return the current BufferedIndexInput-based one. This could be another issue, once this is in. {quote} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-4173) Remove IgnoreIncompatibleGeometry for SpatialStrategys
[ https://issues.apache.org/jira/browse/LUCENE-4173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated LUCENE-4173: - Attachment: LUCENE-4173_remove_ignoreIncompatibleGeometry,_fail_when_given_the_exact_shape_needed.patch This patch removes ignoreIncompatibleGeometry and modifies the strategies to fail when given a shape that isn't the precise shape used -- no coalescing. BBox and TwoDoubles were both doing coalescing (e.g. shape.getBoundingBox()). PrefixTree can handle anything, so no change there. I'll commit this pending your +1, Chris. An enum for FAIL, COALESCE, or IGNORE can be done in another issue if desired. Remove IgnoreIncompatibleGeometry for SpatialStrategys -- Key: LUCENE-4173 URL: https://issues.apache.org/jira/browse/LUCENE-4173 Project: Lucene - Core Issue Type: Bug Components: modules/spatial Reporter: Chris Male Assignee: David Smiley Attachments: LUCENE-4173.patch, LUCENE-4173_remove_ignoreIncompatibleGeometry,_fail_when_given_the_exact_shape_needed.patch Silently not indexing anything for a Shape is not okay. Users should get an Exception and then they can decide how to proceed.
[jira] [Commented] (SOLR-3823) Parentheses in a boost query cause errors
[ https://issues.apache.org/jira/browse/SOLR-3823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13453314#comment-13453314 ] Mathos Marcer commented on SOLR-3823: - The problem seems to be when I specify defType=edismax, under defType=dismax it is working like a champ. Parentheses in a boost query cause errors - Key: SOLR-3823 URL: https://issues.apache.org/jira/browse/SOLR-3823 Project: Solr Issue Type: Bug Components: query parsers Affects Versions: 4.0-BETA Environment: Mac, jdk 1.6, Chrome Reporter: Mathos Marcer When using a boost query (bq) that contains a parentheses (like this example from the Relevancy Cookbook section of the wiki): {noformat} ? defType = dismax q = foo bar bq = (*:* -xxx)^999 {noformat} You get the following error: org.apache.lucene.queryparser.classic.ParseException: Cannot parse '-xxx)': Encountered ) ) at line 1, column 12. Was expecting one of: EOF AND ... OR ... NOT ... + ... - ... BAREOPER ... ( ... * ... ^ ... QUOTED ... TERM ... FUZZY_SLOP ... PREFIXTERM ... WILDTERM ... REGEXPTERM ... [ ... { ... NUMBER ... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4369) StringFields name is unintuitive and not helpful
[ https://issues.apache.org/jira/browse/LUCENE-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453317#comment-13453317 ] Steven Rowe commented on LUCENE-4369: - Serious suggestion: WholeTextField. (Following the raw/cooked food metaphor used in various computational contexts - "whole" food means unprocessed.) I like ExactTextField too, but it's missing the beginning and end anchors: the intent is "exactly this search string", but it doesn't necessarily imply "and nothing else". E.g. would a user armed only with the name assume that an ExactTextField query string "two three" would not match an indexed string "one two three four"? StringFields name is unintuitive and not helpful Key: LUCENE-4369 URL: https://issues.apache.org/jira/browse/LUCENE-4369 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-4369.patch There's a huge difference between TextField and StringField: StringField screws up scoring and bypasses your Analyzer. (See the java-user thread "Custom Analyzer Not Called When Indexing" as an example.) The name we use here is vital, otherwise people will get bad results. I think we should rename StringField to MatchOnlyField.
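The "two three" versus "one two three four" question can be made concrete with a toy comparison of whole-value matching and token matching. This is not Lucene code and uses no analyzer: `WholeValueMatchSketch` is a hypothetical name, and whitespace splitting plus lowercasing is only an assumed stand-in for what an analyzed text field would do.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Toy comparison of whole-value matching and token matching; not Lucene code. */
final class WholeValueMatchSketch {
    /** Un-analyzed field: the query must equal the entire indexed value. */
    static boolean matchesWholeValue(String indexed, String query) {
        return indexed.equals(query);
    }

    /** Analyzed field (whitespace split + lowercase assumed): any shared token matches. */
    static boolean matchesAnyToken(String indexed, String query) {
        Set<String> tokens = new HashSet<>(Arrays.asList(tokenize(indexed)));
        for (String q : tokenize(query)) {
            if (tokens.contains(q)) {
                return true;
            }
        }
        return false;
    }

    private static String[] tokenize(String s) {
        return s.toLowerCase(Locale.ROOT).trim().split("\\s+");
    }
}
```

Under these assumptions, the query "two three" does not match the whole value "one two three four" but does match it token by token, which is the distinction a field name would need to convey.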
[jira] [Commented] (LUCENE-4371) consider refactoring slicer to indexinput.slice
[ https://issues.apache.org/jira/browse/LUCENE-4371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453320#comment-13453320 ] Michael McCandless commented on LUCENE-4371: I don't think the default impl (SlicedIndexInput) should override BII's copyBytes? Seems ... spooky. consider refactoring slicer to indexinput.slice --- Key: LUCENE-4371 URL: https://issues.apache.org/jira/browse/LUCENE-4371 Project: Lucene - Core Issue Type: Task Reporter: Robert Muir Attachments: LUCENE-4371.patch From LUCENE-4364: {quote} In my opinion, we should maybe check, if we can remove the whole Slicer in all Indexinputs? Just make the slice(...) method return the current BufferedIndexInput-based one. This could be another issue, once this is in. {quote}
[jira] [Commented] (LUCENE-4371) consider refactoring slicer to indexinput.slice
[ https://issues.apache.org/jira/browse/LUCENE-4371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453322#comment-13453322 ] Robert Muir commented on LUCENE-4371: - I agree, Mike, I wanted to remove it... but I'm afraid! I also don't understand why we have both DataOutput.copyBytes(DataInput) and IndexInput.copyBytes(IndexOutput). Is this all really necessary? consider refactoring slicer to indexinput.slice --- Key: LUCENE-4371 URL: https://issues.apache.org/jira/browse/LUCENE-4371 Project: Lucene - Core Issue Type: Task Reporter: Robert Muir Attachments: LUCENE-4371.patch From LUCENE-4364: {quote} In my opinion, we should maybe check, if we can remove the whole Slicer in all Indexinputs? Just make the slice(...) method return the current BufferedIndexInput-based one. This could be another issue, once this is in. {quote}
[jira] [Reopened] (SOLR-3823) Parentheses in a boost query cause errors
[ https://issues.apache.org/jira/browse/SOLR-3823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson reopened SOLR-3823: -- Thanks, Hoss, you're right... But I can get this to fail both with BETA and today's trunk with the example data.
{noformat}
http://localhost:8983/solr/select?q=foo&defType=edismax&bq=(name:nonsense -xxx)^999
{noformat}
Interestingly, this works (note the space after bq):
{noformat}
http://localhost:8983/solr/select?q=foo&defType=edismax&bq =(name:nonsense -xxx)^999
{noformat}
This fails (spaces around parens; there was an issue with non-space parens lately, but apparently it's unrelated):
{noformat}
http://localhost:8983/solr/select?q=foo&defType=edismax&bq= ( name:nonsense -xxx ) ^999
{noformat}
Stack trace from log:
Caused by: org.apache.lucene.queryparser.classic.ParseException: Encountered EOF at line 1, column 1. Was expecting one of: NOT ... + ... - ... BAREOPER ... ( ... * ... QUOTED ... TERM ... PREFIXTERM ... WILDTERM ... REGEXPTERM ... [ ... { ... NUMBER ... TERM ... * ...
 at org.apache.lucene.queryparser.classic.QueryParser.generateParseException(QueryParser.java:708)
 at org.apache.lucene.queryparser.classic.QueryParser.jj_consume_token(QueryParser.java:590)
 at org.apache.lucene.queryparser.classic.QueryParser.Clause(QueryParser.java:275)
 at org.apache.lucene.queryparser.classic.QueryParser.Query(QueryParser.java:181)
 at org.apache.lucene.queryparser.classic.QueryParser.Clause(QueryParser.java:261)
 at org.apache.lucene.queryparser.classic.QueryParser.Query(QueryParser.java:181)
 at org.apache.lucene.queryparser.classic.QueryParser.TopLevelQuery(QueryParser.java:170)
 at org.apache.lucene.queryparser.classic.QueryParserBase.parse(QueryParserBase.java:120)
 ... 35 more
Sep 11, 2012 12:37:58 PM org.apache.solr.core.SolrCore execute INFO: [collection1] webapp=/solr path=/select params={q=foo&defType=edismax&bq=+(+name:nonsense+-xxx+)+^999} status=400 QTime=2
Parentheses in a boost query cause errors - Key: SOLR-3823 URL: https://issues.apache.org/jira/browse/SOLR-3823 Project: Solr Issue Type: Bug Components: query parsers Affects Versions: 4.0-BETA Environment: Mac, jdk 1.6, Chrome Reporter: Mathos Marcer When using a boost query (bq) that contains a parentheses (like this example from the Relevancy Cookbook section of the wiki): {noformat} ? defType = dismax q = foo bar bq = (*:* -xxx)^999 {noformat} You get the following error: org.apache.lucene.queryparser.classic.ParseException: Cannot parse '-xxx)': Encountered ) ) at line 1, column 12. Was expecting one of: EOF AND ... OR ... NOT ... + ... - ... BAREOPER ... ( ... * ... ^ ... QUOTED ... TERM ... FUZZY_SLOP ... PREFIXTERM ... WILDTERM ... REGEXPTERM ... [ ... { ... NUMBER ...
[jira] [Commented] (SOLR-3823) Parentheses in a boost query cause errors
[ https://issues.apache.org/jira/browse/SOLR-3823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13453349#comment-13453349 ] Mathos Marcer commented on SOLR-3823: - I'm glad I'm not just going crazy :-) I did notice while the space before the equal sign (ie bq =(name:nonsense -xxx)^999) doesn't produce a parsing error, comparing results between 3.6 and 4.0 BETA, it doesn't appear to be applying the boost. In fact I get the same results as if I didn't have the bq option there at all. Parentheses in a boost query cause errors - Key: SOLR-3823 URL: https://issues.apache.org/jira/browse/SOLR-3823 Project: Solr Issue Type: Bug Components: query parsers Affects Versions: 4.0-BETA Environment: Mac, jdk 1.6, Chrome Reporter: Mathos Marcer When using a boost query (bq) that contains a parentheses (like this example from the Relevancy Cookbook section of the wiki): {noformat} ? defType = dismax q = foo bar bq = (*:* -xxx)^999 {noformat} You get the following error: org.apache.lucene.queryparser.classic.ParseException: Cannot parse '-xxx)': Encountered ) ) at line 1, column 12. Was expecting one of: EOF AND ... OR ... NOT ... + ... - ... BAREOPER ... ( ... * ... ^ ... QUOTED ... TERM ... FUZZY_SLOP ... PREFIXTERM ... WILDTERM ... REGEXPTERM ... [ ... { ... NUMBER ... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3823) Parentheses in a boost query cause errors
[ https://issues.apache.org/jira/browse/SOLR-3823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453350#comment-13453350 ] Mathos Marcer commented on SOLR-3823: - Actually, looking at it closer, it is probably because, with the space added after bq, it doesn't register as "bq" but as "bq " (with a trailing space), judging by the params section of the response: <str name="bq ">(*:* -replacement)^9950</str> Parentheses in a boost query cause errors - Key: SOLR-3823 URL: https://issues.apache.org/jira/browse/SOLR-3823 Project: Solr Issue Type: Bug Components: query parsers Affects Versions: 4.0-BETA Environment: Mac, jdk 1.6, Chrome Reporter: Mathos Marcer When using a boost query (bq) that contains a parentheses (like this example from the Relevancy Cookbook section of the wiki): {noformat} ? defType = dismax q = foo bar bq = (*:* -xxx)^999 {noformat} You get the following error: org.apache.lucene.queryparser.classic.ParseException: Cannot parse '-xxx)': Encountered ) ) at line 1, column 12. Was expecting one of: EOF AND ... OR ... NOT ... + ... - ... BAREOPER ... ( ... * ... ^ ... QUOTED ... TERM ... FUZZY_SLOP ... PREFIXTERM ... WILDTERM ... REGEXPTERM ... [ ... { ... NUMBER ...
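Why a space before the equals sign makes the boost query silently disappear can be shown with a generic query-string parser. This is a minimal sketch under stated assumptions, not Solr's request parsing: `ParamNameSketch` and `parse` are hypothetical names, and the only claim is that a decoded parameter name keeps its trailing space, so a lookup of the name bq finds nothing.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

/** Generic query-string parsing sketch; not Solr's actual request handling. */
final class ParamNameSketch {
    /** Splits on '&' and the first '=', URL-decoding names and values. */
    static Map<String, String> parse(String query) {
        Map<String, String> params = new LinkedHashMap<>();
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq < 0) {
                continue; // ignore fragments without a value in this sketch
            }
            // Names are kept exactly as decoded: "bq+" becomes "bq " (trailing space).
            params.put(decode(pair.substring(0, eq)), decode(pair.substring(eq + 1)));
        }
        return params;
    }

    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }
}
```

Parsing q=foo&defType=edismax&bq+=(name:nonsense+-xxx)^999 this way yields a parameter named "bq " rather than "bq", which is consistent with the observation that the request parses without error but the boost is never applied.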
[jira] [Comment Edited] (SOLR-3823) Parentheses in a boost query cause errors
[ https://issues.apache.org/jira/browse/SOLR-3823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13453350#comment-13453350 ] Mathos Marcer edited comment on SOLR-3823 at 9/12/12 7:06 AM: -- Actually looking at it closer, it is probably because with adding the space after bq is it doesn't register it as bq but as bq looking at the params section of the query: str name=bq (*:* -replacement)^9950/str was (Author: mathos): Actually looking at it closer, it is probably because with adding the space after bq is it doesn't register it as bq but as bq looking at the params section of the query: str name=bq (*:* -replacement)^9950/str Parentheses in a boost query cause errors - Key: SOLR-3823 URL: https://issues.apache.org/jira/browse/SOLR-3823 Project: Solr Issue Type: Bug Components: query parsers Affects Versions: 4.0-BETA Environment: Mac, jdk 1.6, Chrome Reporter: Mathos Marcer When using a boost query (bq) that contains a parentheses (like this example from the Relevancy Cookbook section of the wiki): {noformat} ? defType = dismax q = foo bar bq = (*:* -xxx)^999 {noformat} You get the following error: org.apache.lucene.queryparser.classic.ParseException: Cannot parse '-xxx)': Encountered ) ) at line 1, column 12. Was expecting one of: EOF AND ... OR ... NOT ... + ... - ... BAREOPER ... ( ... * ... ^ ... QUOTED ... TERM ... FUZZY_SLOP ... PREFIXTERM ... WILDTERM ... REGEXPTERM ... [ ... { ... NUMBER ... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-3823) Parentheses in a boost query cause errors
[ https://issues.apache.org/jira/browse/SOLR-3823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453350#comment-13453350 ] Mathos Marcer edited comment on SOLR-3823 at 9/12/12 7:08 AM: -- Actually, looking at it closer, it is probably because, with the space added after "bq", it doesn't register as "bq" but as "bq " (with a trailing space). Looking at the params section of the query: <str name="bq ">(\*:\* -replacement)^9950</str> was (Author: mathos): Actually, looking at it closer, it is probably because, with the space added after "bq", it doesn't register as "bq" but as "bq " (with a trailing space). Looking at the params section of the query: <str name="bq ">(*:* -replacement)^9950</str> Parentheses in a boost query cause errors - Key: SOLR-3823 URL: https://issues.apache.org/jira/browse/SOLR-3823 Project: Solr Issue Type: Bug Components: query parsers Affects Versions: 4.0-BETA Environment: Mac, jdk 1.6, Chrome Reporter: Mathos Marcer When using a boost query (bq) that contains parentheses (like this example from the Relevancy Cookbook section of the wiki): {noformat} ? defType = dismax & q = foo bar & bq = (*:* -xxx)^999 {noformat} You get the following error: org.apache.lucene.queryparser.classic.ParseException: Cannot parse '-xxx)': Encountered " ")" ") "" at line 1, column 12. Was expecting one of: <EOF> <AND> ... <OR> ... <NOT> ... "+" ... "-" ... <BAREOPER> ... "(" ... "*" ... "^" ... <QUOTED> ... <TERM> ... <FUZZY_SLOP> ... <PREFIXTERM> ... <WILDTERM> ... <REGEXPTERM> ... "[" ... "{" ... <NUMBER> ...
[jira] [Resolved] (SOLR-2747) Include formatted Changes.html for release
[ https://issues.apache.org/jira/browse/SOLR-2747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rowe resolved SOLR-2747. --- Resolution: Fixed We can add Lucene Changes.html generation in a separate issue. Include formatted Changes.html for release -- Key: SOLR-2747 URL: https://issues.apache.org/jira/browse/SOLR-2747 Project: Solr Issue Type: Improvement Reporter: Martijn van Groningen Assignee: Steven Rowe Priority: Minor Fix For: 4.0, 5.0 Attachments: SOLR-2747_fix.patch, SOLR-2747.patch, SOLR-2747.patch, SOLR-2747.patch, SOLR-2747.patch, SOLR-2747.patch Just as when releasing Lucene, Solr should also have an HTML-formatted changes file. The Lucene Perl script (lucene/src/site/changes/changes2html.pl) should be reused.
[jira] [Commented] (SOLR-3823) Parentheses in a boost query cause errors
[ https://issues.apache.org/jira/browse/SOLR-3823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453356#comment-13453356 ] Erick Erickson commented on SOLR-3823: -- FWIW, I'm on a Mac (Lion) too, although I doubt that matters. Parentheses in a boost query cause errors - Key: SOLR-3823 URL: https://issues.apache.org/jira/browse/SOLR-3823 Project: Solr Issue Type: Bug Components: query parsers Affects Versions: 4.0-BETA Environment: Mac, jdk 1.6, Chrome Reporter: Mathos Marcer When using a boost query (bq) that contains parentheses (like this example from the Relevancy Cookbook section of the wiki): {noformat} ? defType = dismax & q = foo bar & bq = (*:* -xxx)^999 {noformat} You get the following error: org.apache.lucene.queryparser.classic.ParseException: Cannot parse '-xxx)': Encountered " ")" ") "" at line 1, column 12. Was expecting one of: <EOF> <AND> ... <OR> ... <NOT> ... "+" ... "-" ... <BAREOPER> ... "(" ... "*" ... "^" ... <QUOTED> ... <TERM> ... <FUZZY_SLOP> ... <PREFIXTERM> ... <WILDTERM> ... <REGEXPTERM> ... "[" ... "{" ... <NUMBER> ...
[jira] [Updated] (LUCENE-4362) ban tab-indented source
[ https://issues.apache.org/jira/browse/LUCENE-4362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated LUCENE-4362: --- Attachment: LUCENE-4362_4x.patch OK, after waiting 50 minutes for the tests to complete, all tests pass with these two patches (trunk and 4x). So if I check all this in, it'll change the generated Java files, since they were newly generated from the changes to the jflex/jj files. Is this the usual procedure? This doesn't address the tabs introduced by the parser compilers. If no one objects, I'll check this in, probably tonight or tomorrow. But I'd still like to keep this open even so. Between last week and now, more tabs have been introduced into the source. Any suggestions about what to do about tabs introduced into generated files? ban tab-indented source --- Key: LUCENE-4362 URL: https://issues.apache.org/jira/browse/LUCENE-4362 Project: Lucene - Core Issue Type: Task Reporter: Robert Muir Assignee: Erick Erickson Attachments: LUCENE-4326_trunk.patch, LUCENE-4362_4x.patch, LUCENE-4362_4x.patch, LUCENE-4362_core.patch, LUCENE-4362.patch, LUCENE-4362.patch This makes code really difficult to read and work with. It's easy enough to prevent.
{noformat}
Index: build.xml
===================================================================
--- build.xml (revision 1380979)
+++ build.xml (working copy)
@@ -77,11 +77,12 @@
       <or>
         <containsregexp expression="@author\b" casesensitive="yes"/>
         <containsregexp expression="\bno(n|)commit\b" casesensitive="no"/>
+        <containsregexp expression="\t" casesensitive="no"/>
       </or>
     </fileset>
     <map from="${validate.currDir}${file.separator}" to="*" />
   </pathconvert>
-  <fail if="validate.patternsFound">The following files contain @author tags or nocommits:${line.separator}${validate.patternsFound}</fail>
+  <fail if="validate.patternsFound">The following files contain @author tags, tabs or nocommits:${line.separator}${validate.patternsFound}</fail>
 </target>
{noformat}
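For anyone who wants to look for offending files in a working copy before the Ant validation runs, the idea behind the build.xml change (flag every file that contains a tab character) can be approximated with a short standalone script. This is only a sketch, not part of the patch: the function name `files_with_tabs` and the default suffix list are assumptions made for illustration, and unlike the Ant target it does not look for @author tags or nocommits.

```python
import os
import tempfile


def files_with_tabs(root, suffixes=(".java", ".xml")):
    """Return sorted paths (relative to root) of files containing a tab."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Only inspect files with one of the requested suffixes.
            if not name.endswith(suffixes):
                continue
            path = os.path.join(dirpath, name)
            # Read as bytes so the file encoding does not matter.
            with open(path, "rb") as handle:
                if b"\t" in handle.read():
                    hits.append(os.path.relpath(path, root))
    return sorted(hits)


if __name__ == "__main__":
    # Small demonstration on a throwaway directory.
    with tempfile.TemporaryDirectory() as demo_root:
        with open(os.path.join(demo_root, "Clean.java"), "w") as out:
            out.write("class Clean {\n  int x;\n}\n")
        with open(os.path.join(demo_root, "Tabbed.java"), "w") as out:
            out.write("class Tabbed {\n\tint x;\n}\n")
        print(files_with_tabs(demo_root))
```

Generated sources could be left out by pointing `root` at hand-written source directories only, which is one possible answer to the question about tabs in generated files, though it leaves those tabs in place.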