[jira] Resolved: (LUCENE-1529) back-compat tests (ant test-tag) should test JAR drop-in-ability
[ https://issues.apache.org/jira/browse/LUCENE-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Busch resolved LUCENE-1529.
-----------------------------------
    Resolution: Fixed

Committed revision 756374.

back-compat tests (ant test-tag) should test JAR drop-in-ability
----------------------------------------------------------------
                 Key: LUCENE-1529
                 URL: https://issues.apache.org/jira/browse/LUCENE-1529
             Project: Lucene - Java
          Issue Type: New Feature
          Components: Build
    Affects Versions: 2.9
            Reporter: Michael McCandless
            Assignee: Michael Busch
            Priority: Minor
             Fix For: 2.9
         Attachments: lucene-1529.patch

We now test back-compat with "ant test-tag", which is very useful for catching breaks in back-compat before committing. However, that currently checks out the src/test sources, compiles them against the trunk JAR, and runs the tests. Whereas our back-compat policy (http://wiki.apache.org/lucene-java/BackwardsCompatibility) states that no recompilation is required on upgrading to a new JAR, i.e. you should be able to drop the new JAR in place of your old one and things should work fine. So we should fix "ant test-tag" to:

* Do a full checkout of the core sources and tests from the back-compat tag
* Compile the JAR from the back-compat sources
* Compile the tests against that back-compat JAR
* Swap in the trunk JAR
* Run the tests

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683879#action_12683879 ]

Grant Ingersoll commented on LUCENE-1567:
-----------------------------------------
OK, I have started the IP Clearance in incubation. Please send in the software grant ASAP and make sure you CC me on it (gsing...@a.o)

New flexible query parser
-------------------------
                 Key: LUCENE-1567
                 URL: https://issues.apache.org/jira/browse/LUCENE-1567
             Project: Lucene - Java
          Issue Type: New Feature
          Components: QueryParser
         Environment: N/A
            Reporter: Luis Alves
            Assignee: Michael Busch

From the "New flexible query parser" thread by Michael Busch:

In my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time refactoring the code and designing a very generic architecture, so that this query parser can easily be used for different products with varying query syntaxes.

This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is an Apache committer and PMC member on the Tuscany project and is getting familiar with Lucene now too.

We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here; Adriano and Luis can then answer more detailed questions, as they're much more familiar with the code than I am.

The goal was to separate the syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms.
We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch.

The query parser has three layers, and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b':

      AND
     /   \
    A     B

The three layers are:

1. QueryParser
2. QueryNodeProcessor
3. QueryBuilder

1. The upper layer is the parsing layer, which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc.

2. The query node processors do most of the work. This layer is in fact a configurable chain of processors. Each processor can walk the tree and modify nodes or even the tree's structure. That makes it possible, e.g., to do query optimization before the query is executed or to tokenize terms.

3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects.

Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow resource bundles to be attached. This makes it possible to translate messages, which is an important feature of a query parser.

This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now have a 100% compatible Lucene query parser, which means the syntax is identical and all query parser test cases pass on the new one too, using a wrapper. Recent posts show that there is demand for query syntax improvements, e.g. improved range query syntax or operator precedence.
There are already different QP implementations in Lucene+contrib; however, I think we did not keep them all up to date and in sync. This is not too surprising, because usually when fixes and changes are made to the main query parser, people don't make the corresponding changes in the contrib parsers. (I'm guilty here too.) With this new architecture it will be much easier to maintain different query syntaxes, as the actual code for the first layer is not very much. All syntaxes would benefit from patches and improvements we make to the underlying layers, which will make supporting different syntaxes much more manageable.
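The three-layer flow described above can be sketched in a few lines of plain Java. All class and method names here are invented for illustration (the actual contributed code differs, and its parsing layer is generated with javacc); the point is only the shape: parse to a tree, run a configurable processor chain over it, then build a query object.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** One node of the QueryNodeTree; leaves are terms, inner nodes are operators. */
class QueryNode {
    final String value;
    final List<QueryNode> children = new ArrayList<>();
    QueryNode(String value) { this.value = value; }
    boolean isLeaf() { return children.isEmpty(); }
}

class FlexibleQueryPipeline {
    // Layer 1 (parsing): query text -> QueryNodeTree. This toy parser only
    // understands "x AND y AND ..." syntax.
    static QueryNode parse(String query) {
        QueryNode root = new QueryNode("AND");
        for (String term : query.split(" AND ")) {
            root.children.add(new QueryNode(term.trim()));
        }
        return root;
    }

    // Layer 2 (processors): a configurable chain; each step may rewrite the tree.
    static QueryNode process(QueryNode root, List<Function<QueryNode, QueryNode>> chain) {
        for (Function<QueryNode, QueryNode> processor : chain) {
            root = processor.apply(root);
        }
        return root;
    }

    // Example processor: normalize all leaf terms to lower case,
    // returning a rewritten copy of the tree.
    static QueryNode lowercaseTerms(QueryNode node) {
        if (node.isLeaf()) return new QueryNode(node.value.toLowerCase());
        QueryNode copy = new QueryNode(node.value);
        for (QueryNode child : node.children) copy.children.add(lowercaseTerms(child));
        return copy;
    }

    // Layer 3 (builders): QueryNodeTree -> query object (here just a string,
    // standing in for a BooleanQuery of required clauses).
    static String build(QueryNode node) {
        if (node.isLeaf()) return node.value;
        StringBuilder sb = new StringBuilder("(");
        for (QueryNode child : node.children) sb.append('+').append(build(child));
        return sb.append(')').toString();
    }
}
```

A new syntax only needs a new layer-1 parser; the processor chain and builders are reused unchanged, which is exactly why maintaining several syntaxes becomes cheap.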
[jira] Created: (LUCENE-1568) Fix for NPE's in Spatial Lucene for searching bounding box only
Fix for NPE's in Spatial Lucene for searching bounding box only
---------------------------------------------------------------
                 Key: LUCENE-1568
                 URL: https://issues.apache.org/jira/browse/LUCENE-1568
             Project: Lucene - Java
          Issue Type: Bug
          Components: contrib/spatial
            Reporter: patrick o'leary
            Assignee: patrick o'leary
            Priority: Minor

An NPE occurs when using DistanceQueryBuilder for a minimal bounding box search without the distance filter.
[jira] Updated: (LUCENE-1568) Fix for NPE's in Spatial Lucene for searching bounding box only
[ https://issues.apache.org/jira/browse/LUCENE-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

patrick o'leary updated LUCENE-1568:
------------------------------------
    Attachment: LUCENE-1568.patch

Fixes an NPE when using DistanceQueryBuilder for just minimal bounding box searches, e.g.:

{code}
final DistanceQueryBuilder dq = new DistanceQueryBuilder(
    latitude,
    longitude,
    radius,
    latField,   // name of the latitude field in the index
    lngField,   // name of the longitude field in the index
    tierPrefix, // prefix of the tier fields in the index
    false       // filter by radius; false means mbb search
);
{code}
[jira] Updated: (LUCENE-1568) Fix for NPE's in Spatial Lucene for searching bounding box only
[ https://issues.apache.org/jira/browse/LUCENE-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-1568:
---------------------------------------
    Fix Version/s: 2.9
[jira] Resolved: (LUCENE-652) Compressed fields should be externalized (from Fields into Document)
[ https://issues.apache.org/jira/browse/LUCENE-652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-652.
---------------------------------------
    Resolution: Fixed

Compressed fields should be externalized (from Fields into Document)
--------------------------------------------------------------------
                 Key: LUCENE-652
                 URL: https://issues.apache.org/jira/browse/LUCENE-652
             Project: Lucene - Java
          Issue Type: Improvement
          Components: Index
    Affects Versions: 1.9, 2.0.0, 2.1
            Reporter: Michael McCandless
            Assignee: Michael McCandless
            Priority: Minor
             Fix For: 2.9
         Attachments: LUCENE-652.patch

Right now, as of the 2.0 release, Lucene supports compressed stored fields. However, after discussion on java-dev, the suggestion arose, from Robert Engels, that it would be better if this logic were moved to the Document level. This way the indexing level just stores opaque binary fields, and the Document handles compressing/uncompressing as needed.

This approach would have prevented issues like LUCENE-629, because merging of segments would never need to decompress.

See this thread for the recent discussion: http://www.gossamer-threads.com/lists/lucene/java-dev/38836

When we do this we should also work on the related issue LUCENE-648.
[jira] Commented: (LUCENE-1568) Fix for NPE's in Spatial Lucene for searching bounding box only
[ https://issues.apache.org/jira/browse/LUCENE-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683951#action_12683951 ]

patrick o'leary commented on LUCENE-1568:
-----------------------------------------
If nobody objects I'll commit this later today.
[jira] Resolved: (LUCENE-1561) Maybe rename Field.omitTf, and strengthen the javadocs
[ https://issues.apache.org/jira/browse/LUCENE-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-1561.
----------------------------------------
    Resolution: Fixed

Maybe rename Field.omitTf, and strengthen the javadocs
------------------------------------------------------
                 Key: LUCENE-1561
                 URL: https://issues.apache.org/jira/browse/LUCENE-1561
             Project: Lucene - Java
          Issue Type: Improvement
          Components: Index
    Affects Versions: 2.4.1
            Reporter: Michael McCandless
            Assignee: Michael McCandless
             Fix For: 2.9
         Attachments: LUCENE-1561.patch

Spinoff from here: http://www.nabble.com/search-problem-when-indexed-using-Field.setOmitTf()-td22456141.html

Maybe rename omitTf to something like omitTermPositions, and make it clear what queries will silently fail to work as a result.
[jira] Resolved: (LUCENE-1327) TermSpans skipTo() doesn't always move forwards
[ https://issues.apache.org/jira/browse/LUCENE-1327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Busch resolved LUCENE-1327.
-----------------------------------
    Resolution: Fixed

Committed revision 756669.

TermSpans skipTo() doesn't always move forwards
-----------------------------------------------
                 Key: LUCENE-1327
                 URL: https://issues.apache.org/jira/browse/LUCENE-1327
             Project: Lucene - Java
          Issue Type: Bug
          Components: Query/Scoring, Search
    Affects Versions: 1.9, 2.0.0, 2.1, 2.2, 2.3, 2.3.1, 2.3.2, 2.4
            Reporter: Moti Nisenson
            Assignee: Michael Busch
            Priority: Minor
             Fix For: 2.9
         Attachments: lucene-1327.patch

In TermSpans (or the anonymous Spans class returned by SpanTermQuery, depending on the version), the skipTo() method is improperly implemented if the target doc is less than or equal to the current doc:

{code}
public boolean skipTo(int target) throws IOException {
    // are we already at the correct position?
    if (doc >= target) {
        return true;
    }
    ...
{code}

This violates the correct behavior (as described in the Spans interface documentation) that skipTo() should always move forwards; in other words, the correct implementation would be:

{code}
if (doc >= target) {
    return next();
}
{code}

This bug causes particular problems if one wants to use the payloads feature: if one loads a payload, then performs a skipTo() to the same document, then tries to load the next payload, the spans hasn't changed position, and it attempts to load the same payload again (which is an error).
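The contract can be illustrated with a toy Spans-like enumeration over a sorted list of doc ids (names invented for illustration; this is not the actual TermSpans code). The key line is the fixed branch: when we are already at or past the target, skipTo() still advances rather than returning true in place.

```java
/** Toy enumeration demonstrating the "skipTo always moves forwards" contract. */
class SketchSpans {
    private final int[] docs;   // sorted doc ids this enumeration visits
    private int pos = -1;       // index of the current doc; -1 = not yet started

    SketchSpans(int... docs) { this.docs = docs; }

    int doc() { return docs[pos]; }

    boolean next() {
        return ++pos < docs.length;
    }

    boolean skipTo(int target) {
        // Already at or past the target: still move forwards (the buggy
        // version returned true here without advancing, so a subsequent
        // payload load saw the same position twice).
        if (pos >= 0 && doc() >= target) {
            return next();
        }
        while (next()) {
            if (doc() >= target) return true;
        }
        return false;
    }
}
```

With the buggy `return true`, calling skipTo(d) while positioned on doc d is a no-op, which is exactly the sequence the payload use case triggers.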
[jira] Commented: (LUCENE-652) Compressed fields should be externalized (from Fields into Document)
[ https://issues.apache.org/jira/browse/LUCENE-652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683962#action_12683962 ]

Uwe Schindler commented on LUCENE-652:
--------------------------------------
Is an index compressed with Store.COMPRESS still readable? Can I uncompress fields compressed using the old tools by retrieving the byte array and using CompressionTools? There should be some documentation about that.

Another question: compressing was also used for string fields. Maybe CompressionTools should also supply a method to compress strings (and convert them to UTF-8 during that, to be backwards compatible). This would prevent people from calling String.getBytes() without a charset and then wondering why they cannot read their index again...
[jira] Commented: (LUCENE-652) Compressed fields should be externalized (from Fields into Document)
[ https://issues.apache.org/jira/browse/LUCENE-652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683967#action_12683967 ]

Michael McCandless commented on LUCENE-652:
-------------------------------------------
Good questions!

bq. Is an index compressed with Store.COMPRESS still readable?

Yes, we have to support that until Lucene 4.0. But Field.Store.COMPRESS will be removed in 3.0 (i.e. you can read previously compressed fields, interact with an index that has compressed fields in it, etc., you just cannot add docs with Field.Store.COMPRESS to an index as of 3.0).

bq. Can I uncompress fields compressed using the old tools by retrieving the byte array and using CompressionTools?

Well... yes, but you can't actually get the compressed byte[] (because Lucene will decompress it for you).

bq. Maybe CompressionTools should also supply a method to compress strings (and convert them to UTF-8 during that, to be backwards compatible).

OK, I'll add them. I'll name them compressString and decompressString.
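A sketch of what such a string pair could look like, assuming the semantics discussed in this thread (UTF-8 encode, then deflate with java.util.zip); the method names follow the proposal above, but the committed CompressionTools may differ in detail:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical string helpers: always UTF-8, so the bytes round-trip on any platform. */
class StringCompression {

    static byte[] compressString(String value) {
        byte[] input = value.getBytes(StandardCharsets.UTF_8); // fixed charset, unlike bare getBytes()
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    static String decompressString(byte[] value) {
        Inflater inflater = new Inflater();
        inflater.setInput(value);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        try {
            while (!inflater.finished()) {
                out.write(buf, 0, inflater.inflate(buf));
            }
        } catch (DataFormatException e) {
            throw new RuntimeException("malformed compressed input", e);
        } finally {
            inflater.end();
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

Pinning the charset to UTF-8 inside the helper is the point of the proposal: a caller can no longer accidentally encode with the platform-default charset and produce bytes that decode differently elsewhere.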
[jira] Commented: (LUCENE-652) Compressed fields should be externalized (from Fields into Document)
[ https://issues.apache.org/jira/browse/LUCENE-652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683969#action_12683969 ]

Uwe Schindler commented on LUCENE-652:
--------------------------------------
bq. OK, I'll add them. I'll name them compressString and decompressString.

Maybe it is better to use the new UTF-8 tools to encode/decode (instead of String.getBytes()). This would be consistent with the rest of Lucene. But for the old deprecated Field.Store.COMPRESS, keep it how it is (backwards compatibility).
[jira] Commented: (LUCENE-652) Compressed fields should be externalized (from Fields into Document)
[ https://issues.apache.org/jira/browse/LUCENE-652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683977#action_12683977 ]

Uwe Schindler commented on LUCENE-652:
--------------------------------------
Yes, should I prepare a patch for trunk and add these methods?
[jira] Reopened: (LUCENE-652) Compressed fields should be externalized (from Fields into Document)
[ https://issues.apache.org/jira/browse/LUCENE-652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless reopened LUCENE-652:
---------------------------------------
[jira] Commented: (LUCENE-652) Compressed fields should be externalized (from Fields into Document)
[ https://issues.apache.org/jira/browse/LUCENE-652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683985#action_12683985 ]

Michael McCandless commented on LUCENE-652:
-------------------------------------------
If we switch to UnicodeUtil we may want to allow instantiation of CompressionTools, since UnicodeUtil is optimized for reuse. And if we do that we have to think about thread safety, probably using CloseableThreadLocal under the hood, and then add a close() method.
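The reuse-plus-close idea mentioned above can be sketched with a plain ThreadLocal (a hypothetical helper, not Lucene code; Lucene's CloseableThreadLocal additionally ensures entries are purged when the owner closes it): each thread keeps one reusable scratch object, so a shared instance stays thread-safe without allocating per call.

```java
/** Hypothetical helper: per-thread reusable scratch state behind a shared instance. */
class ReusableScratch {
    // One StringBuilder per thread; reused across calls instead of reallocated.
    private final ThreadLocal<StringBuilder> scratch =
        ThreadLocal.withInitial(StringBuilder::new);

    String join(String a, String b) {
        StringBuilder sb = scratch.get();
        sb.setLength(0);           // reset the reused state rather than allocating
        return sb.append(a).append(b).toString();
    }

    void close() {
        scratch.remove();          // release the calling thread's entry
    }
}
```

The trade-off is exactly the one raised in the comment: once the instance holds per-thread state, callers need a close() to release it, whereas a stateless static utility needs neither but pays an allocation (or array copy) on every call.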
[jira] Commented: (LUCENE-652) Compressed fields should be externalized (from Fields into Document)
[ https://issues.apache.org/jira/browse/LUCENE-652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683981#action_12683981 ]

Michael McCandless commented on LUCENE-652:
-------------------------------------------
bq. Yes, should I prepare a patch for trunk and add these methods?

You mean to switch to UnicodeUtil? That would be great!
[jira] Updated: (LUCENE-652) Compressed fields should be externalized (from Fields into Document)
[ https://issues.apache.org/jira/browse/LUCENE-652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-652:
--------------------------------------
    Attachment: LUCENE-652.patch

Added compressString/decompressString, and improved the javadocs to say this compression format matches Field.Store.COMPRESS.
[jira] Updated: (LUCENE-652) Compressed fields should be externalized (from Fields into Document)
[ https://issues.apache.org/jira/browse/LUCENE-652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-652:
---------------------------------
    Attachment: LUCENE-652.patch

This is a first version using UnicodeUtil. The deprecated Store.COMPRESS part still uses String.getBytes() because of backwards compatibility (otherwise it would be a change in the index format). This version currently creates a new UTFxResult per call, because there is no state and hence no close() method. It could also be synchronized, or done without a ThreadLocal, but this may not be so good. The current version has a small performance impact because of array copying.
[jira] Issue Comment Edited: (LUCENE-652) Compressed fields should be externalized (from Fields into Document)
[ https://issues.apache.org/jira/browse/LUCENE-652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12683993#action_12683993 ] Uwe Schindler edited comment on LUCENE-652 at 3/20/09 12:09 PM: This is a first version using UnicodeUtils. The deprecated Store.COMPRESS part still uses String.getBytes() because of backwards compatibility (otherwise it would be a change in index format). This version currenty creates a new UTFxResult, because no state and no close method. It can also be synchronized without ThreadLocal, but this may not be so good. The current version has a little performance impact because of array copying. was (Author: thetaphi): This is a first version using UnicodeUtils. The deprecated Store.COMPRESS part still uses String.getBytes() because of backwards compatibility (otherwise it would be a change in index format). This version currenty creates a new UTFxResult, because no state, so not close method. It can also be synchronized or without ThreadLocal, but this may not be so good. The current version has a little performance impact because of array copying. Compressed fields should be externalized (from Fields into Document) -- Key: LUCENE-652 URL: https://issues.apache.org/jira/browse/LUCENE-652 Project: Lucene - Java Issue Type: Improvement Components: Index Affects Versions: 1.9, 2.0.0, 2.1 Reporter: Michael McCandless Assignee: Michael McCandless Priority: Minor Fix For: 2.9 Attachments: LUCENE-652.patch, LUCENE-652.patch, LUCENE-652.patch Right now, as of 2.0 release, Lucene supports compressed stored fields. However, after discussion on java-dev, the suggestion arose, from Robert Engels, that it would be better if this logic were moved into the Document level. This way the indexing level just stores opaque binary fields, and then Document handles compress/uncompressing as needed. This approach would have prevented issues like LUCENE-629 because merging of segments would never need to decompress. 
See this thread for the recent discussion: http://www.gossamer-threads.com/lists/lucene/java-dev/38836 When we do this we should also work on related issue LUCENE-648. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
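The document-level approach proposed above can be sketched with plain java.util.zip, independent of any Lucene API (class and method names here are illustrative, not from the patch): the application compresses the value before handing it to the index as an opaque binary field, and decompresses it after retrieval, so segment merging never has to decompress anything.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class DocumentCompression {
    // Compress a field's bytes before storing them as an opaque binary field.
    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream(input.length);
        byte[] buf = new byte[1024];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    // Decompress after retrieval; the index itself never needs to do this.
    static byte[] decompress(byte[] input) throws Exception {
        Inflater inflater = new Inflater();
        inflater.setInput(input);
        ByteArrayOutputStream out = new ByteArrayOutputStream(input.length * 2);
        byte[] buf = new byte[1024];
        while (!inflater.finished()) {
            out.write(buf, 0, inflater.inflate(buf));
        }
        inflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] original = "some stored field value".getBytes("UTF-8");
        byte[] stored = compress(original);    // what the index sees
        byte[] restored = decompress(stored);  // what the application sees
        System.out.println(new String(restored, "UTF-8").equals("some stored field value"));
    }
}
```

The round trip happens entirely above the index layer, which is the point of the proposal.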
[jira] Commented: (LUCENE-652) Compressed fields should be externalized (from Fields into Document)
[ https://issues.apache.org/jira/browse/LUCENE-652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12684067#action_12684067 ] Michael McCandless commented on LUCENE-652: --- OK thanks Uwe, it looks good. We can leave the other changes I suggested to future optimizations. I'll commit soon!
Re: New flexible query parser
: My vote for contrib would depend on the state of the code - if it passes all : the tests and is truly back compat, and is not crazy slower, I don't see why : we don't move it in right away depending on confidence levels. That would : ensure use and attention that contrib often misses. The old parser could hang : around in deprecation. FWIW: It's always bugged me that the existing queryParser is in the core anyway ... as i've mentioned before: I'd love to see us move towards putting more features and add-on functionality in contribs and keeping the core as lean as possible: just the core functionality for indexing and searching ... when things are split up, it's easy for people who want every lucene feature to include a bunch of jars; it's harder for people who want to run lucene in a small footprint (embedded apps?) to extract classes from a big jar. so my vote would be to make it a contrib ... even if we do deprecate the current query parser, because this can be 100% back compatible -- it just makes it a great opportunity to get query parsing out of the core. -Hoss
[jira] Commented: (LUCENE-652) Compressed fields should be externalized (from Fields into Document)
[ https://issues.apache.org/jira/browse/LUCENE-652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12685385#action_12685385 ] Uwe Schindler commented on LUCENE-652: -- Fine! In my opinion the little overhead of UnicodeUtils is far lower than the one caused by compression and the ByteArrayStreams. {quote} bq. Can I uncompress fields compressed using the old tools also by retrieving the byte array and using CompressionTools? Well... yes, but: you can't actually get the compressed byte[] (because Lucene will decompress it for you). {quote} You can: with a FieldSelector that loads the fields for merge, you get the raw binary values (found out from the code of FieldsReader).
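Uwe's trick for getting at the raw bytes can be illustrated with a short fragment against the Lucene 2.x stored-fields API. This is a sketch, not compilable as-is: the field name "body" and the `reader`/`docId` variables are placeholders, and the exact FieldSelectorResult constant should be verified against the version in use.

```java
// Sketch: ask FieldsReader for the field as stored, skipping decompression.
FieldSelector rawSelector = new FieldSelector() {
  public FieldSelectorResult accept(String fieldName) {
    return "body".equals(fieldName)
        ? FieldSelectorResult.LOAD_FOR_MERGE  // raw, still-compressed bytes
        : FieldSelectorResult.NO_LOAD;
  }
};
Document doc = reader.document(docId, rawSelector);
byte[] rawCompressed = doc.getBinaryValue("body");
// rawCompressed can now be handed to whatever decompression code the
// application chooses, instead of letting FieldsReader inflate it.
```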
Re: New flexible query parser
On 3/20/09 10:58 PM, Chris Hostetter wrote: : My vote for contrib would depend on the state of the code - if it passes all : the tests and is truly back compat, and is not crazy slower, I don't see why : we don't move it in right away depending on confidence levels. That would : ensure use and attention that contrib often misses. The old parser could hang : around in deprecation. FWIW: It's always bugged me that the existing queryParser is in the core anyway ... as i've mentioned before: I'd love to see us move towards putting more features and add-on functionality in contribs and keeping the core as lean as possible: just the core functionality for indexing and searching ... when things are split up, it's easy for people who want every lucene feature to include a bunch of jars; it's harder for people who want to run lucene in a small footprint (embedded apps?) to extract classes from a big jar. +1. I'd love to see Lucene going in such a direction. However, I'm a little worried about contrib's reputation. I think it contains components with differing levels of activity, maturity and support. So maybe instead of moving things from core into contrib to achieve the goal you mentioned, we could create a new folder named e.g. 'components', which would contain stuff that we claim is as stable, mature and supported as the core, just packaged into separate jars. Those jars should then only have dependencies on the core, but not on each other. They would also follow the same backwards-compatibility and other requirements as the core. Thoughts? -Michael so my vote would be to make it a contrib ... even if we do deprecate the current query parser, because this can be 100% back compatible -- it just makes it a great opportunity to get query parsing out of the core. 
-Hoss
[jira] Closed: (LUCENE-1568) Fix for NPE's in Spatial Lucene for searching bounding box only
[ https://issues.apache.org/jira/browse/LUCENE-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] patrick o'leary closed LUCENE-1568. --- resolved Fix for NPE's in Spatial Lucene for searching bounding box only --- Key: LUCENE-1568 URL: https://issues.apache.org/jira/browse/LUCENE-1568 Project: Lucene - Java Issue Type: Bug Components: contrib/spatial Reporter: patrick o'leary Assignee: patrick o'leary Priority: Minor Fix For: 2.9 Attachments: LUCENE-1568.patch NPE occurs when using DistanceQueryBuilder for minimal bounding box search without the distance filter.
Re: Is TopDocCollector's collect() implementation correct?
(resending msg from earlier today during @apache mail outage -- i didn't get a copy from the list, so i'm assuming no one did) -- Forwarded message -- Date: Fri, 20 Mar 2009 15:29:13 -0700 (PDT) : TopDocCollector's (TDC) implementation of collect() seems a bit problematic : to me. This code isn't an area i'm very familiar with, but your assessment seems correct ... it looks like when LUCENE-1356 introduced the ability to provide a PriorityQueue to the constructor, the existing optimization for when the score was obviously too low was overlooked. It looks like this same bug got propagated to TopScoreDocCollector when it was introduced as well. : Introduce in TDC a private boolean which signals whether the default PQ is : used or not. If it's not used, don't do the 'else if' at all. If it is used, : then the 'else if' is safe. Then code could look like: my vote would just be to change the = comparison to a hq.lessThan call ... but i can understand how your proposal might be more efficient -- I'll let the performance experts fight it out ... but i definitely think you should file a bug. -Hoss
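The proposed private boolean can be sketched with a self-contained toy collector. This is NOT the real TopDocCollector, just an illustration of the control flow under discussion: the score-based early exit is applied only when the collector built the default, score-ordered queue itself; with a caller-supplied queue, every hit is offered because the ordering is unknown.

```java
import java.util.PriorityQueue;

// Toy illustration of the proposed fix -- not the real Lucene classes.
// The "score too low" early exit is only safe when the collector built
// the default queue itself, so it knows elements are ordered by score.
public class TopDocCollectorSketch {
    private final PriorityQueue<Float> hq;
    private final boolean defaultQueue;  // the proposed private boolean
    private final int numHits;
    private float minScore = 0.0f;

    TopDocCollectorSketch(int numHits) {  // default-queue constructor
        this.numHits = numHits;
        this.hq = new PriorityQueue<Float>(numHits);
        this.defaultQueue = true;
    }

    TopDocCollectorSketch(PriorityQueue<Float> userQueue, int numHits) {
        this.numHits = numHits;
        this.hq = userQueue;
        this.defaultQueue = false;  // unknown ordering: never skip hits
    }

    void collect(float score) {
        if (hq.size() < numHits) {
            hq.offer(score);
            minScore = hq.peek();
        } else if (!defaultQueue || score > minScore) {
            // With a user-supplied queue, always offer the hit; with the
            // default queue, hits scoring at or below minScore are skipped.
            hq.offer(score);
            hq.poll();  // evict the smallest element
            minScore = hq.peek();
        }
    }

    public static void main(String[] args) {
        TopDocCollectorSketch c = new TopDocCollectorSketch(3);
        for (float s : new float[] {1f, 5f, 3f, 2f, 8f}) {
            c.collect(s);
        }
        System.out.println(c.minScore);  // smallest of the kept top-3 scores
    }
}
```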
Re: Using Highlighter for highlighting Phrase query
(resending msg from earlier today during @apache mail outage -- i didn't get a copy from the list, so i'm assuming no one did) : Date: Fri, 20 Mar 2009 15:30:27 -0700 (PDT) : : http://people.apache.org/~hossman/#java-dev : Please Use java-u...@lucene Not java-...@lucene : : Your question is better suited for the java-u...@lucene mailing list ... : not the java-...@lucene list. java-dev is for discussing development of : the internals of the Lucene Java library ... it is *not* the appropriate : place to ask questions about how to use the Lucene Java library when : developing your own applications. Please resend your message to : the java-user mailing list, where you are likely to get more/better : responses since that list also has a larger number of subscribers. : : : : : Date: Tue, 17 Mar 2009 07:38:08 -0700 (PDT) : : From: mitu2009 musicfrea...@gmail.com : : Reply-To: java-dev@lucene.apache.org : : To: java-dev@lucene.apache.org : : Subject: Using Highlighter for highlighting Phrase query : : : : : : Am using this version of Lucene highlighter.net API. I want to get a phrase : : highlighted only when ALL of its words are present in the search : : results. But am not able to do so. For example, if my input search string : : is "Leading telecom company", then the API only highlights "telecom" in the : : results if the result does not contain the words "leading" and "company"... 
: : : : Here is the code i'm using: : : : : SimpleHTMLFormatter htmlFormatter = new SimpleHTMLFormatter(); : : : : var appData = : : (string)AppDomain.CurrentDomain.GetData(DataDirectory); : : var folderpath = System.IO.Path.Combine(appData, MyFolder); : : : : indexReader = IndexReader.Open(folderpath); : : : : Highlighter highlighter = new Highlighter(htmlFormatter, new : : QueryScorer(finalQuery.Rewrite(indexReader))); : : : : : : highlighter.SetTextFragmenter(new SimpleFragmenter(800)); : : : : int maxNumFragmentsRequired = 5; : : : : string highlightedText = string.Empty; : : : : TokenStream tokenStream = this._analyzer.TokenStream(fieldName, : : new System.IO.StringReader(fieldText)); : : : : highlightedText = highlighter.GetBestFragments(tokenStream, : : fieldText, maxNumFragmentsRequired, ...); : : : : return highlightedText; : : : : -- : : View this message in context: http://www.nabble.com/Using-Highlighter-for-highlighting-Phrase-query-tp22560334p22560334.html : : Sent from the Lucene - Java Developer mailing list archive at Nabble.com. : : : : : : - : : To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org : : For additional commands, e-mail: java-dev-h...@lucene.apache.org : : : : : : -Hoss : : -Hoss - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: Using MultiFieldQueryParser
(resending msg from earlier today during @apache mail outage -- i didn't get a copy from the list, so i'm assuming no one did) : Date: Fri, 20 Mar 2009 15:30:59 -0700 (PDT) : : http://people.apache.org/~hossman/#java-dev : Please Use java-u...@lucene Not java-...@lucene : : Your question is better suited for the java-u...@lucene mailing list ... : not the java-...@lucene list. java-dev is for discussing development of : the internals of the Lucene Java library ... it is *not* the appropriate : place to ask questions about how to use the Lucene Java library when : developing your own applications. Please resend your message to : the java-user mailing list, where you are likely to get more/better : responses since that list also has a larger number of subscribers. : : : : Date: Tue, 17 Mar 2009 08:47:05 -0700 (PDT) : : From: mitu2009 musicfrea...@gmail.com : : Reply-To: java-dev@lucene.apache.org : : To: java-dev@lucene.apache.org : : Subject: Using MultiFieldQueryParser : : : : : : Hi, : : : : Am working on a book search api using Lucene. User can search for a book : : whose title or description field contains "C.F.A.". : : Am using Lucene's MultiFieldQueryParser. But after parsing, it's removing the : : dots in the string. : : : : What am i missing here? : : : : Thanks. : : : : -- : : View this message in context: http://www.nabble.com/Using-MultiFieldQueryParser-tp22562134p22562134.html : : Sent from the Lucene - Java Developer mailing list archive at Nabble.com. : : : : : : - : : To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org : : For additional commands, e-mail: java-dev-h...@lucene.apache.org : : : : : : -Hoss : : -Hoss
Re: move TrieRange* to core?
(resending msg from earlier today during @apache mail outage -- i didn't get a copy from the list, so i'm assuming no one did) : Date: Fri, 20 Mar 2009 16:51:05 -0700 (PDT) : : : I think we should move TrieRange* into core before 2.9? : : -0 : : I think we should try to move more things *out* of the core in 3.0 (as : i've mentioned in other threads) ... but i certainly understand the : arguments for going the other direction. : : : It's received a lot of attention, from both developers (Uwe and Yonik did : : lots of iterations, and Solr is folding it in) and user interest. : : it's a chicken/egg problem that we move things into the core because they : are very useful and we want to give them more visibility, but if we had : less things in the core and more things in contribs (query parser, spans, : standard analyzer, non-primitive Query impls, etc...) then contribs as a : whole would be more visible. ... I'm getting a sense of deja-vu, ah : yes, here it is ... : : http://www.nabble.com/Moving-SweetSpotSimilarity-out-of-contrib-to19267437.html#a19320894 : : : -Hoss : : -Hoss
Re: Is TopDocCollector's collect() implementation correct?
Thanks Chris/Hoss (not sure who sent the original reply). I don't like calling pq.lessThan, as pq.insert and pq.insertWithOverflow call it anyway internally, and since it would add a method call (something the current implementation tries to avoid), I prefer the code I proposed below. BTW, I introduced 1356, so I take full responsibility for this oversight. The main reason for 1356 was to allow creating extensions of TopDocCollector so they can be of the same type and share the topDocs() and totalHits() implementations. I can file an issue. Any other comments? Shai On Sat, Mar 21, 2009 at 3:48 AM, Chris Hostetter hossman_luc...@fucit.orgwrote: (resending msg from earlier today during @apache mail outage -- i didn't get a copy from the list, so i'm assuming no one did) -- Forwarded message -- Date: Fri, 20 Mar 2009 15:29:13 -0700 (PDT) : TopDocCollector's (TDC) implementation of collect() seems a bit problematic : to me. This code isn't an area i'm very familiar with, but your assessment seems correct ... it looks like when LUCENE-1356 introduced the ability to provide a PriorityQueue to the constructor, the existing optimization for when the score was obviously too low was overlooked. It looks like this same bug got propagated to TopScoreDocCollector when it was introduced as well. : Introduce in TDC a private boolean which signals whether the default PQ is : used or not. If it's not used, don't do the 'else if' at all. If it is used, : then the 'else if' is safe. Then code could look like: my vote would just be to change the = comparison to a hq.lessThan call ... but i can understand how your proposal might be more efficient -- I'll let the performance experts fight it out ... but i definitely think you should file a bug. -Hoss