[jira] [Updated] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shai Erera updated LUCENE-5189:
Attachment: LUCENE-5189.patch

Added some javadocs, converted all nocommits to TODOs. I think it's ready for trunk. I'd like to handle FIS.gen next.

Key: LUCENE-5189
URL: https://issues.apache.org/jira/browse/LUCENE-5189
Project: Lucene - Core
Issue Type: New Feature
Components: core/index
Reporter: Shai Erera
Assignee: Shai Erera
Attachments: LUCENE-5189.patch (11 revisions)

Description: In LUCENE-4258 we started to work on incremental field updates; however, the amount of changes was immense and hard to follow/consume, because we targeted postings, stored fields, DV etc. all from the get-go. I'd like to start afresh here, with numeric-dv-field updates only. There are a couple of reasons for that:
* NumericDV fields should be easier to update if, e.g., we write all the values of all the documents in a segment for the updated field (similar to how livedocs work, and previously norms).
* It's a fairly contained issue, attempting to handle just one data type to update, yet it requires many changes to core code which will also be useful for updating other data types.
* It has value in and of itself, and we don't need to allow updating all the data types in Lucene at once; we can do that gradually.

I have a working patch already, which I'll upload next, explaining the changes.

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
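The per-segment update scheme described above ("write all the values of all the documents in a segment for the updated field, similar to how livedocs work") can be sketched as follows. This is an illustrative model only, not Lucene code; the names `SegmentValues` and `apply_update` are hypothetical.

```python
class SegmentValues:
    """Numeric doc values for one field in one segment, with update generations."""

    def __init__(self, base_values):
        # Generation 0: the values as originally indexed.
        self.generations = [list(base_values)]

    def apply_update(self, updates):
        """Write a complete new generation of the field's values for all documents
        in the segment, overriding only the docs listed in `updates`
        (doc_id -> new value). Older generations stay untouched (write-once files)."""
        latest = list(self.generations[-1])
        for doc_id, value in updates.items():
            latest[doc_id] = value
        self.generations.append(latest)

    def value(self, doc_id):
        # Readers always consult the newest generation.
        return self.generations[-1][doc_id]

seg = SegmentValues([10, 20, 30])
seg.apply_update({1: 99})
print(seg.value(1), seg.value(0))  # updated doc sees 99, untouched doc still 10
```

The key property, as with livedocs, is that an update never rewrites the original segment files; it only adds a new generation that readers prefer.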
[jira] [Commented] (LUCENE-5180) ShingleFilter should make shingles from trailing holes
[ https://issues.apache.org/jira/browse/LUCENE-5180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767718#comment-13767718 ]

Steve Rowe commented on LUCENE-5180:
+1, patch looks good. +1 to your suggestion about ShingleFilterTest.TestTokenStream:
bq. // TODO: merge w/ CannedTokenStream?

Key: LUCENE-5180
URL: https://issues.apache.org/jira/browse/LUCENE-5180
Project: Lucene - Core
Issue Type: Improvement
Components: modules/analysis
Reporter: Michael McCandless
Assignee: Michael McCandless
Fix For: 5.0, 4.6
Attachments: LUCENE-5180.patch

Description: When ShingleFilter hits a hole, it uses _ as the token. E.g., the bigrams for "the dog barked", if you have a StopFilter removing "the", would be: "_ dog", "dog barked". But if the input ends with a stopword, e.g. "wizard of", ShingleFilter fails to produce "wizard _" due to LUCENE-3849. Once we fix that, I think we should fix ShingleFilter to make shingles for trailing holes too.
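The hole-filling behavior discussed above can be sketched in a few lines. This is an illustrative model, not the ShingleFilter implementation: stopwords leave "holes" that are filled with "_", and with the trailing-hole fix a query ending in a stopword still produces a shingle.

```python
def bigram_shingles(tokens, stopwords, filler="_"):
    """Build bigram shingles over a token stream, filling stopword holes
    with the filler token -- including a trailing hole."""
    # Replace stopwords with the filler, keeping positions intact.
    stream = [filler if t in stopwords else t for t in tokens]
    shingles = ["%s %s" % (a, b) for a, b in zip(stream, stream[1:])]
    # Drop shingles that consist only of fillers.
    return [s for s in shingles if s != "%s %s" % (filler, filler)]

print(bigram_shingles(["the", "dog", "barked"], {"the"}))  # ['_ dog', 'dog barked']
# The trailing-hole case from the issue: "wizard of" should yield "wizard _".
print(bigram_shingles(["wizard", "of"], {"of"}))           # ['wizard _']
```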
[jira] [Commented] (SOLR-4988) Upgrade svnkit to version compatible with svn 1.8
[ https://issues.apache.org/jira/browse/SOLR-4988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767721#comment-13767721 ]

Shai Erera commented on SOLR-4988:
I see they now have a beta version available for download: http://www.svnkit.com/download.php. Since we only use it for precommit, is there anything that should prevent us from using the beta version? I am currently unable to run precommit because it fails with this error:

{noformat}
BUILD FAILED
D:\dev\lucene\lucene-5189\build.xml:335: The following error occurred while executing this line:
D:\dev\lucene\lucene-5189\extra-targets.xml:66: The following error occurred while executing this line:
D:\dev\lucene\lucene-5189\extra-targets.xml:82: org.tmatesoft.svn.core.SVNException: svn: E155021: This client is too old to work with the working copy at 'D:\dev\lucene\lucene-5189' (format '31').
    at org.tmatesoft.svn.core.internal.wc.SVNErrorManager.error(SVNErrorManager.java:64)
    at org.tmatesoft.svn.core.internal.wc.SVNErrorManager.error(SVNErrorManager.java:51)
    at org.tmatesoft.svn.core.internal.wc17.db.SVNWCDbRoot.init(SVNWCDbRoot.java:95)
{noformat}

Key: SOLR-4988
URL: https://issues.apache.org/jira/browse/SOLR-4988
Project: Solr
Issue Type: Task
Reporter: Alan Woodward
Assignee: Alan Woodward

Description: If you've got Subversion 1.8 installed, ant precommit fails due to svn version incompatibilities. It looks as though there isn't an svnkit release yet that supports 1.8. Once one is available, we should upgrade our dependencies. See http://subversion.1072662.n5.nabble.com/ETA-on-1-8-support-td181632.html
[jira] [Commented] (LUCENE-5207) lucene expressions module
[ https://issues.apache.org/jira/browse/LUCENE-5207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767732#comment-13767732 ]

ASF subversion and git services commented on LUCENE-5207:
Commit 1523419 from [~thetaphi] in branch 'dev/branches/lucene5207' [ https://svn.apache.org/r1523419 ]
LUCENE-5207: Add a test which verifies that the classloader restrictions work correctly

Key: LUCENE-5207
URL: https://issues.apache.org/jira/browse/LUCENE-5207
Project: Lucene - Core
Issue Type: New Feature
Reporter: Ryan Ernst
Attachments: LUCENE-5207.patch, LUCENE-5207.patch

Description: Expressions are geared at defining an alternative ranking function (e.g. incorporating the text relevance score and other field values/ranking signals), so they are conceptually much more like ElasticSearch's scripting support (http://www.elasticsearch.org/guide/reference/modules/scripting/) than Solr's function queries. Some additional notes:
* In addition to referring to other fields, they can also refer to other expressions, so they can be used as computed fields.
* You can rank documents easily by multiple expressions (it's a SortField at the end), e.g. sort by year descending, then by some function of score, price, and time ascending.
* The provided JavaScript expression syntax is much more efficient than using a scripting engine, because it does not have dynamic typing (it compiles to .class files that work on doubles). Performance is similar to writing a custom FieldComparator yourself, but much easier to do.
* We have Solr integration to contribute in the future, but this is just the standalone Lucene part as a start. Since Lucene has no schema, it includes an implementation of Bindings (SimpleBindings) that maps variable names to SortFields or other expressions.
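The bindings-plus-expressions idea above can be sketched language-agnostically. This is an illustrative model only, not the Lucene expressions module: an "expression" is just a function over per-document double values, bindings map variable names to fields or to other expressions (so expressions compose as computed fields), and ranking sorts by the expression's value.

```python
# Hypothetical names throughout; documents are plain dicts of numeric fields.
bindings = {}
bindings["popularity"] = lambda doc: doc["popularity"]   # a numeric field
bindings["recency"] = lambda doc: doc["recency"]
# A computed expression that itself refers to other bindings:
bindings["boost"] = lambda doc: (bindings["popularity"](doc) * 0.7
                                 + bindings["recency"](doc) * 0.3)

def rank(docs, expression):
    """Sort documents descending by the bound expression's (double) value."""
    return sorted(docs, key=bindings[expression], reverse=True)

docs = [{"id": 1, "popularity": 1.0, "recency": 9.0},
        {"id": 2, "popularity": 8.0, "recency": 2.0}]
top = rank(docs, "boost")
print([d["id"] for d in top])  # doc 2 scores 6.2, doc 1 scores 3.4 -> [2, 1]
```

The module's advantage over a general scripting engine, as noted above, is that the real implementation compiles such expressions to bytecode operating on primitive doubles, avoiding dynamic-typing overhead.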
[jira] [Commented] (LUCENE-5207) lucene expressions module
[ https://issues.apache.org/jira/browse/LUCENE-5207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767733#comment-13767733 ]

ASF subversion and git services commented on LUCENE-5207:
Commit 1523421 from [~thetaphi] in branch 'dev/branches/lucene5207' [ https://svn.apache.org/r1523421 ]
LUCENE-5207: Actually test that it works with mixed classloaders

Key: LUCENE-5207
URL: https://issues.apache.org/jira/browse/LUCENE-5207
[jira] [Commented] (LUCENE-5207) lucene expressions module
[ https://issues.apache.org/jira/browse/LUCENE-5207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767765#comment-13767765 ]

ASF subversion and git services commented on LUCENE-5207:
Commit 1523426 from [~thetaphi] in branch 'dev/branches/lucene5207' [ https://svn.apache.org/r1523426 ]
LUCENE-5207: Better classloader test: it now uses a completely synthetic class, so it's 100% unreachable from the main classloader. Also remove the static Opcodes imports

Key: LUCENE-5207
URL: https://issues.apache.org/jira/browse/LUCENE-5207
[jira] [Updated] (LUCENE-5180) ShingleFilter should make shingles from trailing holes
[ https://issues.apache.org/jira/browse/LUCENE-5180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-5180:
Attachment: LUCENE-5180.patch

Thanks Steve! Here's a new patch with that TODO done. I think it's ready.

Key: LUCENE-5180
URL: https://issues.apache.org/jira/browse/LUCENE-5180
[jira] [Commented] (LUCENE-3425) NRT Caching Dir to allow for exact memory usage, better buffer allocation and global cross indices control
[ https://issues.apache.org/jira/browse/LUCENE-3425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767770#comment-13767770 ]

Michael McCandless commented on LUCENE-3425:
OK, thanks for the explanation; now I understand AverageMergePolicy's purpose, and it makes sense. It's ironic that a fully optimized index is the worst thing you could do when searching segments concurrently. But I still don't understand why AverageMergePolicy is not merging the little segments from NRTCachingDir. Do you tell it to target a maximum number of segments in the index? If so, once the index is large enough, it seems like that'd force the small segments to be merged. Maybe you could also tell it a minimum segment size, so that it would merge away any segments still held in NRTCachingDir?

Key: LUCENE-3425
URL: https://issues.apache.org/jira/browse/LUCENE-3425
Project: Lucene - Core
Issue Type: Improvement
Components: core/index
Affects Versions: 3.4, 4.0-ALPHA
Reporter: Shay Banon
Fix For: 5.0, 4.5

Description: A discussion on IRC raised several improvements that can be made to the NRT caching dir. Some of the problems it currently has are:
1. Not explicitly controlling the memory usage, which can result in overusing memory (for example, large new segments being committed because refreshing is too far behind).
2. Heap fragmentation because of constant allocation of (probably promoted to old gen) byte buffers.
3. Not being able to control the memory usage across indices for multi-index usage within a single JVM.

A suggested solution (which still needs to be ironed out) is to have a BufferAllocator that controls allocation of byte[] and allows returning unused byte[] to it. It will have a cap on the size of memory it allows to be allocated. The NRT caching dir will use the allocator, which can either be provided (for usage across several indices) or created internally. The caching dir will also create a wrapped IndexOutput that will flush to the main dir if the allocator can no longer provide byte[] (exhausted). When a file is flushed from the cache to the main directory, it will return all the currently allocated byte[] to the BufferAllocator to be reused by other files.
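The suggested BufferAllocator can be sketched as follows. This is an illustrative model only, not Lucene code: it hands out fixed-size buffers up to a memory cap, reuses returned buffers (avoiding the heap fragmentation mentioned in point 2), and signals exhaustion so callers can flush to the main directory.

```python
class BufferAllocator:
    """Capped pool of fixed-size byte buffers, shareable across indices."""

    def __init__(self, buffer_size, max_bytes):
        self.buffer_size = buffer_size
        self.max_buffers = max_bytes // buffer_size
        self.allocated = 0
        self.free = []  # returned buffers, reused instead of reallocated

    def allocate(self):
        """Return a buffer, or None when the memory cap is exhausted."""
        if self.free:
            return self.free.pop()
        if self.allocated < self.max_buffers:
            self.allocated += 1
            return bytearray(self.buffer_size)
        return None  # exhausted: the caller should flush to the main dir

    def release(self, buf):
        """Return a buffer to the pool (e.g. after a cached file is flushed)."""
        self.free.append(buf)

alloc = BufferAllocator(buffer_size=1024, max_bytes=2048)
a = alloc.allocate()
b = alloc.allocate()
print(alloc.allocate())   # None: cap reached, time to flush
alloc.release(a)
print(alloc.allocate() is a)  # True: the released buffer is reused
```

Sharing one allocator instance between several NRT caching dirs gives the cross-index memory control asked for in point 3.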
[jira] [Created] (LUCENE-5214) Add new FreeTextSuggester, to handle long tail suggestions
Michael McCandless created LUCENE-5214:

Summary: Add new FreeTextSuggester, to handle long tail suggestions
Key: LUCENE-5214
URL: https://issues.apache.org/jira/browse/LUCENE-5214
Project: Lucene - Core
Issue Type: Improvement
Components: modules/spellchecker
Reporter: Michael McCandless
Assignee: Michael McCandless
Fix For: 5.0, 4.6

Description: The current suggesters are all based on a finite space of possible suggestions, i.e. the ones they were built on, so they can only suggest a full suggestion from that space. This means that if the current query goes outside of that space, no suggestions will be found. The goal of FreeTextSuggester is to address this by giving predictions based on an ngram language model, i.e. using the last few tokens from the user's query to predict the likely following token. I got the idea from this blog post about Google's suggest: http://googleblog.blogspot.com/2011/04/more-predictions-in-autocomplete.html

This is very much still a work in progress, but it seems to be working. I've tested it on the AOL query logs, using an interactive tool from luceneutil to show the suggestions, and it seems to work well. It's fun to use that tool to explore the word associations...

I don't think this suggester would be used standalone; rather, I think it'd be a fallback for times when the primary suggester fails to find anything. You can see this behavior on google.com: if you type "the fast and the ", you see entire queries being suggested, but then if the next word you type is "burning", suddenly the suggestions are only based on the last word, not the entire query.

It uses ShingleFilter under the hood to generate the token ngrams and stores the ngrams in an FST; once LUCENE-5180 is in, it will be able to properly handle a user query that ends with stop-words (e.g. "wizard of ").
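The back-off behavior described above (full-query suggestions when the long context is known, last-word suggestions otherwise) can be sketched with a toy ngram model. This is an illustrative model only, not FreeTextSuggester; the class and method names are hypothetical.

```python
from collections import Counter, defaultdict

class NGramSuggester:
    """Predict the next token from ngram counts over past queries,
    backing off to shorter contexts when the longer one is unseen."""

    def __init__(self, queries, max_order=3):
        self.max_order = max_order
        self.models = defaultdict(Counter)  # context tuple -> next-token counts
        for q in queries:
            tokens = q.split()
            for n in range(1, max_order):
                for i in range(len(tokens) - n):
                    self.models[tuple(tokens[i:i + n])][tokens[i + n]] += 1

    def suggest(self, query):
        tokens = query.split()
        # Try the longest available context first, then back off to shorter ones.
        for n in range(self.max_order - 1, 0, -1):
            context = tuple(tokens[-n:])
            if context in self.models:
                return self.models[context].most_common(1)[0][0]
        return None

s = NGramSuggester(["the fast and the furious",
                    "the fast and the furious",
                    "fast and loose"])
print(s.suggest("the fast and the"))  # long context known -> furious
print(s.suggest("something and"))     # unseen bigram context, backs off -> the
```

The real suggester builds these ngrams with ShingleFilter and stores them compactly in an FST rather than in hash maps.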
[jira] [Updated] (LUCENE-5214) Add new FreeTextSuggester, to handle long tail suggestions
[ https://issues.apache.org/jira/browse/LUCENE-5214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-5214:
Attachment: LUCENE-5214.patch

Current patch, very much a work in progress...

Key: LUCENE-5214
URL: https://issues.apache.org/jira/browse/LUCENE-5214
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767799#comment-13767799 ]

Robert Muir commented on LUCENE-5189:
+1 to go to trunk. Thanks Shai.

Key: LUCENE-5189
URL: https://issues.apache.org/jira/browse/LUCENE-5189
[jira] [Created] (LUCENE-5215) Add support for FieldInfos generation
Shai Erera created LUCENE-5215:

Summary: Add support for FieldInfos generation
Key: LUCENE-5215
URL: https://issues.apache.org/jira/browse/LUCENE-5215
Project: Lucene - Core
Issue Type: New Feature
Components: core/index
Reporter: Shai Erera
Assignee: Shai Erera

Description: In LUCENE-5189 we've identified a few reasons to do that:
# If you want to update docs' values of field 'foo', where 'foo' exists in the index but not in a specific segment (sparse DV), we cannot allow that and have to throw a late UOE. If we could rewrite FieldInfos (with generation), this would be possible, since we'd also write a new generation of FIS.
# When we apply NDV updates, we call DVF.fieldsConsumer. Currently the consumer isn't allowed to change FI.attributes, because we cannot modify the existing FIS. This is implicit, however, and we silently ignore any modified attributes. FieldInfos.gen will allow that too.

The idea is to add fieldInfosGen to SIPC, add a dvGen to each FieldInfo, and add support for FIS generation in FieldInfosFormat, SegReader etc., like we now do for DocValues. I'll work on a patch.

Also on LUCENE-5189, Rob raised a concern about SegmentInfo.attributes, which have the same limitation: if a Codec modifies them, they are silently ignored, since we don't gen the .si files. I think we can easily solve that by recording SI.attributes in SegmentInfos, so they are recorded per-commit. But I think it should be handled in a separate issue.
[jira] [Commented] (LUCENE-5215) Add support for FieldInfos generation
[ https://issues.apache.org/jira/browse/LUCENE-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767811#comment-13767811 ]

Robert Muir commented on LUCENE-5215:
SI attributes may not be used at all today. They worked well for handling the 3.x integration, as they were a place for us to stuff things like hasSharedDocStores, and IndexWriter was still able to hackishly get at them, but deprecation might be a good option. We should see what is using this in trunk.

Key: LUCENE-5215
URL: https://issues.apache.org/jira/browse/LUCENE-5215
[jira] [Commented] (SOLR-5237) Add Lucene Index heap usage to LukeRequestHandler
[ https://issues.apache.org/jira/browse/SOLR-5237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767815#comment-13767815 ]

ASF subversion and git services commented on SOLR-5237:
Commit 1523442 from [~rcmuir] in branch 'dev/trunk' [ https://svn.apache.org/r1523442 ]
SOLR-5237: add lucene index heap usage to luke request handler

Key: SOLR-5237
URL: https://issues.apache.org/jira/browse/SOLR-5237
Project: Solr
Issue Type: Improvement
Reporter: Areek Zillur
Attachments: SOLR-5237.patch, SOLR-5237.patch

Description: It would be useful to see the current index heap usage by Lucene (https://issues.apache.org/jira/browse/LUCENE-5197) in the LukeRequestHandler.
[jira] [Commented] (SOLR-5237) Add Lucene Index heap usage to LukeRequestHandler
[ https://issues.apache.org/jira/browse/SOLR-5237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767816#comment-13767816 ]

ASF subversion and git services commented on SOLR-5237:
Commit 1523443 from [~rcmuir] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1523443 ]
SOLR-5237: add lucene index heap usage to luke request handler

Key: SOLR-5237
URL: https://issues.apache.org/jira/browse/SOLR-5237
[jira] [Resolved] (SOLR-5237) Add Lucene Index heap usage to LukeRequestHandler
[ https://issues.apache.org/jira/browse/SOLR-5237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir resolved SOLR-5237:
Resolution: Fixed
Fix Version/s: 5.0, 4.6

Thanks Areek! Maybe you want to open an issue to add this to the Admin UI?

Key: SOLR-5237
URL: https://issues.apache.org/jira/browse/SOLR-5237
[jira] [Commented] (LUCENE-5214) Add new FreeTextSuggester, to handle long tail suggestions
[ https://issues.apache.org/jira/browse/LUCENE-5214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767820#comment-13767820 ] Robert Muir commented on LUCENE-5214: - This looks awesome: I think LUCENE-5180 will resolve a lot of the TODOs? I'm glad these corner cases of trailing stopwords etc were fixed properly in the analysis chain. And I like the name... Add new FreeTextSuggester, to handle long tail suggestions Key: LUCENE-5214 URL: https://issues.apache.org/jira/browse/LUCENE-5214 Project: Lucene - Core Issue Type: Improvement Components: modules/spellchecker Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 5.0, 4.6 Attachments: LUCENE-5214.patch The current suggesters are all based on a finite space of possible suggestions, i.e. the ones they were built on, so they can only suggest a full suggestion from that space. This means if the current query goes outside of that space then no suggestions will be found. The goal of FreeTextSuggester is to address this, by giving predictions based on an ngram language model, i.e. using the last few tokens from the user's query to predict likely following token. I got the idea from this blog post about Google's suggest: http://googleblog.blogspot.com/2011/04/more-predictions-in-autocomplete.html This is very much still a work in progress, but it seems to be working. I've tested it on the AOL query logs, using an interactive tool from luceneutil to show the suggestions, and it seems to work well. It's fun to use that tool to explore the word associations... I don't think this suggester would be used standalone; rather, I think it'd be a fallback for times when the primary suggester fails to find anything. You can see this behavior on google.com, if you type the fast and the , you see entire queries being suggested, but then if the next word you type is burning then suddenly you see the suggestions are only based on the last word, not the entire query. 
It uses ShingleFilter under the hood to generate the token ngrams and then stores the ngrams in an FST; once LUCENE-5180 is in, it will be able to properly handle a user query that ends with stop-words (e.g. "wizard of "). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
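The fallback behavior described above — suggest from the longest matching context of recent tokens, and back off to shorter contexts when nothing matches — can be pictured with a toy model. This is not the FreeTextSuggester code; the class and its count-based ranking are purely illustrative:

```python
from collections import defaultdict

class ToyNgramSuggester:
    """Toy ngram next-token model: predict the next token from the
    last few tokens of a query, backing off to shorter contexts.
    Illustrative only -- the real FreeTextSuggester builds an FST
    over ShingleFilter-produced ngrams."""

    def __init__(self, max_order=3):
        self.max_order = max_order
        # counts[context_tuple][next_token] -> occurrence count
        self.counts = defaultdict(lambda: defaultdict(int))

    def build(self, queries):
        for q in queries:
            tokens = q.split()
            # Record contexts of every length up to max_order - 1.
            for order in range(1, self.max_order):
                for i in range(len(tokens) - order):
                    ctx = tuple(tokens[i:i + order])
                    self.counts[ctx][tokens[i + order]] += 1

    def lookup(self, query, num=3):
        tokens = query.split()
        # Back off: try the longest context first, then shorter ones.
        for order in range(self.max_order - 1, 0, -1):
            ctx = tuple(tokens[-order:])
            if ctx in self.counts:
                ranked = sorted(self.counts[ctx].items(),
                                key=lambda kv: -kv[1])
                return [tok for tok, _ in ranked[:num]]
        return []
```

Building on a handful of query-log strings and calling `lookup("fast and the")` returns the tokens seen after that bigram context; a query with an unseen last word yields no suggestions, mirroring the "only the last word matters" behavior Mike describes.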
[jira] [Commented] (LUCENE-5207) lucene expressions module
[ https://issues.apache.org/jira/browse/LUCENE-5207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767825#comment-13767825 ] Uwe Schindler commented on LUCENE-5207: --- +1 to merge and commit this to trunk and 4.x. In the last 3 commits to the branch I just added new tests, so no functional changes anymore. Robert: If you create a new patch for reference, it would be good. Otherwise do merge --reintegrate, commit, and delete the branch! lucene expressions module - Key: LUCENE-5207 URL: https://issues.apache.org/jira/browse/LUCENE-5207 Project: Lucene - Core Issue Type: New Feature Reporter: Ryan Ernst Attachments: LUCENE-5207.patch, LUCENE-5207.patch Expressions are geared at defining an alternative ranking function (e.g. incorporating the text relevance score and other field values/ranking signals), so they are conceptually much more like ElasticSearch's scripting support (http://www.elasticsearch.org/guide/reference/modules/scripting/) than Solr's function queries. Some additional notes:
* In addition to referring to other fields, they can also refer to other expressions, so they can be used as computed fields.
* You can rank documents easily by multiple expressions (it's a SortField at the end), e.g. sort by year descending, then by some function of score, price and time ascending.
* The provided javascript expression syntax is much more efficient than using a scripting engine, because it does not have dynamic typing (it compiles to .class files that work on doubles). Performance is similar to writing a custom FieldComparator yourself, but much easier to do.
* We have Solr integration to contribute in the future, but this is just the standalone Lucene part as a start. Since Lucene has no schema, it includes an implementation of Bindings (SimpleBindings) that maps variable names to SortFields or other expressions.
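The kind of ranking expression described above — a function of the relevance score and other per-document values, used as a single sort key — can be pictured outside Lucene with a plain sketch. The document layout, the `popularity` field and the `rank` function here are hypothetical; the real module compiles expression source such as `_score * log(2 + popularity)` to bytecode via bindings rather than calling an interpreted function:

```python
import math

# Toy documents carrying a relevance score plus per-document values,
# standing in for what the expressions module reads through Bindings.
docs = [
    {"id": 1, "score": 2.0, "popularity": 10.0},
    {"id": 2, "score": 1.5, "popularity": 500.0},
    {"id": 3, "score": 1.8, "popularity": 100.0},
]

# The ranking expression as a plain function of doubles (illustrative;
# the module compiles this to a .class file instead).
def rank(doc):
    return doc["score"] * math.log(2.0 + doc["popularity"])

# Sort descending by the expression value, like a SortField would.
ranked = sorted(docs, key=rank, reverse=True)
```

Note how the most popular document can outrank a higher-scoring one — exactly the "alternative ranking function" use case the module targets.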
[jira] [Commented] (SOLR-5167) Ability to use AnalyzingInfixSuggester in Solr
[ https://issues.apache.org/jira/browse/SOLR-5167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767827#comment-13767827 ] ASF subversion and git services commented on SOLR-5167: --- Commit 1523451 from [~rcmuir] in branch 'dev/trunk' [ https://svn.apache.org/r1523451 ] SOLR-5167: Ability to use AnalyzingInfixSuggester in Solr Ability to use AnalyzingInfixSuggester in Solr -- Key: SOLR-5167 URL: https://issues.apache.org/jira/browse/SOLR-5167 Project: Solr Issue Type: New Feature Components: SearchComponents - other Reporter: Varun Thacker Priority: Minor Fix For: 4.5, 5.0 Attachments: SOLR-5167.patch, SOLR-5167.patch We should be able to use AnalyzingInfixSuggester in Solr by defining it in solrconfig.xml
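For illustration, wiring a suggester into solrconfig.xml looks roughly like the following. This is a sketch only: the component class, factory name, and parameter names are assumptions based on Solr's 4.x suggester configuration in general, not the exact contents of the SOLR-5167 patch.

```xml
<!-- Sketch only: exact class and parameter names may differ from
     what SOLR-5167 actually committed. -->
<searchComponent name="suggest" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">infix_suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <str name="field">title</str>
  </lst>
</searchComponent>
```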
[jira] [Commented] (LUCENE-5215) Add support for FieldInfos generation
[ https://issues.apache.org/jira/browse/LUCENE-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767831#comment-13767831 ] Shai Erera commented on LUCENE-5215: Well, SI.attributes says it's the place for Codecs to put custom attributes in, and I remember Mike and I once discussed using them for putting some facet related stuff, but we didn't pursue it. Maybe if we record them in SIS, it's simple enough and we can keep them? If they are meant to be used by Codecs only, then maybe we can force Codecs to manage them themselves, but if e.g. some other code will want to rely on them, would it be possible? Add support for FieldInfos generation - Key: LUCENE-5215 URL: https://issues.apache.org/jira/browse/LUCENE-5215 Project: Lucene - Core Issue Type: New Feature Components: core/index Reporter: Shai Erera Assignee: Shai Erera In LUCENE-5189 we've identified a few reasons to do that:
# If you want to update docs' values of field 'foo', where 'foo' exists in the index, but not in a specific segment (sparse DV), we cannot allow that and have to throw a late UOE. If we could rewrite FieldInfos (with generation), this would be possible since we'd also write a new generation of FIS.
# When we apply NDV updates, we call DVF.fieldsConsumer. Currently the consumer isn't allowed to change FI.attributes because we cannot modify the existing FIS. This is implicit however, and we silently ignore any modified attributes. FieldInfos.gen will allow that too.
The idea is to add fieldInfosGen to SIPC, add a dvGen to each FieldInfo and add support for FIS generation in FieldInfosFormat, SegReader etc., like we now do for DocValues. I'll work on a patch. Also on LUCENE-5189, Rob raised a concern about SegmentInfo.attributes that have the same limitation -- if a Codec modifies them, they are silently being ignored, since we don't gen the .si files.
I think we can easily solve that by recording SI.attributes in SegmentInfos, so they are recorded per-commit. But I think it should be handled in a separate issue.
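The generation mechanism discussed here follows the pattern Lucene already uses for live-docs and DocValues updates: instead of rewriting a write-once file, a new generation is written under a gen-suffixed name next to the original. A toy naming helper shows the idea (purely illustrative; the real logic lives in IndexFileNames and encodes generations in base 36):

```python
def gen_file_name(segment, extension, gen):
    """Toy generation-suffixed file naming: generation 0 uses the
    plain name, later generations append the generation number, so
    an update writes a brand-new file alongside the original rather
    than modifying it in place."""
    if gen == 0:
        return "%s.%s" % (segment, extension)
    return "%s_%d.%s" % (segment, gen, extension)
```

Under this scheme a first FieldInfos update for segment `_0` would land in a file like `_0_1.fnm` next to the original `_0.fnm` (names illustrative), which is why write-once directories and updates can coexist.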
[jira] [Updated] (SOLR-4988) Upgrade svnkit to version compatible with svn 1.8
[ https://issues.apache.org/jira/browse/SOLR-4988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Woodward updated SOLR-4988: Attachment: SOLR-4988.patch Thanks Shai. Here's a patch that adds the tmatesoft snapshots repository to ivy-settings.xml, and updates extra-targets.xml to use svnkit 1.8.0-SNAPSHOT. I'm a bit wary of committing it, though, as I don't like adding dependencies on snapshots. Maybe just keep this patch up for people who want to use svn 1.8 until tmatesoft officially release svnkit 1.8.0? (I ended up working around the precommit problem by just using svn 1.7 for this project. Or you could just be resigned to breaking the build occasionally :-) Upgrade svnkit to version compatible with svn 1.8 - Key: SOLR-4988 URL: https://issues.apache.org/jira/browse/SOLR-4988 Project: Solr Issue Type: Task Reporter: Alan Woodward Assignee: Alan Woodward Attachments: SOLR-4988.patch If you've got subversion 1.8 installed, ant precommit fails due to svn version incompatibilities. It looks as though there isn't an svnkit release yet that supports 1.8. Once one is available, we should upgrade our dependencies. See http://subversion.1072662.n5.nabble.com/ETA-on-1-8-support-td181632.html
[jira] [Commented] (LUCENE-5215) Add support for FieldInfos generation
[ https://issues.apache.org/jira/browse/LUCENE-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767832#comment-13767832 ] Robert Muir commented on LUCENE-5215: - Yes, but the issue is that they are write-once. If a codec component that needs attributes (e.g. a dv one) was to write this stuff in its own file, it would work with updates because of segment suffix. Additionally we already have a per-commit Map<String,String>: the one used by setCommitData(Map<String,String> commitUserData)...
[jira] [Commented] (LUCENE-5215) Add support for FieldInfos generation
[ https://issues.apache.org/jira/browse/LUCENE-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767836#comment-13767836 ] Robert Muir commented on LUCENE-5215: - Anyway I agree we should spin off a separate issue for this... this issue for fieldinfos will be fun enough by itself :)
[jira] [Commented] (SOLR-5167) Ability to use AnalyzingInfixSuggester in Solr
[ https://issues.apache.org/jira/browse/SOLR-5167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767837#comment-13767837 ] ASF subversion and git services commented on SOLR-5167: --- Commit 1523454 from [~rcmuir] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1523454 ] SOLR-5167: Ability to use AnalyzingInfixSuggester in Solr
[jira] [Resolved] (SOLR-5167) Ability to use AnalyzingInfixSuggester in Solr
[ https://issues.apache.org/jira/browse/SOLR-5167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved SOLR-5167. --- Resolution: Fixed Fix Version/s: (was: 4.5) 4.6 Thanks Varun and Areek!
[jira] [Updated] (LUCENE-5207) lucene expressions module
[ https://issues.apache.org/jira/browse/LUCENE-5207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-5207: Attachment: LUCENE-5207.patch Yes, I'll merge (because I am not totally sure about the python script really showing all the differences in various config stuff). Here is the updated patch from the script though.
[jira] [Commented] (LUCENE-5207) lucene expressions module
[ https://issues.apache.org/jira/browse/LUCENE-5207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767851#comment-13767851 ] Uwe Schindler commented on LUCENE-5207: --- Thanks!
[jira] [Commented] (SOLR-4988) Upgrade svnkit to version compatible with svn 1.8
[ https://issues.apache.org/jira/browse/SOLR-4988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767848#comment-13767848 ] Uwe Schindler commented on SOLR-4988: - Thanks Alan! Adding snapshot repos is indeed not the best idea (reproducibility of builds), so we should for now stick with this version. Currently I am not sure if svnkit 1.8 will be available from Maven Central in time. The latest Maven Central version of 1.7 is still behind the one on their own repo. But if it's urgent, once 1.8 is released, we can of course add the official svnkit repo (the release one) to our list of repositories.
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767852#comment-13767852 ] ASF subversion and git services commented on LUCENE-5189: - Commit 1523461 from [~shaie] in branch 'dev/trunk' [ https://svn.apache.org/r1523461 ] LUCENE-5189: add NumericDocValues updates
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767853#comment-13767853 ] Shai Erera commented on LUCENE-5189: Thanks, committed to trunk, revision 1523461. After we resolve all corner issues, and let Jenkins sleep on it for a while, I'll port to 4x.
[jira] [Commented] (SOLR-4988) Upgrade svnkit to version compatible with svn 1.8
[ https://issues.apache.org/jira/browse/SOLR-4988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767854#comment-13767854 ] Shai Erera commented on SOLR-4988: -- Thanks Alan! I see that 'precommit' runs 'documentation-lint', 'validate' and 'check-svn-working-copy'. The latter is the one that fails, so for now I run the first two only.
[jira] [Commented] (LUCENE-5207) lucene expressions module
[ https://issues.apache.org/jira/browse/LUCENE-5207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767855#comment-13767855 ] ASF subversion and git services commented on LUCENE-5207: - Commit 1523462 from [~rcmuir] in branch 'dev/trunk' [ https://svn.apache.org/r1523462 ] LUCENE-5207: lucene expressions module
[jira] [Commented] (LUCENE-5207) lucene expressions module
[ https://issues.apache.org/jira/browse/LUCENE-5207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767858#comment-13767858 ] ASF subversion and git services commented on LUCENE-5207: - Commit 1523464 from [~thetaphi] in branch 'dev/trunk' [ https://svn.apache.org/r1523464 ] LUCENE-5207: Fix a bug in the test and add final to the classloader
[jira] [Created] (LUCENE-5216) Fix SegmentInfo.attributes when updates are involved
Shai Erera created LUCENE-5216: -- Summary: Fix SegmentInfo.attributes when updates are involved Key: LUCENE-5216 URL: https://issues.apache.org/jira/browse/LUCENE-5216 Project: Lucene - Core Issue Type: Improvement Components: core/index Reporter: Shai Erera Today, SegmentInfo.attributes are write-once. However, in the presence of field updates (see LUCENE-5189 and LUCENE-5215) this creates an issue: if a Codec decides to alter the attributes when updates are applied, they are silently discarded. This is rather a corner case, though one that should be addressed. There are two possible solutions:
# Record SI.attributes in SegmentInfos, so they are written per-commit, instead of in the .si file.
# Remove them altogether, as they don't seem to be used anywhere in Lucene code today.
If we remove them, we basically don't take away special capability from Codecs, because they can still write the attributes to a separate file, or even to the file they record their other data in. This will work even with updates, as long as Codecs respect the given segmentSuffix. If we keep them, I think the simplest solution is to read/write them in SegmentInfos. But if we don't see a good use case, I suggest we remove them, as it's just extra code to maintain. I think we can even risk a backwards break and remove them completely from 4x, though if that's a problem, we can deprecate too. If anyone sees a good usage for them, or better - already uses them, please speak up, so we can make the proper decision.
[jira] [Commented] (LUCENE-5216) Fix SegmentInfo.attributes when updates are involved
[ https://issues.apache.org/jira/browse/LUCENE-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767864#comment-13767864 ]

Michael McCandless commented on LUCENE-5216:
--------------------------------------------

+1 to remove them.
[jira] [Commented] (LUCENE-5216) Fix SegmentInfo.attributes when updates are involved
[ https://issues.apache.org/jira/browse/LUCENE-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767867#comment-13767867 ]

Shai Erera commented on LUCENE-5216:
------------------------------------

I searched for SI.attributes(); they aren't used anywhere. Can we just remove them from the API? If so, it's easy -- we only need to create a new format version. What do you think?
[jira] [Commented] (LUCENE-5207) lucene expressions module
[ https://issues.apache.org/jira/browse/LUCENE-5207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767869#comment-13767869 ]

ASF subversion and git services commented on LUCENE-5207:
---------------------------------------------------------

Commit 1523470 from [~rcmuir] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1523470 ]

LUCENE-5207: lucene expressions module

lucene expressions module
-------------------------

                 Key: LUCENE-5207
                 URL: https://issues.apache.org/jira/browse/LUCENE-5207
             Project: Lucene - Core
          Issue Type: New Feature
            Reporter: Ryan Ernst
         Attachments: LUCENE-5207.patch, LUCENE-5207.patch, LUCENE-5207.patch

Expressions are geared at defining an alternative ranking function (e.g. incorporating the text relevance score and other field values/ranking signals), so they are conceptually much more like ElasticSearch's scripting support (http://www.elasticsearch.org/guide/reference/modules/scripting/) than Solr's function queries. Some additional notes:

* In addition to referring to other fields, they can also refer to other expressions, so they can be used as computed fields.
* You can easily rank documents by multiple expressions (it's a SortField at the end), e.g. sort by year descending, then by some function of score, price, and time ascending.
* The provided javascript expression syntax is much more efficient than using a scripting engine, because it does not have dynamic typing (it compiles to .class files that work on doubles). Performance is similar to writing a custom FieldComparator yourself, but much easier to do.
* We have Solr integration to contribute in the future, but this is just the standalone Lucene part as a start. Since Lucene has no schema, it includes an implementation of Bindings (SimpleBindings) that maps variable names to SortFields or other expressions.
[jira] [Commented] (LUCENE-5207) lucene expressions module
[ https://issues.apache.org/jira/browse/LUCENE-5207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767871#comment-13767871 ]

ASF subversion and git services commented on LUCENE-5207:
---------------------------------------------------------

Commit 1523471 from [~rcmuir] in branch 'dev/branches/lucene5207' [ https://svn.apache.org/r1523471 ]

LUCENE-5207: remove branch
[jira] [Resolved] (LUCENE-5207) lucene expressions module
[ https://issues.apache.org/jira/browse/LUCENE-5207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir resolved LUCENE-5207.
---------------------------------
       Resolution: Fixed
    Fix Version/s: 4.6
                   5.0

Thanks Ryan!
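The core idea of the module just resolved above -- bindings map variable names to per-document values, and an expression is "compiled" down to plain arithmetic on primitive doubles that serves as a sort key -- can be illustrated with a small self-contained sketch. Note this is a hypothetical illustration of the concept, not the actual lucene/expressions API (`JavascriptCompiler`/`SimpleBindings` work against real index readers):

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.IntToDoubleFunction;

// Hypothetical sketch of the idea, not the real Lucene API: bindings map
// variable names to per-document values, and the "compiled" expression is
// a plain function on doubles -- no dynamic typing, no boxing of values.
public class ExpressionSketch {
    static List<Integer> rank(Map<String, double[]> bindings, int numDocs) {
        // expression: score / ln(price + e), evaluated per document
        IntToDoubleFunction expr = doc ->
            bindings.get("score")[doc] / Math.log(bindings.get("price")[doc] + Math.E);
        Integer[] docs = new Integer[numDocs];
        for (int i = 0; i < numDocs; i++) docs[i] = i;
        // sort descending by the computed value, like a SortField over the expression
        Arrays.sort(docs, Comparator.comparingDouble(
            (Integer d) -> expr.applyAsDouble(d)).reversed());
        return Arrays.asList(docs);
    }

    public static void main(String[] args) {
        Map<String, double[]> bindings = Map.of(
            "score", new double[] {0.8, 0.5, 0.9},
            "price", new double[] {20.0, 5.0, 15.0});
        System.out.println(rank(bindings, 3)); // prints [2, 0, 1]
    }
}
```

Because the expression works directly on doubles, evaluating it per hit costs about the same as a hand-written comparator, which is the performance point made in the issue description.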
[jira] [Commented] (LUCENE-5216) Fix SegmentInfo.attributes when updates are involved
[ https://issues.apache.org/jira/browse/LUCENE-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767874#comment-13767874 ]

Robert Muir commented on LUCENE-5216:
-------------------------------------

AFAIK they are being used in branch_4x for 3.x back compat. So there I think we should simply deprecate, just so we don't have to reimplement hairy back compat :)
[JENKINS-MAVEN] Lucene-Solr-Maven-4.x #448: POMs out of sync
Build: https://builds.apache.org/job/Lucene-Solr-Maven-4.x/448/

2 tests failed.

REGRESSION: org.apache.solr.cloud.BasicDistributedZk2Test.testDistribSearch

Error Message:
null

Stack Trace:
java.lang.NullPointerException: null
    at __randomizedtesting.SeedInfo.seed([1FF121B874B7B1F7:9E17AFA003E8D1CB]:0)
    at org.apache.solr.common.cloud.ZkCoreNodeProps.getBaseUrl(ZkCoreNodeProps.java:40)
    at org.apache.solr.client.solrj.impl.CloudSolrServer.buildUrlMap(CloudSolrServer.java:406)
    at org.apache.solr.client.solrj.impl.CloudSolrServer.directUpdate(CloudSolrServer.java:304)
    at org.apache.solr.client.solrj.impl.CloudSolrServer.request(CloudSolrServer.java:498)
    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
    at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:168)
    at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:146)
    at org.apache.solr.cloud.AbstractFullDistribZkTestBase.commit(AbstractFullDistribZkTestBase.java:1517)
    at org.apache.solr.cloud.BasicDistributedZk2Test.brindDownShardIndexSomeDocsAndRecover(BasicDistributedZk2Test.java:288)
    at org.apache.solr.cloud.BasicDistributedZk2Test.doTest(BasicDistributedZk2Test.java:115)

FAILED: org.apache.solr.cloud.SyncSliceTest.testDistribSearch

Error Message:
expected:<5> but was:<4>

Stack Trace:
java.lang.AssertionError: expected:<5> but was:<4>
    at __randomizedtesting.SeedInfo.seed([FBE1BE5413A67E6B:7A07304C64F91E57]:0)
    at org.junit.Assert.fail(Assert.java:93)
    at org.junit.Assert.failNotEquals(Assert.java:647)
    at org.junit.Assert.assertEquals(Assert.java:128)
    at org.junit.Assert.assertEquals(Assert.java:472)
    at org.junit.Assert.assertEquals(Assert.java:456)
    at org.apache.solr.cloud.SyncSliceTest.doTest(SyncSliceTest.java:175)

Build Log:
[...truncated 25001 lines...]
[JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.8.0-ea-b106) - Build # 7478 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/7478/
Java: 32bit/jdk1.8.0-ea-b106 -client -XX:+UseG1GC

All tests passed

Build Log:
[...truncated 31860 lines...]
BUILD FAILED
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/build.xml:396: The following error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/build.xml:335: The following error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/extra-targets.xml:66: The following error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/extra-targets.xml:135: The following files are missing svn:eol-style (or binary svn:mime-type):
* ./lucene/core/src/java/org/apache/lucene/index/NumericUpdate.java
* ./lucene/core/src/java/org/apache/lucene/util/RefCount.java
* ./lucene/core/src/test/org/apache/lucene/index/TestNumericDocValuesUpdates.java

Total time: 42 minutes 17 seconds
Build step 'Invoke Ant' marked build as failure
Description set: Java: 32bit/jdk1.8.0-ea-b106 -client -XX:+UseG1GC
Archiving artifacts
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure
Re: [JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.8.0-ea-b106) - Build # 7478 - Failure!
Committed a fix.

Shai
[jira] [Updated] (SOLR-5234) Allow SolrResourceLoader to load resources from URLs
[ https://issues.apache.org/jira/browse/SOLR-5234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Woodward updated SOLR-5234:
--------------------------------
    Attachment: SOLR-5234.patch

New patch, with a test for local URLs (thanks for the suggestion, Mark!)

Allow SolrResourceLoader to load resources from URLs
----------------------------------------------------

                 Key: SOLR-5234
                 URL: https://issues.apache.org/jira/browse/SOLR-5234
             Project: Solr
          Issue Type: Improvement
            Reporter: Alan Woodward
            Assignee: Alan Woodward
            Priority: Minor
         Attachments: SOLR-5234.patch, SOLR-5234.patch

This would allow multiple Solr instances to share large configuration files. It would also help resolve problems caused by attempting to store >1Mb files in ZooKeeper.
[JENKINS] Lucene-Solr-trunk-MacOSX (64bit/jdk1.7.0) - Build # 828 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-MacOSX/828/
Java: 64bit/jdk1.7.0 -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC

All tests passed

Build Log:
[...truncated 35870 lines...]
BUILD FAILED
/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/build.xml:396: The following error occurred while executing this line:
/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/build.xml:335: The following error occurred while executing this line:
/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/extra-targets.xml:66: The following error occurred while executing this line:
/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/extra-targets.xml:135: The following files are missing svn:eol-style (or binary svn:mime-type):
* ./lucene/core/src/java/org/apache/lucene/index/NumericUpdate.java
* ./lucene/core/src/java/org/apache/lucene/util/RefCount.java
* ./lucene/core/src/test/org/apache/lucene/index/TestNumericDocValuesUpdates.java

Total time: 130 minutes 41 seconds
Build step 'Invoke Ant' marked build as failure
Description set: Java: 64bit/jdk1.7.0 -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC
Archiving artifacts
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure
[jira] [Commented] (LUCENE-5215) Add support for FieldInfos generation
[ https://issues.apache.org/jira/browse/LUCENE-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767887#comment-13767887 ]

Shai Erera commented on LUCENE-5215:
------------------------------------

I started by creating a new Lucene46Codec and a matching Lucene46FieldInfosFormat (+reader/writer). There is an API issue with FISFormat -- it doesn't take segmentSuffix in either FISReader.read() or FISWriter.write(). We need to make an API break, and I'm wondering whether we want to do it big time and pass SegRead/WriteState already, instead of adding just one parameter, to be consistent with the other formats (well, as much as possible -- the other formats take SRS/SWS and pass them to their reader/writer).

Add support for FieldInfos generation
-------------------------------------

                 Key: LUCENE-5215
                 URL: https://issues.apache.org/jira/browse/LUCENE-5215
             Project: Lucene - Core
          Issue Type: New Feature
          Components: core/index
            Reporter: Shai Erera
            Assignee: Shai Erera

In LUCENE-5189 we've identified a few reasons to do that:

# If you want to update docs' values of field 'foo', where 'foo' exists in the index but not in a specific segment (sparse DV), we cannot allow that and have to throw a late UOE. If we could rewrite FieldInfos (with generation), this would be possible, since we'd also write a new generation of FIS.
# When we apply NDV updates, we call DVF.fieldsConsumer. Currently the consumer isn't allowed to change FI.attributes because we cannot modify the existing FIS. This is implicit, however, and we silently ignore any modified attributes. FieldInfos.gen will allow that too.

The idea is to add fieldInfosGen to SIPC, add a dvGen to each FieldInfo, and add support for FIS generation in FieldInfosFormat, SegReader etc., like we now do for DocValues. I'll work on a patch.

Also on LUCENE-5189, Rob raised a concern about SegmentInfo.attributes, which have the same limitation -- if a Codec modifies them, they are silently ignored, since we don't gen the .si files.
I think we can easily solve that by recording SI.attributes in SegmentInfos, so they are recorded per-commit. But I think it should be handled in a separate issue.
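The generation mechanism discussed above follows the same scheme live-docs already use: each rewrite bumps a generation number, and the generation is embedded in the file name (in base 36), so old and new generations of a file coexist in the directory. A minimal sketch of that naming scheme -- a hypothetical helper loosely modeled on Lucene's IndexFileNames.fileNameFromGeneration, not the actual implementation:

```java
// Hypothetical sketch of generational file naming, loosely modeled on how
// Lucene names per-generation files such as live-docs: generation 0 means
// no generation suffix; higher generations are appended in base 36.
public class GenFileNames {
    static String fileNameFromGeneration(String base, String ext, long gen) {
        if (gen == 0) {
            return base + "." + ext;
        }
        // Character.MAX_RADIX is 36, so generations stay compact in file names
        return base + "_" + Long.toString(gen, Character.MAX_RADIX) + "." + ext;
    }

    public static void main(String[] args) {
        System.out.println(fileNameFromGeneration("_0", "fnm", 0));  // _0.fnm
        System.out.println(fileNameFromGeneration("_0", "fnm", 1));  // _0_1.fnm
        System.out.println(fileNameFromGeneration("_0", "fnm", 36)); // _0_10.fnm
    }
}
```

Under a scheme like this, writing a new generation of FieldInfos for segment _0 produces a new file alongside the old one, and the per-commit metadata records which generation is current.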
Parquet dictionary encoding bit packing
Hi,

I was reading the Parquet announcement from July:
https://blog.twitter.com/2013/announcing-parquet-10-columnar-storage-for-hadoop

And a few things caught my attention -- dictionary encoding and (dynamic) bit packing. This smells like something Adrien likes to eat for breakfast. Over in the Hadoop ecosystem, Parquet interest has picked up: http://search-hadoop.com/?q=parquet

I thought I'd point it out, as I haven't seen anyone bring this up. I imagine there are ideas to be borrowed there.

Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm
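For readers unfamiliar with the two techniques mentioned: dictionary encoding replaces repeated column values with small integer codes into a dictionary of distinct values, and bit packing then stores each code in only ceil(log2(dictSize)) bits. A self-contained illustration of the idea (not Parquet's actual implementation):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch, not Parquet's actual code: dictionary-encode a column
// of values, then compute how many bits each code needs once bit-packed.
public class DictEncode {
    static int[] encode(String[] column, List<String> dictOut) {
        Map<String, Integer> index = new LinkedHashMap<>();
        int[] codes = new int[column.length];
        for (int i = 0; i < column.length; i++) {
            Integer code = index.get(column[i]);
            if (code == null) {
                code = index.size();          // next unused code
                index.put(column[i], code);
            }
            codes[i] = code;
        }
        dictOut.addAll(index.keySet());       // dictionary, in first-seen order
        return codes;
    }

    // bits per packed code: ceil(log2(dictSize)), with a floor of 1 bit
    static int bitsPerCode(int dictSize) {
        return Math.max(1, 32 - Integer.numberOfLeadingZeros(Math.max(1, dictSize - 1)));
    }
}
```

For a column like ["a", "b", "a", "c"] this yields codes [0, 1, 0, 2] against a 3-entry dictionary, so each code packs into 2 bits -- a large win on low-cardinality columns, which is why it caught attention here.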
[jira] [Commented] (LUCENE-5215) Add support for FieldInfos generation
[ https://issues.apache.org/jira/browse/LUCENE-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767921#comment-13767921 ]

Robert Muir commented on LUCENE-5215:
-------------------------------------

No, they don't take this, and for damn good reason: SRS/SWS contain FieldInfos.
[jira] [Updated] (SOLR-2548) Multithreaded faceting
[ https://issues.apache.org/jira/browse/SOLR-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Smiley updated SOLR-2548:
-------------------------------
    Attachment: SOLR-2548_multithreaded_faceting,_dsmiley.patch

The attached patch improves on my previous one a little -- a few more comments, a variable rename for clarity, an assertion. And of course I removed the future.cancel() loop. I think this code is pretty clear as far as multithreaded code goes: one loop that submits tasks, a follow-on loop that consumes the results of those tasks, and a semaphore to ensure no more than the desired number of threads are computing the facets.

It'd be cool to eventually extend multithreading across all the faceting types. I'll look into that next week.

Multithreaded faceting
----------------------

                 Key: SOLR-2548
                 URL: https://issues.apache.org/jira/browse/SOLR-2548
             Project: Solr
          Issue Type: Improvement
          Components: search
    Affects Versions: 3.1
            Reporter: Janne Majaranta
            Assignee: Erick Erickson
            Priority: Minor
              Labels: facet
             Fix For: 4.5, 5.0
         Attachments: SOLR-2548_4.2.1.patch, SOLR-2548_for_31x.patch, SOLR-2548_multithreaded_faceting,_dsmiley.patch, SOLR-2548_multithreaded_faceting,_dsmiley.patch, SOLR-2548.patch, SOLR-2548.patch, SOLR-2548.patch, SOLR-2548.patch, SOLR-2548.patch, SOLR-2548.patch

Add multithreading support for faceting.
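The submit-loop / consume-loop / semaphore shape described in the comment above can be sketched in isolation like this -- a simplified stand-in with dummy per-field work, not the actual Solr patch:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;

// Simplified stand-in for the pattern described in the comment (not the
// actual Solr patch): one loop submits a task per facet field, a semaphore
// caps how many tasks compute at once, and a second loop consumes results
// in submission order.
public class ThrottledFacets {
    static List<Integer> computeAll(List<Integer> fields, int maxThreads) throws Exception {
        ExecutorService pool = Executors.newCachedThreadPool();
        Semaphore permits = new Semaphore(maxThreads);
        List<Future<Integer>> futures = new ArrayList<>();
        for (int field : fields) {
            permits.acquire();                 // blocks while maxThreads tasks are in flight
            futures.add(pool.submit(() -> {
                try {
                    return field * field;      // stand-in for counting one field's facets
                } finally {
                    permits.release();
                }
            }));
        }
        List<Integer> results = new ArrayList<>();
        for (Future<Integer> f : futures) {
            results.add(f.get());              // consume in submission order
        }
        pool.shutdown();
        return results;
    }
}
```

Because the semaphore is acquired before submit and released inside the task's finally block, at most maxThreads tasks run concurrently even on an unbounded pool, while the second loop still returns results in a deterministic order.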
Re: Parquet dictionary encoding bit packing
Okay, but what exactly does Parquet have to offer to a search engine? I mean, is it simply an alternate form of codec? Would it merely reduce I/O and mass storage requirements? Would it impact search performance at all? Would it add a significant search start-up warming overhead? Or, does it offer some magic that would in fact dramatically reduce the time to do the first query?

Or, is it merely an alternative format for ingestion of an input stream? Like, say, better than JavaBin? Or, maybe for more efficient internode transfers of documents for SolrCloud?

-- Jack Krupansky
[jira] [Created] (SOLR-5242) Runtime Oracle Corporation OpenJDK 64-Bit Server VM (1.7.0_25 23.7-b01) is not from oracle
james michael dupont created SOLR-5242:
---------------------------------------

             Summary: Runtime Oracle Corporation OpenJDK 64-Bit Server VM (1.7.0_25 23.7-b01) is not from oracle
                 Key: SOLR-5242
                 URL: https://issues.apache.org/jira/browse/SOLR-5242
             Project: Solr
          Issue Type: Bug
          Components: web gui
    Affects Versions: 5.0
            Reporter: james michael dupont

The web page reports the runtime as "Oracle Corporation OpenJDK 64-Bit Server VM (1.7.0_25 23.7-b01)", but that OpenJDK build is not from Oracle.
[jira] [Commented] (SOLR-2548) Multithreaded faceting
[ https://issues.apache.org/jira/browse/SOLR-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767941#comment-13767941 ]

Erick Erickson commented on SOLR-2548:
--------------------------------------

So maybe just commit this when you think it's ready? I'll probably get a chance to look it over Tuesday on the airplane, but if you're happy with it, feel free. We can always put the other faceting types into a new JIRA?
[JENKINS-MAVEN] Lucene-Solr-Maven-trunk #971: POMs out of sync
Build: https://builds.apache.org/job/Lucene-Solr-Maven-trunk/971/

No tests ran.

Build Log:
[...truncated 11794 lines...]
Re: [JENKINS-MAVEN] Lucene-Solr-Maven-trunk #971: POMs out of sync
I committed a fix... I will open an issue to replace these exclusions with wildcards (so it works like ivy).
[jira] [Created] (LUCENE-5217) disable transitive dependencies in maven config
Robert Muir created LUCENE-5217: --- Summary: disable transitive dependencies in maven config Key: LUCENE-5217 URL: https://issues.apache.org/jira/browse/LUCENE-5217 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Our ivy configuration does this: each dependency is specified, so we know what will happen. Unfortunately the maven setup is not configured the same way. Instead, the maven setup is configured to download the internet, and it excludes certain things specifically. This is really hard to configure and maintain: we added a 'validate-maven-dependencies' task that tries to fail on any extra jars, but all it really does is run a license check after maven runs. It wouldn't find unnecessary dependencies being dragged in if something else in lucene was using them and thus they had a license file. Since maven supports wildcard exclusions (MNG-3832), we can disable transitive resolution completely. We should do this, so the maven configuration is the exact parallel of ivy.
[jira] [Commented] (LUCENE-5217) disable transitive dependencies in maven config
[ https://issues.apache.org/jira/browse/LUCENE-5217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767950#comment-13767950 ] Robert Muir commented on LUCENE-5217: - This is also described here: http://www.smartjava.org/content/maven-and-wildcard-exclusions I think it just means we have to require a minimum of maven 3 instead of also supporting 2. Since this has been out for 3 years (in fact older than the ant 1.8.2 that we require), I don't see this as a significant imposition on anyone?
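The MNG-3832 wildcard exclusion mentioned above could look roughly like the sketch below in a generated POM; the commons-codec coordinates here are only an illustrative placeholder, not taken from the actual Lucene/Solr POM templates:

```xml
<dependency>
  <groupId>commons-codec</groupId>
  <artifactId>commons-codec</artifactId>
  <version>1.7</version>
  <exclusions>
    <!-- Wildcard exclusion (requires Maven 3, per MNG-3832): drop ALL
         transitive dependencies of this artifact, so only jars listed
         explicitly in the POM are resolved, mirroring how the ivy.xml
         files declare each dependency directly. -->
    <exclusion>
      <groupId>*</groupId>
      <artifactId>*</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```

With this in place, a per-dependency list of specific exclusions becomes unnecessary.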
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767999#comment-13767999 ] ASF subversion and git services commented on LUCENE-5189: - Commit 1523525 from [~rcmuir] in branch 'dev/trunk' [ https://svn.apache.org/r1523525 ] LUCENE-5189: add testcase
[jira] [Resolved] (SOLR-5242) Runtime Oracle Corporation OpenJDK 64-Bit Server VM (1.7.0_25 23.7-b01) is not from oracle
[ https://issues.apache.org/jira/browse/SOLR-5242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved SOLR-5242. --- Resolution: Not A Problem This is not a problem; this is what OpenJDK reports. You can easily see this yourself with any OpenJDK: java -XshowSettings:properties -version | grep -i oracle java.specification.vendor = Oracle Corporation java.vendor = Oracle Corporation java.vendor.url = http://java.oracle.com/ java.vm.specification.vendor = Oracle Corporation java.vm.vendor = Oracle Corporation Runtime Oracle Corporation OpenJDK 64-Bit Server VM (1.7.0_25 23.7-b01) is not from oracle -- Key: SOLR-5242 URL: https://issues.apache.org/jira/browse/SOLR-5242 Project: Solr Issue Type: Bug Components: web gui Affects Versions: 5.0 Reporter: james michael dupont the webpage says that it is oracle openjdk but that is not from oracle Runtime Oracle Corporation OpenJDK 64-Bit Server VM (1.7.0_25 23.7-b01)
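The same vendor strings can be read from Java itself; a small sketch (the class name is mine, and the printed values vary by JVM build, so none are hard-coded as expected output):

```java
public class VendorCheck {
    public static void main(String[] args) {
        // OpenJDK builds report "Oracle Corporation" because Oracle is the
        // steward of the OpenJDK specification, not because the particular
        // binary was built or distributed by Oracle.
        System.out.println("java.vendor = " + System.getProperty("java.vendor"));
        System.out.println("java.vm.name = " + System.getProperty("java.vm.name"));
        System.out.println("java.specification.vendor = "
                + System.getProperty("java.specification.vendor"));
    }
}
```

These are the same properties the shell one-liner above greps out of `-XshowSettings:properties`.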
[jira] [Commented] (LUCENE-5210) Unit tests for LicenseCheckTask.
[ https://issues.apache.org/jira/browse/LUCENE-5210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768019#comment-13768019 ] Robert Muir commented on LUCENE-5210: - I have no opinion on the eclipse/ant/antunit stuff. I just want to say there is currently no test, so I think we should start with a test and then improve it. My one suggestion about testing and the jars: if the test is in java, it can easily create jars on the fly in temp dirs so we don't have to package them (with fake licenses). This is done in ResourceLoaderTest in solr for example:
{code}
public void testClassLoaderLibs() throws Exception {
  File tmpRoot = _TestUtil.getTempDir("testClassLoaderLibs");
  File lib = new File(tmpRoot, "lib");
  lib.mkdirs();
  JarOutputStream jar1 = new JarOutputStream(new FileOutputStream(new File(lib, "jar1.jar")));
  jar1.putNextEntry(new JarEntry("aLibFile"));
  jar1.closeEntry();
  jar1.close();
  ...
{code}
Unit tests for LicenseCheckTask. Key: LUCENE-5210 URL: https://issues.apache.org/jira/browse/LUCENE-5210 Project: Lucene - Core Issue Type: Test Components: general/build Reporter: Mark Miller Attachments: LUCENE-5210.patch, LUCENE-5210.patch While working on LUCENE-5209, I noticed the LicenseCheckTask is kind of a second-class citizen - excluded from UI src folder setup and with no unit tests. This was a little scary to me. I've started adding some unit tests. So far I have mainly just done the lifting of getting unit tests to work as part of tools. I have added two super simple tests - really just the start - but something to build on.
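A self-contained sketch of the same on-the-fly-jar technique, outside the Solr test harness; the class and helper names here are mine, not from ResourceLoaderTest:

```java
import java.io.File;
import java.io.FileOutputStream;
import java.nio.file.Files;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;

public class FakeJarDemo {
    // Create a throwaway jar with a single (empty) entry, usable as a test
    // fixture: no jars need to be checked in or packaged with the tests.
    public static File createFakeJar(File dir, String jarName, String entryName) throws Exception {
        File jar = new File(dir, jarName);
        try (JarOutputStream out = new JarOutputStream(new FileOutputStream(jar))) {
            out.putNextEntry(new JarEntry(entryName));
            out.closeEntry();
        }
        return jar;
    }

    public static void main(String[] args) throws Exception {
        // Build the fixture under a fresh temp dir, as the comment suggests.
        File tmpRoot = Files.createTempDirectory("licenseCheckTest").toFile();
        File lib = new File(tmpRoot, "lib");
        lib.mkdirs();
        File jar = createFakeJar(lib, "jar1.jar", "aLibFile");
        System.out.println(jar.exists() && jar.length() > 0); // prints "true"
    }
}
```

A license-check test could then write a matching fake `jar1-LICENSE-ASL.txt` next to it and run the task against the temp dir.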
[jira] [Commented] (LUCENE-5210) Unit tests for LicenseCheckTask.
[ https://issues.apache.org/jira/browse/LUCENE-5210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768025#comment-13768025 ] Robert Muir commented on LUCENE-5210: - As far as the current patch, I don't really have a problem with it (any other simplifications can be done later). I have only one concern: will this make a lucene-tools module (e.g. packaged in releases, published in maven)? It seems like it might, which separately might be a good idea so someone can use the stuff in this folder in their own project, except a few things would be off as far as packaging:
* it should probably be restructured, so that various configs used by the build are in src/resources and put inside its jar file (e.g. forbiddenApis configs and so on)
* I think this depends on ant, but there is no dependency on ant in the ivy.xml
* it would need maven configuration and so on, added in the smoketester, etc.
* there might be other exclusions for tools/ in the build that are not appropriate, etc.
* as far as the name, maybe build-tools would be a better one (since it's not tools for working on lucene indexes).
If the smoketester passes though, I am happy: we can just make sure it's excluded from the right places and not doing something we don't want wrt packaging for now, and discuss this stuff on other issues.
[jira] [Commented] (LUCENE-5210) Unit tests for LicenseCheckTask.
[ https://issues.apache.org/jira/browse/LUCENE-5210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768033#comment-13768033 ] Mark Miller commented on LUCENE-5210: - Also note, prepare-release does not currently pass the way things are - something to do with a maven artifact that now tries to run on tools.
[jira] [Commented] (LUCENE-5210) Unit tests for LicenseCheckTask.
[ https://issues.apache.org/jira/browse/LUCENE-5210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768032#comment-13768032 ] Mark Miller commented on LUCENE-5210: - Yeah, my first thought was to write out the test files to a tmp dir, but essentially I was too lazy to code it up.
[jira] [Commented] (LUCENE-5210) Unit tests for LicenseCheckTask.
[ https://issues.apache.org/jira/browse/LUCENE-5210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768034#comment-13768034 ] Robert Muir commented on LUCENE-5210: - I think if we _just want to run tests_ for now, we should change the test target to explicitly recurse to tools, rather than modifying the 'general module macro' in common-build. Otherwise other tasks (like packaging, javadocs, maven, etc) will try to do things with tools.
[jira] [Commented] (LUCENE-5210) Unit tests for LicenseCheckTask.
[ https://issues.apache.org/jira/browse/LUCENE-5210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768035#comment-13768035 ] Mark Miller commented on LUCENE-5210: - bq. will this make a lucene-tools module (e.g. packaged in releases, published in maven?) Yeah, that is what is happening currently - I'm sure that is what is causing prepare-release to have issues.
[jira] [Commented] (LUCENE-5210) Unit tests for LicenseCheckTask.
[ https://issues.apache.org/jira/browse/LUCENE-5210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768037#comment-13768037 ] Robert Muir commented on LUCENE-5210: - Look at the regenerate task in build.xml; it has a subant explicitly going to 'core' to run a task. We'd just want something like that subant, call it something like 'test-tools', and have 'test' depend on that.
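A minimal sketch of what that suggested target could look like in the top-level build.xml; the target name follows the comment, but the directory path and attributes are illustrative guesses modeled on the regenerate task, not the committed change:

```xml
<!-- Recurse explicitly into tools/ for its tests only, so that packaging,
     javadocs, and maven tasks never visit the module. The 'test' target
     would then declare depends="...,test-tools". -->
<target name="test-tools">
  <subant target="test" inheritall="false" failonerror="true">
    <fileset dir="lucene/tools" includes="build.xml"/>
  </subant>
</target>
```

Because only `test` reaches the module, nothing changes in the 'general module macro' in common-build.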
[jira] [Commented] (LUCENE-5215) Add support for FieldInfos generation
[ https://issues.apache.org/jira/browse/LUCENE-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768041#comment-13768041 ] Shai Erera commented on LUCENE-5215: It's ok that SWS contains fieldInfos - FieldsWriter needs to write them. And I don't think it's bad that SRS contains fieldInfos; we can just assert in FieldsReader that they are null? Add support for FieldInfos generation - Key: LUCENE-5215 URL: https://issues.apache.org/jira/browse/LUCENE-5215 Project: Lucene - Core Issue Type: New Feature Components: core/index Reporter: Shai Erera Assignee: Shai Erera In LUCENE-5189 we've identified a few reasons to do that: # If you want to update docs' values of field 'foo', where 'foo' exists in the index, but not in a specific segment (sparse DV), we cannot allow that and have to throw a late UOE. If we could rewrite FieldInfos (with generation), this would be possible since we'd also write a new generation of FIS. # When we apply NDV updates, we call DVF.fieldsConsumer. Currently the consumer isn't allowed to change FI.attributes because we cannot modify the existing FIS. This is implicit, however, and we silently ignore any modified attributes. FieldInfos.gen will allow that too. The idea is to add to SIPC fieldInfosGen, add to each FieldInfo a dvGen, and add support for FIS generation in FieldInfosFormat, SegReader etc., like we now do for DocValues. I'll work on a patch. Also on LUCENE-5189, Rob raised a concern about SegmentInfo.attributes that have the same limitation -- if a Codec modifies them, they are silently ignored, since we don't gen the .si files. I think we can easily solve that by recording SI.attributes in SegmentInfos, so they are recorded per-commit. But I think it should be handled in a separate issue.
[jira] [Commented] (LUCENE-5215) Add support for FieldInfos generation
[ https://issues.apache.org/jira/browse/LUCENE-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768049#comment-13768049 ] Robert Muir commented on LUCENE-5215: - Sorry, that would be really confusing. SegmentRead/WriteState are for the *data* portions of the codec: postings, vectors, ... they have all metadata available at this point, so it makes sense. However, the metadata portions that are bootstrapped do not: they only have limited things available. The API should only pass what they actually have access to: no nulls!
[jira] [Commented] (SOLR-4787) Join Contrib
[ https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768078#comment-13768078 ] Kranti Parisa commented on SOLR-4787: - I have implemented multi-value keys for hjoin using a new field UnIvertedLongField. Sanity checks look good. Also tested with FQs (nested Joins). I will prepare a patch sometime tomorrow and post here. Join Contrib Key: SOLR-4787 URL: https://issues.apache.org/jira/browse/SOLR-4787 Project: Solr Issue Type: New Feature Components: search Affects Versions: 4.2.1 Reporter: Joel Bernstein Priority: Minor Fix For: 4.5, 5.0 Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787-pjoin-long-keys.patch This contrib provides a place where different join implementations can be contributed to Solr. This contrib currently includes 3 join implementations. The initial patch was generated from the Solr 4.3 tag. Because of changes in the FieldCache API this patch will only build with Solr 4.2 or above. *HashSetJoinQParserPlugin aka hjoin* The hjoin provides a join implementation that filters results in one core based on the results of a search in another core. This is similar in functionality to the JoinQParserPlugin but the implementation differs in a couple of important ways. The first way is that the hjoin is designed to work with int and long join keys only. So, in order to use hjoin, int or long join keys must be included in both the to and from core. The second difference is that the hjoin builds memory structures that are used to quickly connect the join keys. So, the hjoin will need more memory than the JoinQParserPlugin to perform the join. The main advantage of the hjoin is that it can scale to join millions of keys between cores and provide sub-second response time.
The hjoin should work well with up to two million results from the fromIndex and tens of millions of results from the main query. The hjoin supports the following features: 1) Both lucene query and PostFilter implementations. A *cost* > 99 will turn on the PostFilter. The PostFilter will typically outperform the Lucene query when the main query results have been narrowed down. 2) With the lucene query implementation there is an option to build the filter with threads. This can greatly improve the performance of the query if the main query index is very large. The threads parameter turns on threading. For example *threads=6* will use 6 threads to build the filter. This will set up a fixed threadpool with six threads to handle all hjoin requests. Once the threadpool is created the hjoin will always use it to build the filter. Threading does not come into play with the PostFilter. 3) The *size* local parameter can be used to set the initial size of the hashset used to perform the join. If this is set above the number of results from the fromIndex then you can avoid hashset resizing, which improves performance. 4) Nested filter queries. The local parameter fq can be used to nest a filter query within the join. The nested fq will filter the results of the join query. This can point to another join to support nested joins. 5) Full caching support for the lucene query implementation. The filterCache and queryResultCache should work properly even with deep nesting of joins. Only the queryResultCache comes into play with the PostFilter implementation because PostFilters are not cacheable in the filterCache. The syntax of the hjoin is similar to the JoinQParserPlugin except that the plugin is referenced by the string hjoin rather than join.
fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 fq=$qq\}user:customer1&qq=group:5 The example filter query above will search the fromIndex (collection2) for user:customer1, applying the local fq parameter to filter the results. The lucene filter query will be built using 6 threads. This query will generate a list of values from the from field that will be used to filter the main query. Only records from the main query, where the to field is present in the from list, will be included in the results. The solrconfig.xml in the main query core must contain the reference to the hjoin: <queryParser name="hjoin" class="org.apache.solr.joins.HashSetJoinQParserPlugin"/> And the join contrib jars must be registered in the solrconfig.xml: <lib dir="../../../contrib/joins/lib" regex=".*\.jar" /> <lib dir="../../../dist/" regex="solr-joins-\d.*\.jar" /> *BitSetJoinQParserPlugin aka bjoin* The bjoin behaves exactly like the hjoin but uses a
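The core of the hjoin described above — hash the from-side join keys into a presized set, then keep only the main-query docs whose key is in the set — can be sketched in plain Java. The class and method names here are illustrative, not from the actual patch, and plain collections stand in for Lucene's doc iteration:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class HashSetJoinSketch {
    // Gather the join keys matched in the fromIndex into a HashSet sized up
    // front (the role of the *size* local param), then filter the main-query
    // docs by set membership. Each doc is modeled as {docId, joinKey}.
    public static List<long[]> join(List<long[]> mainDocs, List<Long> fromKeys, int initialSize) {
        Set<Long> keys = new HashSet<>(initialSize); // presizing avoids rehashing
        keys.addAll(fromKeys);
        List<long[]> hits = new ArrayList<>();
        for (long[] doc : mainDocs) {
            if (keys.contains(doc[1])) { // keep docs whose "to" key appears in the "from" set
                hits.add(doc);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<long[]> mainDocs = Arrays.asList(
                new long[]{0, 5}, new long[]{1, 7}, new long[]{2, 5});
        List<Long> fromKeys = Arrays.asList(5L); // keys found in the fromIndex
        System.out.println(join(mainDocs, fromKeys, 16).size()); // prints 2
    }
}
```

This also shows why memory grows with the fromIndex result count: every from-side key is held in the set for the duration of the join.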
[jira] [Comment Edited] (SOLR-4787) Join Contrib
[ https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768078#comment-13768078 ] Kranti Parisa edited comment on SOLR-4787 at 9/16/13 5:38 AM: -- I have implemented multi-value keys for hjoin using a new field UnIvertedLongField. Sanity checks look good. Also tested with FQs (nested Joins). I will run some performance tests and prepare the patch sometime tomorrow. was (Author: krantiparisa): I have implemented multi-value keys for hjoin using a new field UnIvertedLongField. Sanity checks look good. Also tested with FQs (nested Joins). I will prepare a patch sometime tomorrow and post here.
[jira] [Created] (LUCENE-5218) background merge hit exception Caused by: java.lang.ArrayIndexOutOfBoundsException
Littlestar created LUCENE-5218:
--
Summary: background merge hit exception Caused by: java.lang.ArrayIndexOutOfBoundsException
Key: LUCENE-5218
URL: https://issues.apache.org/jira/browse/LUCENE-5218
Project: Lucene - Core
Issue Type: Bug
Components: core/index
Affects Versions: 4.4
Environment: Linux MMapDirectory.
Reporter: Littlestar

forceMerge(80)
==
Caused by: java.io.IOException: background merge hit exception: _3h(4.4):c79921/2994 _3vs(4.4):c38658 _eq(4.4):c38586 _h1(4.4):c37370 _16k(4.4):c36591 _j4(4.4):c34316 _dx(4.4):c30550 _3m6(4.4):c30058 _dl(4.4):c28440 _d8(4.4):c19599 _dy(4.4):c1500/75 _h2(4.4):c1500 into _3vt [maxNumSegments=80]
    at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1714)
    at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1650)
    at com.trs.hybase.engine.lucene.LuceneEngine.flushAndReopen(LuceneEngine.java:1295)
    ... 4 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 2
    at org.apache.lucene.util.PagedBytes$Reader.fillSlice(PagedBytes.java:92)
    at org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer$6.get(Lucene42DocValuesProducer.java:267)
    at org.apache.lucene.codecs.DocValuesConsumer$2$1.setNext(DocValuesConsumer.java:239)
    at org.apache.lucene.codecs.DocValuesConsumer$2$1.hasNext(DocValuesConsumer.java:201)
    at org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addBinaryField(Lucene42DocValuesConsumer.java:218)
    at org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.addBinaryField(PerFieldDocValuesFormat.java:110)
    at org.apache.lucene.codecs.DocValuesConsumer.mergeBinaryField(DocValuesConsumer.java:186)
    at org.apache.lucene.index.SegmentMerger.mergeDocValues(SegmentMerger.java:171)
    at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:108)
    at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3772)
    at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3376)
    at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
    at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)
===

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5218) background merge hit exception Caused by: java.lang.ArrayIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/LUCENE-5218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Littlestar updated LUCENE-5218:
---
Description:
forceMerge(80)
==
Caused by: java.io.IOException: background merge hit exception: _3h(4.4):c79921/2994 _3vs(4.4):c38658 _eq(4.4):c38586 _h1(4.4):c37370 _16k(4.4):c36591 _j4(4.4):c34316 _dx(4.4):c30550 _3m6(4.4):c30058 _dl(4.4):c28440 _d8(4.4):c19599 _dy(4.4):c1500/75 _h2(4.4):c1500 into _3vt [maxNumSegments=80]
    at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1714)
    at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1650)
    at com.xxx.yyy.engine.lucene.LuceneEngine.flushAndReopen(LuceneEngine.java:1295)
    ... 4 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 2
    at org.apache.lucene.util.PagedBytes$Reader.fillSlice(PagedBytes.java:92)
    at org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer$6.get(Lucene42DocValuesProducer.java:267)
    at org.apache.lucene.codecs.DocValuesConsumer$2$1.setNext(DocValuesConsumer.java:239)
    at org.apache.lucene.codecs.DocValuesConsumer$2$1.hasNext(DocValuesConsumer.java:201)
    at org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addBinaryField(Lucene42DocValuesConsumer.java:218)
    at org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.addBinaryField(PerFieldDocValuesFormat.java:110)
    at org.apache.lucene.codecs.DocValuesConsumer.mergeBinaryField(DocValuesConsumer.java:186)
    at org.apache.lucene.index.SegmentMerger.mergeDocValues(SegmentMerger.java:171)
    at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:108)
    at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3772)
    at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3376)
    at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
    at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)
===

(was: the same trace, with the internal package name com.trs.hybase in place of com.xxx.yyy)