[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927490#comment-16927490 ] Mikhail Khludnev commented on LUCENE-5189: -- Giving that LUCENE-8585, LUCENE-8374 optimizes for absent values, shouldn't we have an IW method to nuke DV at certain docs? > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera >Priority: Major > Fix For: 4.6, 6.0 > > Attachments: LUCENE-5189-4x.patch, LUCENE-5189-4x.patch, > LUCENE-5189-no-lost-updates.patch, LUCENE-5189-renames.patch, > LUCENE-5189-segdv.patch, LUCENE-5189-updates-order.patch, > LUCENE-5189-updates-order.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189_process_events.patch, > LUCENE-5189_process_events.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15954610#comment-15954610 ] Erick Erickson commented on LUCENE-5189: Yep, total inadvertent reassignment, sent it back to Shai... > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Fix For: 4.6, 6.0 > > Attachments: LUCENE-5189-4x.patch, LUCENE-5189-4x.patch, > LUCENE-5189-no-lost-updates.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189_process_events.patch, > LUCENE-5189_process_events.patch, LUCENE-5189-renames.patch, > LUCENE-5189-segdv.patch, LUCENE-5189-updates-order.patch, > LUCENE-5189-updates-order.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15954500#comment-15954500 ] David Smiley commented on LUCENE-5189: -- [~erickerickson] was this a case of accidental keyboard presses causing JIRA to assign it to you accidentally? This happened to me once and it was very embarrassing so I sympathize and doubt you intended to do this! > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Erick Erickson > Fix For: 4.6, 6.0 > > Attachments: LUCENE-5189-4x.patch, LUCENE-5189-4x.patch, > LUCENE-5189-no-lost-updates.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189_process_events.patch, > LUCENE-5189_process_events.patch, LUCENE-5189-renames.patch, > LUCENE-5189-segdv.patch, LUCENE-5189-updates-order.patch, > LUCENE-5189-updates-order.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13958205#comment-13958205 ] Mikhail Khludnev commented on LUCENE-5189: -- One more note for adopters. if you start with experimenting with a test, make sure you suppress older codecs (check how a certain DV update test does it), otherwise you can spend some time to digging in really odd exception. > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Fix For: 4.6, 5.0 > > Attachments: LUCENE-5189-4x.patch, LUCENE-5189-4x.patch, > LUCENE-5189-no-lost-updates.patch, LUCENE-5189-renames.patch, > LUCENE-5189-segdv.patch, LUCENE-5189-updates-order.patch, > LUCENE-5189-updates-order.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189_process_events.patch, > LUCENE-5189_process_events.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13936279#comment-13936279 ] Shai Erera commented on LUCENE-5189: I checked the code and it looks the same with e.g. deleteDocuments(Term) - the Term isn't cloned internally. So your comment pertains to other IW methods. > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Fix For: 4.6, 5.0 > > Attachments: LUCENE-5189-4x.patch, LUCENE-5189-4x.patch, > LUCENE-5189-no-lost-updates.patch, LUCENE-5189-renames.patch, > LUCENE-5189-segdv.patch, LUCENE-5189-updates-order.patch, > LUCENE-5189-updates-order.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189_process_events.patch, > LUCENE-5189_process_events.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13936265#comment-13936265 ] Mikhail Khludnev commented on LUCENE-5189: -- Just want to leave one caveat for memories. When you call {code}IW.updateNumericDocValue(Term, String, Long){code} make sure that the term is deeply cloned before. Otherwise, if you modify term or bytes, then the modified version will be applied. That's might be a problem. > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Fix For: 4.6, 5.0 > > Attachments: LUCENE-5189-4x.patch, LUCENE-5189-4x.patch, > LUCENE-5189-no-lost-updates.patch, LUCENE-5189-renames.patch, > LUCENE-5189-segdv.patch, LUCENE-5189-updates-order.patch, > LUCENE-5189-updates-order.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189_process_events.patch, > LUCENE-5189_process_events.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13828524#comment-13828524 ] Shai Erera commented on LUCENE-5189: You're right Simon. The updates are buffered in their raw form in memory until a flush is needed (e.g. commit(), or NRT-open). At that point they are resolved and written to the Directory. This is where it differs from deletes - while deletes are small enough to keep the resolved form in-memory, updates aren't - a single update can affect millions of documents, each takes a long (updated value) ... perhaps future work could be to distinguish between small and large updates, and keep the small updates still in memory. But I believe that will affect a lot more code, e.g. SegReader will now need to be aware of in-memory NDV and on-disk and do a kind of merge between them when an NDV is requested for such field ... it's not going to be pretty-looking code I imagine. > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Fix For: 4.6, 5.0 > > Attachments: LUCENE-5189-4x.patch, LUCENE-5189-4x.patch, > LUCENE-5189-no-lost-updates.patch, LUCENE-5189-renames.patch, > LUCENE-5189-segdv.patch, LUCENE-5189-updates-order.patch, > LUCENE-5189-updates-order.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189_process_events.patch, > LUCENE-5189_process_events.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13828149#comment-13828149 ] Simon Willnauer commented on LUCENE-5189: - [~mkhludnev] the updates are buffered in memory "just like deletes" and the batched when deletes are flushed to disk / are applied. correct me if I am wrong shai. > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Fix For: 4.6, 5.0 > > Attachments: LUCENE-5189-4x.patch, LUCENE-5189-4x.patch, > LUCENE-5189-no-lost-updates.patch, LUCENE-5189-renames.patch, > LUCENE-5189-segdv.patch, LUCENE-5189-updates-order.patch, > LUCENE-5189-updates-order.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189_process_events.patch, > LUCENE-5189_process_events.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13828140#comment-13828140 ] Mikhail Khludnev commented on LUCENE-5189: -- [~shaie] let me ask a trivial question. # am I right that it writes to file when I call IW.updateNumericDocValue() ? # and obviously, if it does so, I suppose it will be more efficient to somehow bulk series of updates, and flush it afterwards? # or I'm wrong and there is some implicit flush semantics? thanks! > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Fix For: 4.6, 5.0 > > Attachments: LUCENE-5189-4x.patch, LUCENE-5189-4x.patch, > LUCENE-5189-no-lost-updates.patch, LUCENE-5189-renames.patch, > LUCENE-5189-segdv.patch, LUCENE-5189-updates-order.patch, > LUCENE-5189-updates-order.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189_process_events.patch, > LUCENE-5189_process_events.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812124#comment-13812124 ] ASF subversion and git services commented on LUCENE-5189: - Commit 1538258 from [~shaie] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1538258 ] LUCENE-5189: add @Deprecated annotation to SegmentInfo.attributes > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Fix For: 4.6, 5.0 > > Attachments: LUCENE-5189-4x.patch, LUCENE-5189-4x.patch, > LUCENE-5189-no-lost-updates.patch, LUCENE-5189-renames.patch, > LUCENE-5189-segdv.patch, LUCENE-5189-updates-order.patch, > LUCENE-5189-updates-order.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189_process_events.patch, > LUCENE-5189_process_events.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13811890#comment-13811890 ] ASF subversion and git services commented on LUCENE-5189: - Commit 1538146 from [~shaie] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1538146 ] LUCENE-5189: rename internal API following NumericDocValues updates > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189-4x.patch, LUCENE-5189-4x.patch, > LUCENE-5189-no-lost-updates.patch, LUCENE-5189-renames.patch, > LUCENE-5189-segdv.patch, LUCENE-5189-updates-order.patch, > LUCENE-5189-updates-order.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189_process_events.patch, > LUCENE-5189_process_events.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13811872#comment-13811872 ] ASF subversion and git services commented on LUCENE-5189: - Commit 1538143 from [~shaie] in branch 'dev/trunk' [ https://svn.apache.org/r1538143 ] LUCENE-5189: rename internal API following NumericDocValues updates > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189-4x.patch, LUCENE-5189-4x.patch, > LUCENE-5189-no-lost-updates.patch, LUCENE-5189-renames.patch, > LUCENE-5189-segdv.patch, LUCENE-5189-updates-order.patch, > LUCENE-5189-updates-order.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189_process_events.patch, > LUCENE-5189_process_events.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13811070#comment-13811070 ] Shai Erera commented on LUCENE-5189: Backported to 4x. I'll handle the "TODO (DVU_RENAME)" tasks (rote renaming of internal API). > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189-4x.patch, LUCENE-5189-4x.patch, > LUCENE-5189-no-lost-updates.patch, LUCENE-5189-segdv.patch, > LUCENE-5189-updates-order.patch, LUCENE-5189-updates-order.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189_process_events.patch, LUCENE-5189_process_events.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13811069#comment-13811069 ] ASF subversion and git services commented on LUCENE-5189: - Commit 1537832 from [~shaie] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1537832 ] LUCENE-5189: add NumericDocValues field updates > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189-4x.patch, LUCENE-5189-4x.patch, > LUCENE-5189-no-lost-updates.patch, LUCENE-5189-segdv.patch, > LUCENE-5189-updates-order.patch, LUCENE-5189-updates-order.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189_process_events.patch, LUCENE-5189_process_events.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13811057#comment-13811057 ] ASF subversion and git services commented on LUCENE-5189: - Commit 1537829 from [~shaie] in branch 'dev/trunk' [ https://svn.apache.org/r1537829 ] LUCENE-5189: add CHANGES > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189-4x.patch, LUCENE-5189-4x.patch, > LUCENE-5189-no-lost-updates.patch, LUCENE-5189-segdv.patch, > LUCENE-5189-updates-order.patch, LUCENE-5189-updates-order.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189_process_events.patch, LUCENE-5189_process_events.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13805658#comment-13805658 ] Michael McCandless commented on LUCENE-5189: +1 to backport. > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189-4x.patch, LUCENE-5189-4x.patch, > LUCENE-5189-no-lost-updates.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189_process_events.patch, > LUCENE-5189_process_events.patch, LUCENE-5189-segdv.patch, > LUCENE-5189-updates-order.patch, LUCENE-5189-updates-order.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13804640#comment-13804640 ] ASF subversion and git services commented on LUCENE-5189: - Commit 1535526 from [~shaie] in branch 'dev/trunk' [ https://svn.apache.org/r1535526 ] LUCENE-5189: factor out SegmentDocValues from SegmentReader > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189-4x.patch, LUCENE-5189-no-lost-updates.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189_process_events.patch, LUCENE-5189_process_events.patch, > LUCENE-5189-segdv.patch, LUCENE-5189-updates-order.patch, > LUCENE-5189-updates-order.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13804246#comment-13804246 ] Shai Erera commented on LUCENE-5189: I think it's ready to backport to 4x. > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189-4x.patch, LUCENE-5189-no-lost-updates.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189_process_events.patch, LUCENE-5189_process_events.patch, > LUCENE-5189-updates-order.patch, LUCENE-5189-updates-order.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13793580#comment-13793580 ] ASF subversion and git services commented on LUCENE-5189: - Commit 1531620 from [~shaie] in branch 'dev/trunk' [ https://svn.apache.org/r1531620 ] LUCENE-5189: add field updates to TestIndexWriterDelete.testNoLostDeletesOrUpdatesOnIOException > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189-4x.patch, LUCENE-5189-no-lost-updates.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189_process_events.patch, LUCENE-5189_process_events.patch, > LUCENE-5189-updates-order.patch, LUCENE-5189-updates-order.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13793579#comment-13793579 ] Shai Erera commented on LUCENE-5189: Thanks for the comments Mike, I fixed the code. bq. That's a great catch on forceMerge carrying the wrapped IOExc forward. Why is jenkins not hitting this already...? I think because liveDocs aren't written during forceMerge, i.e. they are carried in RAM? > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189-4x.patch, LUCENE-5189-no-lost-updates.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189_process_events.patch, LUCENE-5189_process_events.patch, > LUCENE-5189-updates-order.patch, LUCENE-5189-updates-order.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13793521#comment-13793521 ] Michael McCandless commented on LUCENE-5189: The test case changes look good! This confused me: {code} +} else if (!fieldUpdate || random().nextBoolean()) { // sometimes do both deletes and updates {code} Shouldn't that just be its own if (not else if)? Otherwise I think the comment is wrong ... Also I think this: {code} + if (liveDocs != null && liveDocs.get(i)) { {code} should be: {code} + if (liveDocs == null || livedocs.get(i)) { {code} ? That's a great catch on forceMerge carrying the wrapped IOExc forward. Why is jenkins not hitting this already...? > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189-4x.patch, LUCENE-5189-no-lost-updates.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189_process_events.patch, LUCENE-5189_process_events.patch, > LUCENE-5189-updates-order.patch, LUCENE-5189-updates-order.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13793242#comment-13793242 ] ASF subversion and git services commented on LUCENE-5189: - Commit 1531496 from [~shaie] in branch 'dev/trunk' [ https://svn.apache.org/r1531496 ] LUCENE-5189: test unsetting a document's value while the segment is merging > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189-4x.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189_process_events.patch, > LUCENE-5189_process_events.patch, LUCENE-5189-updates-order.patch, > LUCENE-5189-updates-order.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
yho On Sunday, October 6, 2013, Shai Erera (JIRA) wrote:u > > [ > https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787730#comment-13787730] > > Shai Erera commented on LUCENE-5189: > > > bq. If SR itself doesnt need ref-counting, perhaps we can pull this out of > SR then? (rote-refactor into DV-thingy or something). > > You mean something like SegmentDocValues? It's doable I guess. SR would > need to keep track of the DV.gens it uses though, so that in SR.doClose it > can call segDV.decRef(gens) so that the latter can decRef all the DVPs that > are used for these gens. If it also removes a gen from the map when it's no > longer referenced by any SR, we don't need to take care of clearing the > genDVP map when all SRs were closed (otherwise I think we'll need to > refCount SegDV too, like SCR). > > I'll give it a shot. > > > Numeric DocValues Updates > > - > > > > Key: LUCENE-5189 > > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > > Project: Lucene - Core > > Issue Type: New Feature > > Components: core/index > >Reporter: Shai Erera > >Assignee: Shai Erera > > Attachments: LUCENE-5189-4x.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189_process_events.patch, > LUCENE-5189_process_events.patch, LUCENE-5189-updates-order.patch, > LUCENE-5189-updates-order.patch > > > > > > In LUCENE-4258 we started to work on incremental field updates, however > the amount of changes are immense and hard to follow/consume. The reason is > that we targeted postings, stored fields, DV etc., all from the get go. > > I'd like to start afresh here, with numeric-dv-field updates only. There > are a couple of reasons to that: > > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > > * It's a fairly contained issue, attempting to handle just one data type > to update, yet requires many changes to core code which will also be useful > for updating other data types. > > * It has value in and on itself, and we don't need to allow updating all > the data types in Lucene at once ... we can do that gradually. > > I have some working patch already which I'll upload next, explaining the > changes. > > > > -- > This message was sent by Atlassian JIRA > (v6.1#6144) > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > >
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787730#comment-13787730 ] Shai Erera commented on LUCENE-5189: bq. If SR itself doesnt need ref-counting, perhaps we can pull this out of SR then? (rote-refactor into DV-thingy or something). You mean something like SegmentDocValues? It's doable I guess. SR would need to keep track of the DV.gens it uses though, so that in SR.doClose it can call segDV.decRef(gens) so that the latter can decRef all the DVPs that are used for these gens. If it also removes a gen from the map when it's no longer referenced by any SR, we don't need to take care of clearing the genDVP map when all SRs were closed (otherwise I think we'll need to refCount SegDV too, like SCR). I'll give it a shot. > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189-4x.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189_process_events.patch, > LUCENE-5189_process_events.patch, LUCENE-5189-updates-order.patch, > LUCENE-5189-updates-order.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787691#comment-13787691 ] Robert Muir commented on LUCENE-5189: - Thanks shai... I think for me, this is the main one: {quote} SR itself doesn't need any ref-counting, it's the DVPs that need them. Putting them in SCR I think only complicates things (or at least doesn't simplify). {quote} If SR itself doesnt need ref-counting, perhaps we can pull this out of SR then? (rote-refactor into DV-thingy or something). > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189-4x.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189_process_events.patch, > LUCENE-5189_process_events.patch, LUCENE-5189-updates-order.patch, > LUCENE-5189-updates-order.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787688#comment-13787688 ] Shai Erera commented on LUCENE-5189: bq. because there isn't a single DVP that all SRs share let me clarify that -- if you don't use DV updates, you can say that the DVPs for gen=-1 _are_ shared between all the readers. But with updates, gen=-1 may soon be unused by any reader. I prefer for all DVPs management to be in one place than to split it between SCR and SR. Nevertheless, if you think this can be simplified somehow, I'm open to suggestions. I don't want to move them to SCR just for the sake of saying they are in SCR. The code which decides if to reuse a certain DVP or create a new one will still be in SR, and so it makes sense to me that SR manages them. > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189-4x.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189_process_events.patch, > LUCENE-5189_process_events.patch, LUCENE-5189-updates-order.patch, > LUCENE-5189-updates-order.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787683#comment-13787683 ] Shai Erera commented on LUCENE-5189: Let me explain: SCR is still used for all the shared objects between _all_ SRs, objects that are closed when the last of the SRs using this 'core' is closed. With gen'd DVPs, they no longer belong in SCR, because there isn't a single DVP that _all_ SRs share. For example, suppose that you start with an index without updates, then the DV fields gen is -1. So a DVP for field 'f' will be created. Next you update a field and reopen, the gen for 'f' is incremented to 1 and a new DVP is needed. The DVP for gen=-1 is no longer needed. Rob, if we did what you propose - store the initial DVPs in SCR and the updated ones in SRs (as they are updated and reopened), Soon the DVPs in SCR will not be used by any SR, and just hang there (possibly consuming expensive RAM, e.g. MemoryDVF) until the last SR is closed. Rather, DVPs *are shared* between SRs. RefCount is a simple object which keeps ref-counting for an object. When an SR is opened anew (e.g. initializes a new SCR), the DVPs it initializes all have RefCount(1). When an SR is opened by sharing another SR (e.g. NRT, DirReader.openIfChanged), it lists all the fields with DV. If the other SR has a DVP for their dvGen, it reuses it and inc-ref it, otherwise it opens a new DVP with RC(1). SR itself doesn't need any ref-counting, it's the DVPs that need them. Putting them in SCR I think only complicates things (or at least doesn't simplify). For example, currently when SR.doClose is called, it dec-ref all DVPs that it uses. And DocValuesRefCount.release() closes the DVP when its ref-count reaches 0. If we moved all the DVPs to SCR, then SR.doClose would need to go an dec-ref all DVPs that it uses in SCR .. but how does it know which DVP it uses if all DVPs just sit there in SCR - even ones that it doesn't use? I think that that they are in SR actually simplifies and keeps the code clear. Rob, if you don't use DV updates, all DV fields have dvGen=-1 and they are shared between all SRs. > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189-4x.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189_process_events.patch, > LUCENE-5189_process_events.patch, LUCENE-5189-updates-order.patch, > LUCENE-5189-updates-order.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787650#comment-13787650 ] Uwe Schindler commented on LUCENE-5189: --- Hi, to me the whole code in SegmentReader looks like too complicated and contains things that should *not* be in SegmentReader: SegmenReaders are unmodifiabe and final. Why does SegmentReader need to use refcounting? the Refcounting should be in SegmentCoreReaders, SegmentReaders should only have final fields and unmodifiable data structures, please no refcounts!. If a DirectoryReader is reopened you get a new instance of SegmentReader with same corecache key, why is it then having modifiable stuff? Maybe it is just moving stuff to SegmentCoreReaders, but I did so much hard work to keep this class simple when we removed modiications from IndexReaders in 4.x, thanks to Robert for the help! But now its as complicated or more complicated than before (see number of code lines in Lucene 3.6!). > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189-4x.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189_process_events.patch, > LUCENE-5189_process_events.patch, LUCENE-5189-updates-order.patch, > LUCENE-5189-updates-order.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787648#comment-13787648 ] Robert Muir commented on LUCENE-5189: - Or maybe its a simple oversight that that RefCount thing is in SR versus SCR? I dont understand why a SR would need refcounting, since its unmodifiable? > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189-4x.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189_process_events.patch, > LUCENE-5189_process_events.patch, LUCENE-5189-updates-order.patch, > LUCENE-5189-updates-order.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787642#comment-13787642 ] Robert Muir commented on LUCENE-5189: - To get NRT working for docvalues again, I feel like the logic should be something like this in SegmentReader: fooDV(field X) { final Producer producer; if (gen(x) != core.gen(x)) { producer = this.producer; } else { producer = core.producer; } ... } this way, SR only opens up docvalues for ones that have been updated since the SCR was open, but otherwise uses the shared producer as before, and NRT works with docvalues again. > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189-4x.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189_process_events.patch, > LUCENE-5189_process_events.patch, LUCENE-5189-updates-order.patch, > LUCENE-5189-updates-order.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787630#comment-13787630 ] Robert Muir commented on LUCENE-5189: - If someone isn't doing numeric updates, and just using (e.g. memory docvalues), are we really re-loading docvalues for each segmentreader now versus holding it in the segment core? > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189-4x.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189_process_events.patch, > LUCENE-5189_process_events.patch, LUCENE-5189-updates-order.patch, > LUCENE-5189-updates-order.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787576#comment-13787576 ] Shai Erera commented on LUCENE-5189: Committed a fix to SegmentReader (r1529611) to clean unused memory (un-referenced DVPs) as well as don't use an anonymous RefCount class since it resulted in a memory leak, where those anonymous classes held a reference to their SR, therefore we always referenced unused DVPs. > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189-4x.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189_process_events.patch, > LUCENE-5189_process_events.patch, LUCENE-5189-updates-order.patch, > LUCENE-5189-updates-order.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785059#comment-13785059 ] ASF subversion and git services commented on LUCENE-5189: - Commit 1528837 from [~shaie] in branch 'dev/trunk' [ https://svn.apache.org/r1528837 ] LUCENE-5189: fix updates-order and docsWithField bugs > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189-4x.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189_process_events.patch, > LUCENE-5189_process_events.patch, LUCENE-5189-updates-order.patch, > LUCENE-5189-updates-order.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13782923#comment-13782923 ] ASF subversion and git services commented on LUCENE-5189: - Commit 1528081 from [~simonw] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1528081 ] LUCENE-5189: s/doApplyAllDeletes/getAndResetApplyAllDeletes/ > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189-4x.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189_process_events.patch, > LUCENE-5189_process_events.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13782924#comment-13782924 ] Simon Willnauer commented on LUCENE-5189: - backported the renaming to 4x as well. > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189-4x.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189_process_events.patch, > LUCENE-5189_process_events.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13782906#comment-13782906 ] ASF subversion and git services commented on LUCENE-5189: - Commit 1528077 from [~simonw] in branch 'dev/trunk' [ https://svn.apache.org/r1528077 ] LUCENE-5189: s/doApplyAllDeletes/getAndResetApplyAllDeletes/ > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189-4x.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189_process_events.patch, > LUCENE-5189_process_events.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13782905#comment-13782905 ] ASF subversion and git services commented on LUCENE-5189: - Commit 1528076 from [~simonw] in branch 'dev/trunk' [ https://svn.apache.org/r1528076 ] LUCENE-5189: Check if deletes need to be applied when docvalues are updated and process IW event if needed > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189-4x.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189_process_events.patch, > LUCENE-5189_process_events.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13782882#comment-13782882 ] Michael McCandless commented on LUCENE-5189: +1 for getAndResetApplyDeletes ... I don't think it needs a separate issue ... thanks! > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189-4x.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189_process_events.patch, > LUCENE-5189_process_events.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13782875#comment-13782875 ] Simon Willnauer commented on LUCENE-5189: - bq. So, basically, if you only call IW.updateNumericDocValue, then we were failing to flush once RAM usage was over the limit? yes bq. Maybe it should be named "getAndClearDoApplyAllDeletes" to make this behavior clear? I think we should change it to "getAndResetApplyDeletes()" I agree. Will do that in a separate commit and open a dedicated issue for it. > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189-4x.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189_process_events.patch, > LUCENE-5189_process_events.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13782863#comment-13782863 ] Michael McCandless commented on LUCENE-5189: +1 for Simon's patch; good catch. Sort of sneaky that "doApplyAllDeletes" as a side effect clears that bit. Maybe it should be named "getAndClearDoApplyAllDeletes" to make this behavior clear? So, basically, if you only call IW.updateNumericDocValue, then we were failing to flush once RAM usage was over the limit? > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189-4x.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189_process_events.patch, > LUCENE-5189_process_events.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13782802#comment-13782802 ] Shai Erera commented on LUCENE-5189: Looks good. Thanks Simon. > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189-4x.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189_process_events.patch, > LUCENE-5189_process_events.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13782782#comment-13782782 ] Shai Erera commented on LUCENE-5189: Thanks Simon, good catch. I think there's no test for it because updates are not lost, they are just not applied based on RAM buffer settings. So e.g. if we reopened a reader, committed or kicked off a merge, they will be applied, right? Nevertheless, we should fix it as I'm seeing this affects my attempts to work on LUCENE-5248, where the updates are only applied when IW.close is called (since I don't reopen, merge or commit). > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189-4x.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189_process_events.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13781991#comment-13781991 ] Michael McCandless commented on LUCENE-5189: bq. I'd not be happy with this being released tomorrow. Can you give more details here? I.e., do you consider the optimizations necessary for release? Or are there other things? > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189-4x.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13781914#comment-13781914 ] Shai Erera commented on LUCENE-5189: I opened LUCENE-5248 to handle the data structure (irrespective of whether we choose to backport first or not). > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189-4x.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13781868#comment-13781868 ] Shai Erera commented on LUCENE-5189: I don't mind if we backport the whole thing in one commit. Just thought it will be cleaner to backport each issue's commits. I doubt anyone would "hit" an issue within the couple of hours it will take. But I'll do this in one backport. bq. Only port stuff to the stable branch unless you'd be happy to release it tomorrow I agree, though what if we decide to release 5.0 in one month? Do we revert the whole feature? I just think that it's software, and software always improves. Even if we optimize the way updates are kept (the problem is in ReaderAndLiveDocs), it can always be improved tomorrow even more. That's why the feature is marked @lucene.experimental -- it may not be the most optimized thing, but it works and more importantly - it doesn't affect users that don't use it ("do no harm"). I will look into improving the way updates are kept in RALD (Map>), though honestly, we have no data points as to whether it's efficient or not, or whether the new structure is more efficient. What I think we can do is keep the updates in conceptually an int[] and long[] pair arrays (maybe one of those **Buffer we have for better compression). I'll start w/ that. > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189-4x.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13781810#comment-13781810 ] Simon Willnauer commented on LUCENE-5189: - bq. I suppose we could also wait some more (the test did uncover a big issue recently), but I don't think we need to... I think we should apply a general rule here that we (at least I believe so) in Lucene land had for a long time. ->> "Only port stuff to the stable branch unless you'd be happy to release it tomorrow." I looked at the change and honestly this looks pretty scary to me. I'd not be happy with this being released tomorrow. We can happily aim for this being in 4.6 but baking in should happen in trunk. This is a huge change and I am thankful for the work that has been done on it but I think we should not rush this into the stable branch. This should IMO bake in more in trunk and have several rounds of optimization to it in terms of datastructures before we go and release it. > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189-4x.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13781799#comment-13781799 ] Robert Muir commented on LUCENE-5189: - I don't think we should do that: then tests maybe fail, users hit too many open files, hit exceptions when segments dont contain their field, etc, etc in 4.x and we say "sorry, 4.x is only temporarily broken: maybe you should try trunk for a more stable user experience?" > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189-4x.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13781797#comment-13781797 ] Shai Erera commented on LUCENE-5189: I wrote (somewhere, don't remember where) that I'd like to backport issue-by-issue. So here I backport the changes done in this issue, after that I'll backport 5215, 5216 and 5246. That way the commits to 4x will be associated with the proper issues. Do you see a problem w/ that strategy? I definitely intend to backport all the changes in one go, only do that in multiple commits, one per issue. > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189-4x.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13781788#comment-13781788 ] Robert Muir commented on LUCENE-5189: - I'm only confused by the strategy of the backport: I see things like SegmentWriteState.isFieldUpdate here that were fixed in other issues. Is this intentional? If the argument is to get user testing, I think to the user this would actually be "unbaking" if such a term exists...? Can you svn merge the revs to LUCENE-5215 and any other issues (e.g. bugfixes) so we have one coherent backport? > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189-4x.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13781746#comment-13781746 ] Michael McCandless commented on LUCENE-5189: I think backporting now is OK: we are at the start of the release cycle for 4.6 (so Jenkins will have lots of time to bake the changes for 4.6), the change has baked for ~ 2 weeks in trunk now, perf graphs look fine ( http://people.apache.org/~mikemccand/lucenebench/ ). I agree we can optimize in-RAM data structures, but I don't think that needs to hold up backporting. I suppose we could also wait some more (the test did uncover a big issue recently), but I don't think we need to... > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189-4x.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13781731#comment-13781731 ] Shai Erera commented on LUCENE-5189: I don't understand - why can't it live in 4x and be tested by Jenkins as well as users? DocValues went through two releases before they were overhauled, I don't see how this issue is different. It baked in trunk for 2+ weeks, I think it can bake in the 4x branch as well (I believe we have enough time until 4.6 will be released?). Optimizations, which are important, can come next as improvements. For instance, we don't know yet if the data structures used (Maps!) are inefficient, because nobody uses this feature yet - maybe we'll spend a lot of time optimizing while in practice it won't make a big difference? I think that as long as this feature does not affect anyone who doesn't use numeric DV updates, it's safe to backport. > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189-4x.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13781720#comment-13781720 ] Simon Willnauer commented on LUCENE-5189: - I don't think we should backport this to 4.x yet. I'd want this to bake in further and maybe optimize the memory useage of the internal datastructures as well before this goes into the stable branch. This is a significant change and we should be careful with backporting things like this. can we wait with this a bit more? > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189-4x.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13780670#comment-13780670 ] ASF subversion and git services commented on LUCENE-5189: - Commit 1527147 from [~shaie] in branch 'dev/trunk' [ https://svn.apache.org/r1527147 ] LUCENE-5189: disable merges in testChangeCodec > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772628#comment-13772628 ] ASF subversion and git services commented on LUCENE-5189: - Commit 1524901 from [~shaie] in branch 'dev/trunk' [ https://svn.apache.org/r1524901 ] LUCENE-5189: remove leftover > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772623#comment-13772623 ] ASF subversion and git services commented on LUCENE-5189: - Commit 1524900 from [~shaie] in branch 'dev/trunk' [ https://svn.apache.org/r1524900 ] LUCENE-5189: fixed concurrency bug > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13768274#comment-13768274 ] Shai Erera commented on LUCENE-5189: Jenkins reported this failure, which I'm unable to reproduce with and without the seed (master and child), with iters. {noformat} 1 tests failed. REGRESSION: org.apache.lucene.index.TestNumericDocValuesUpdates.testManyReopensAndFields Error Message: invalid value for doc=351, field=f1 expected:<15> but was:<14> Stack Trace: java.lang.AssertionError: invalid value for doc=351, field=f1 expected:<15> but was:<14> at __randomizedtesting.SeedInfo.seed([5E1E0079E35D52E:331D82281FC0B632]:0) at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.apache.lucene.index.TestNumericDocValuesUpdates.testManyReopensAndFields(TestNumericDocValuesUpdates.java:757) ... Build Log: [...truncated 776 lines...] [junit4] Suite: org.apache.lucene.index.TestNumericDocValuesUpdates [junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestNumericDocValuesUpdates -Dtests.method=testManyReopensAndFields -Dtests.seed=5E1E0079E35D52E -Dtests.multiplier=3 -Dtests.slow=true -Dtests.locale=tr -Dtests.timezone=Etc/GMT-6 -Dtests.file.encoding=US-ASCII [junit4] FAILURE 1.40s J0 | TestNumericDocValuesUpdates.testManyReopensAndFields <<< [junit4]> Throwable #1: java.lang.AssertionError: invalid value for doc=351, field=f1 expected:<15> but was:<14> [junit4]>at __randomizedtesting.SeedInfo.seed([5E1E0079E35D52E:331D82281FC0B632]:0) [junit4]>at org.apache.lucene.index.TestNumericDocValuesUpdates.testManyReopensAndFields(TestNumericDocValuesUpdates.java:757) [junit4]>at java.lang.Thread.run(Thread.java:724) [junit4] 2> NOTE: test params are: codec=Asserting, sim=RandomSimilarityProvider(queryNorm=false,coord=no): {}, locale=tr, timezone=Etc/GMT-6 [junit4] 2> NOTE: Linux 3.2.0-53-generic amd64/Oracle Corporation 1.8.0-ea (64-bit)/cpus=8,threads=1,free=66621176,total=210272256 [junit4] 2> NOTE: All tests run in this JVM: [TestSegmentReader, TestStressNRT, TestSort, TestShardSearching, TestEliasFanoSequence, TestBytesRefHash, TestPhrasePrefixQuery, TestLucene45DocValuesFormat, TestFastCompressionMode, TestEliasFanoDocIdSet, TestSearchForDuplicates, TestFixedBitSet, TestIsCurrent, TestFilteredSearch, TestFieldCacheSanityChecker, TestSegmentTermEnum, TestDeletionPolicy, TestSimpleExplanations, TestRegexpRandom, TestIndexCommit, TestCloseableThreadLocal, TestNumericRangeQuery32, TestTwoPhaseCommitTool, TestIndexWriterOnDiskFull, TestPhraseQuery, TestSearchAfter, TestParallelReaderEmptyIndex, TestMaxTermFrequency, TestFlushByRamOrCountsPolicy, TestSimilarity, TestNumericRangeQuery64, TestByteSlices, TestSameScoresWithThreads, TestDocValuesWithThreads, TestMockAnalyzer, TestArrayUtil, TestPostingsOffsets, TestCompressingTermVectorsFormat, TestSentinelIntSet, TestCustomNorms, TestExternalCodecs, TestNumericDocValuesUpdates] [junit4] Completed on J0 in 83.46s, 24 tests, 1 failure <<< FAILURES! {noformat} > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message is automatically generated by JIRA. If you think it was sent
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13768271#comment-13768271 ] Shai Erera commented on LUCENE-5189: bq. SegmentWriteState flushState; in DWPT is unused +1 to remove it. Indeed it's unused, but because it's package-private, eclipse doesn't complain about it. bq. In DW the `updateNumericDocValue` method is synchronized I followed the other two delete methods. I'm fine with opening a separate issue to remove the synchronization, especially if it's not trivial. bq. I really like the way how this is implemented piggybacking on the delete queue to get a total ordering Thanks, it was very helpful to have deletes already covered like that. I only had to follow their breadcrumbs :). > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13768204#comment-13768204 ] Simon Willnauer commented on LUCENE-5189: - I only briefly looked at the changed in DW, DWPT, IW & BDS and I have 2 questions: - SegmentWriteState flushState; in DWPT is unused - can we remove it? (I generally want this class to have only final members as well if possible) - In DW the `updateNumericDocValue` method is synchronized - I don't think it needs to. The other two deletes methods don't need to be synced either - maybe we can open another issue to remove the synchronization? It won't be possible to just drop it but it won't be much work. I really like the way how this is implemented piggybacking on the delete queue to get a total ordering :) nice one! > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767999#comment-13767999 ] ASF subversion and git services commented on LUCENE-5189: - Commit 1523525 from [~rcmuir] in branch 'dev/trunk' [ https://svn.apache.org/r1523525 ] LUCENE-5189: add testcase > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767853#comment-13767853 ] Shai Erera commented on LUCENE-5189: Thanks, committed to trunk, revision 1523461. After we resolve all corner issues, and let Jenkins sleep on it for a while, I'll port to 4x. > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767852#comment-13767852 ] ASF subversion and git services commented on LUCENE-5189: - Commit 1523461 from [~shaie] in branch 'dev/trunk' [ https://svn.apache.org/r1523461 ] LUCENE-5189: add NumericDocValues updates > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767799#comment-13767799 ] Robert Muir commented on LUCENE-5189: - +1 to go to trunk. thanks Shai. > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767578#comment-13767578 ] Michael McCandless commented on LUCENE-5189: New patch looks great! The ref counting is very clean. Maybe add a comment that gen is allowed to be -1, just means the field has no DV updates yet, in SegmentReader when building up the map? And then call doClose() in a finally clause if !success so on exception we don't leak open files. > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13758328#comment-13758328 ] Shai Erera commented on LUCENE-5189: Correct, that's a problem that Rob identified few days ago and it can be solved if we gen FieldInfos, because ReaderAndLiveDocs will detect that case and add a new FieldInfo, as well as create a new gen for this segment's FIS.I have two tests in TestNumericDVUpdates which currently test that this is not supported -- once we gen FIS, we'll change them to assert it is supported. > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13758311#comment-13758311 ] Shai Erera commented on LUCENE-5189: I think global FIS is an interesting idea, but per-segment FIS.gen is a lower hanging fruit. I did it once and it was quite straightforward (maybe someone will have reservations on how I did it though): * SIS tracks fieldInfosGen (in this patch, rename all dvGen in SIS to fisGen) * FI tracks dvGen * A new FIS45Format reads/writes each FI's dvGen * ReaderAndLiveDocs writes a new FIS gen, containing the entire FIS, so SR only reads the latest gen to load FIS I think we should explore global FIS separately, because it brings its own issues, e.g. do we keep FISFormat or nuke it? Who invokes it (probably SIS)? It's also quite orthogonal to that issue, or at least, we can proceed with it and improve FIS gen'ing later with global FIS. As for SI.attributes(), I think we can move them under SIS. We should open an issue to do that. > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13758325#comment-13758325 ] Yonik Seeley commented on LUCENE-5189: -- The problem that Mike highlights "some segments might be missing the field entirely so you cannot update those", is pretty bad though. Things work differently (i.e. your update may fail) depending on exactly how segment flushes and merges are done. > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13758257#comment-13758257 ] Michael McCandless commented on LUCENE-5189: Actually, that would also solve the other problems as well? Ie, the global FieldInfos would be gen'd: on commit we'd write a new FIS file, which all segments in that commit point would use. Any attribute changes to a FieldInfo would be saved, even on update; new fields could be created via update; any segments that have no documents with the field won't be an issue. > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13758251#comment-13758251 ] Michael McCandless commented on LUCENE-5189: One option, to solve the "some segments might be missing the field entirely so you cannot update those" would be to have the FieldInfos accumulate across segments, i.e. a more global FieldInfos, maybe written to a separate global file (not per segment). This way, if any doc in any segment has added the field, then the global FieldInfos would contain it. Not saying this is an appealing option (there are tons of tradeoffs), but I think it would address that limitation. > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13758233#comment-13758233 ] Michael McCandless commented on LUCENE-5189: bq. Frankly I am tired of hearing this phrase being used in this way Actually, I think this is a fair use of "progress not perfection". Either that or I don't understand what you're calling "broken APIs" in the current patch. As I understand it, what's "broken" here is that you cannot set the attributes in SegmentInfo nor FieldInfo from your DocValuesFormat writer when it's an update being written: the changes won't be saved. So, I proposed that we document this as a limitation of the SI/FI attributes API: when writing updates, any changes will be lost. For "normal" segment flushes, they work correctly. It'd be a documented limitation, and we can later fix it. I think this situation is very similar to LUCENE-5197, which I would also call "progress not perfection": we are adding a new API (SegmentReader.ramBytesUsed), with an initial implementation that we think might be improved by later cutting over to RamUsageEstimator. But I think we should commit the initial approach (it's useful, it should work well) and later improve the implementation. > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757921#comment-13757921 ] Shai Erera commented on LUCENE-5189: bq. It does not solve problem #2 (SegmentInfos.attributes) Correct. So this API is broken today for LiveDocsFormat (since it's the only updateable thing), but field updates only broaden the broken-ness into other formats (now only DVF, but in the future others too). Correct? I think that moving this API into the commit is not an overkill. I remember Mike and I once discussed if we can use that API to save per-segment facets "schema details". I don't remember how this ended, but maybe we shouldn't remove it? Alternatively, we could gen SIFormat too ... that may be an overkill though. Recording per-segment StringStringMap in SIS seems simple enough. Regarding FIS.gen, I honestly thought to keep it simple by writing all FIS entirely in each gen and not complicate the code by writing parts of an FI in different gens and merging them by SR. This is what I plan to do in this issue. > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757885#comment-13757885 ] Robert Muir commented on LUCENE-5189: - {quote} Just so I understand, if we gen FieldInfos, does that solve the brokenness of the Codec APIs (in addition to the other things that it solves)? If not, in what way are they broken, and is this break a new thing that NDV updates cause/expose, or it's a break that exists in general? Can you list the breaks here (because I think that FIS.gen solves all the points you raised above). {quote} It does not solve problem #2 (SegmentInfos.attributes). This API should removed, deprecated, made internal-only, or something like that. Another option is to move this stuff into the commit, but that might be overkill: today this stuff is only used as a backwards-compatibility crutch (i think) to read 3.x indexes: so it can possibly be just removed in trunk right now. Gen'ing FieldInfos brings about its own set of questions as far as when/how/if any new fieldinfo information is merged and when/how its visible to the codec API. its very scary but I don't see any alternative at the moment. > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757875#comment-13757875 ] Robert Muir commented on LUCENE-5189: - {quote} This would let us proceed (progress not perfection) and then later, we address it. Ie, I think the added boolean is a fair compromise. {quote} Its not a fair compromise at all. To me, as a search engine library, this is not progress. its going backwards. Yes: I'm looking at it solely from an API perspective. Yes: others look at things from only features/performance perspective and do not seem to care about APIs. But as a library, the API is all that matters. So I just want to make it clear: saying "progress not perfection" is not a good excuse for leaving broken APIs about the codebase and shoving in features as fast as possible: its not progress to me so I simply do not see it that way. Frankly I am tired of hearing this phrase being used in this way, and when I see it in the future, it will encourage me to take a closer inspection of APIs and do pickier reviews. > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757864#comment-13757864 ] Shai Erera commented on LUCENE-5189: Just so I understand, if we gen FieldInfos, does that solve the brokenness of the Codec APIs (in addition to the other things that it solves)? If not, in what way are they broken, and is this break a new thing that NDV updates cause/expose, or it's a break that exists in general? Can you list the breaks here (because I think that FIS.gen solves all the points you raised above). > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757802#comment-13757802 ] Robert Muir commented on LUCENE-5189: - {quote} But I also don't mind moving forward with SWS.isFieldUpdate and remove it in a follow on issue ... as long as it's done before 4.5. {quote} I don't think that will be an issue at all. if we want to iterate and leave the codec APIs broken, I won't object: but simple rule. Trunk only. We can't do this kind of stuff on the stable branch at all: Things that get backported there need to be "ready to ship". > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757688#comment-13757688 ] Shai Erera commented on LUCENE-5189: I think it's important to solve FIS.gen, either on this issue or a separate one, but before 4.5 is out. Because now SegmentInfos records per-field dvGen and if we gen FIS, this will be recorded by a new Lucene45FieldInfosFormat, and SIS will need to record fieldInfosGen. I actually don't mind to do it in this issue. It's work that's needed and affects NDV-updates (e.g. sparse fields which now hit a too late cryptic exception). But I also don't mind moving forward with SWS.isFieldUpdate and remove it in a follow on issue ... as long as it's done before 4.5. > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757674#comment-13757674 ] Michael McCandless commented on LUCENE-5189: We could simply document this as a limitation, today? Ie, that if it's an update, the DVFormat cannot use the attributes APIs. This would let us proceed (progress not perfection) and then later, we address it. Ie, I think the added boolean is a fair compromise. Or, we can pursue gen'ing FIS on this patch, but this is going to add a lot of trickiness/complexity; I think it'd be better to explore it separately. > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756884#comment-13756884 ] Robert Muir commented on LUCENE-5189: - It really doesn't work: its definitely a blocker for me! This leaves the general api (FieldInfo.attributes and SegmentInfo.attributes) broken for codecs, and only hacks a specific implementation that uses them. With or without the current boolean, if a numeric docvalues impl puts something in FieldInfo.attributes during an update, it will go into a black hole, because FieldInfos is write-once per-segment (and not per-commit). Same goes with SegmentInfo.attributes. > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756896#comment-13756896 ] Robert Muir commented on LUCENE-5189: - By the way: the "general" issue is that for updates, its unfortunately not enough to concern ourselves with data, we have to worry about metadata too: I see at least 4 problems (and i have not thought about it completely): # FieldInfo.attributes: these "writes" by the NumericDocValues impl will be completely discarded during update, because its per-segment, not per-commit. # SegmentInfo.attributes: same as the above # Field doesnt exist in FieldInfo at all: (because the segment the update applies to happens to have no values for the field) # Field exists in FieldInfo, but is incomplete: (because the segment the update applies to, had say a stored-only or stored+indexed value for the field, but no dv one). PerFieldDVF is just one implementation that happens to use #1. Fixing it is fixing the symptom, thats why I say we really need to instead fix the disease, or things will get very ugly. The only reasons you dont see more problems with #1 and #2, is that currently its not used very much (only by PerField and back-compat). If we had more codecs exercising the APIs, you would be seeing these problems already. A perfectly good solution would be to remove these APIs completely for public use (which would solve #1 and #2). PerField(PF/DVF) could write its own .per file instead. Back compat cruft could then use these now-internal-only-APIs (and it wont matter since they dont support updates), or we could implement their hacks in another way. But this still leaves issues like #3 and #4. Adding a boolean 'isFieldUpdate' doesn't really solve anything, and it totally breaks the whole concept of the codec being unaware of updates. It is the wrong direction. > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756931#comment-13756931 ] Shai Erera commented on LUCENE-5189: OK, so now I get your point. The problem is that we pass to Codec FI.attributes with say an attribute 'foo=bar'. The Codec, unaware that this is an update, looks at the given numericFields and decides to encode them using method "bar2", so it encodes into the attributes 'foo=bar2', but those attributes get lost because they're not rewritten to FIS. Do I understand correctly? Of course, we could say that since the Codec has to peek into SWS.isFieldUpdate, thereby making it updates-aware, it should not encode stuff in a different format, but SWS.isFieldUpdate is not enough to enforce that. I don't think that gen'ing FIS solves the problem of obtaining the right DVF in the first place. Sure, after we do that, the Codec can put whatever attributes that it wants, they will be recorded in the new FIS.gen. But maybe we can solve these two problems by gen'ing FIS: * Add FieldInfo.dvGen. The Codec will receive the FieldInfos with their dvGen bumped up. * Codec can choose to look at FI.dvGen and pull the right DVF e.g. like PerField does. ** Or it can choose to completely ignore it, and always write udpates using the new format. * Codec is free to record whatever attributes it wants on this FI. Since we gen FIS, they will be recorded and used by the reader. What do you think? > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756863#comment-13756863 ] Shai Erera commented on LUCENE-5189: I don't understand the problem that you raise. Until then, I think that SWS.isFieldUpdate is fine. It works, it's simple, and most importantly, it allows me to move forward. Let's discuss how to improve it even further, but I don't think this is a blocker. We can always improve that later on. > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756678#comment-13756678 ] Robert Muir commented on LUCENE-5189: - {quote} Patch adds per-field support. I currently do that by adding a boolean 'isFieldUpdate' to SegWriteState which is set to true only by ReaderAndLiveDocs. PerFieldDVF then peeks into that boolean and if it's true, it reads the format name from FieldInfo.attributes() instead of relying on Codec.getPerFieldDVF(). If we'll eventually gen FieldInfos, there won't be a need for this boolean as PerFieldDVF will get that from FI.dvGen. {quote} We can't move forward really with this boolean: it only attacks the symptom (puts a HACK in per-field) without fixing the disease (the codec API). In general if a codec needs to write to and read from FieldInfos/SegmentInfos.attributes, it doesnt work here: this api needs to be fixed. > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756650#comment-13756650 ] Shai Erera commented on LUCENE-5189: Thanks for the review Mike. I nuked the two unused methodsand I like SegmentCommitInfo, so changed the nocommit text. I changed the nocomit in SIPC to a TODO. Don't think we need to tackle it in this issue. I'm working on improving the RAM accounting. I want to add to NumericUpdate.sizeInBytes() and count it per-update that is actually added. Then, because it's a Map> and the Term and String are both in NumericUdpate already (and will be accounted for in its calculation, only their PTR needs to be taken into account. Also, the sizeInBytes should grow by new entry to outer map only when one is actually added, same for inner map. Therefore I don't think we can have a single constant here, but instead maybe two: for every Entry added to the outer map and every Entry added to the inner map. I think, because I need to compute the shallow sizes only (since Term and String are accounted for in NumericUpdate), it's a single constant for Entry? > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756494#comment-13756494 ] Michael McCandless commented on LUCENE-5189: Patch looks great! I review some of the remaining nocommits: {quote} // nocommit no one calls this method, why do we have it? and if we need it, do we need one for docValuesGen too? public void setDelGen(long delGen) { {quote} Nuke it! We only use .advanceNextWriteDelGen (and the patch adds this for DVs too). {quote} // nocommit no one calls this, remove? void clearDelGen() { {quote} Nuke it! bq. class ReadersAndLiveDocs { // nocommit (RENAME) to ReaderAndUpdates? +1 for ReaderAndUpdates {quote} // nocommit why do we do that, vs relying on TrackingDir.getCreatedFiles(), // like we do for updates? {quote} That's a good question ... I'm not sure. We in fact already use TrackingDirWrapper (in ReadersAndLiveDocs.writeLiveDocs)... so we could in theory record those files in SIPC and remove LiveDocsFormat.files(). Maybe make this a TODO though? {quote} // nocommit: review! final static int BYTES_PER_NUMERIC_UPDATE = BYTES_PER_DEL_TERM + 2*RamUsageEstimator.NUM_BYTES_OBJECT_REF + RamUsageEstimator.NUM_BYTES_INT + RamUsageEstimator.NUM_BYTES_LONG; {quote} I think it makes sense to start from BYTES_PER_DEL_TERM, but then instead of mapping to value Integer we map to value Map whose per-Term RAM usage is something like: {noformat} PTR (for LinkedHashMap, since it must link each entry to the next?) Map HEADER PTR (to array) 3 INT 1 FLOAT for each occupied Entry PTR (from Map's entries array) * 2 (overhead for load factor) HEADER PTR * 2 (key, value) String key HEADER INT PTR (to char[]) ARRAY_HEADER + 2 * length-of-string (field) NumericUpdate value HEADER PTR (to Term; ram already accounted for) PTR (to String; ram already accounted for) PTR (to Long value) + HEADER + 8 (long) INT {noformat} The thing is, this is so hairy ... that I think maybe we should instead use RamUsageEstimator to "calibrate" this? Ie, make a standalone test that keeps adding Term + fields into this structure and measures the RAM with RUE? Do this on 32 bit and on 64 bit JVM, and then conditionalize the constants. You'll still need to add in bytes according to field/term lengths... bq. +public class SegmentInfoPerCommit { // nocommit (RENAME) to SegmentCommit? Not sure about that rename ... since this class is just the "metadata" about a commit, not an "actual" commit. Maybe SegmentCommitInfo? > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753622#comment-13753622 ] Uwe Schindler commented on LUCENE-5189: --- bq. This callback would need to also get the docid I guess (it's missing in your API example)? Of course we could add this. Java 8 would also support this cool syntax, something like: {{writer.updateDocValues(term, (docid, value) -> value+1);}} The Java 8 example here was just syntactic sugar: For all this its only important that it is an {{interface}} with only one method that gets as many parameters as needed and returns one value. We automatically get the cool java 8 syntax for users, if we design the callback interface to these guidelines. One common example is the "Comparator" interface in Java. Every Comparator can be written in this cool syntax :-) > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753600#comment-13753600 ] Shai Erera commented on LUCENE-5189: I definitely want to add update by query, but in a separate issue. And the callback idea is interesting. This callback would need to also get the docid I guess (it's missing in your API example)? > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753586#comment-13753586 ] Uwe Schindler commented on LUCENE-5189: --- Hi, I had an idea yesterday when thinking about this. Currently (like for deletes) we can update DocValues based on an ID term (by docid is not easily possible with IndexWriter). As the ID term can be anything, you could also use some (group) key that updates lots of documents (like you can delete all documents with a specific term). The current code updates the given field for all those documents to a fixed value. My two ideas are: - we could also support update by query (means like for deletes you provide a query that selects the documents to update) - we could make "modifications" possible: Instead of giving a value that is set for all selected documents, we could provide a "callback" interface that is used to modify the current docvalue (numeric or String) of the document to update and returns a changed value. This would be a one-method interface, so it could be used as closure in Java 8, like {{writer.updateDocValues(term, value -> value+1);}} (in Java 6/7 this would be {{writer.updateDocValues(term, new NumericDocValuesUpdater() \{ public long update(long value) \{ return value+1; \}\});}}). Servers like Solr or ElasticSearch could implement this interface/closure using e.g. javascript, so one could execute a docvalues update and pass a javascript function applied to every value. We just have to think about concurency: What happens if 2 threads are updating the same value at the same time - maybe this is already handled by the BufferedDeletesQueue!? I just wanted to write this down in this issue, so we could think about allowing to implement this. Of course the current patch is more important to get the whole game running! The updateable by term/query is just one thing which is often requested by users. The typical example is a webapp where you can vote for a document. In that case one would execute the closure {{value -> value+1}}. If we implement this so low level, the whole concurreny should be easier than how it is currently impelemented e.g. in Solr or ES. > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753557#comment-13753557 ] Shai Erera commented on LUCENE-5189: Regarding problem 1, I hardwired the test to use Lucene45Codec for now so that I'm not blocked. I thought about Codec.serlize/attributes and now I realize it's not a good idea since those attributes must be recorded per-segment, yet the Codec is single-instance for all segments. We can however record these in SegmentInfo.attributes(). The documentation suggests this is where the Codec should record stuff per-segment. Would it work if PerFieldDVF recorded the per-field-format in SegWriteStage.si.attributes() and read them later, instead of FieldInfo.attributes? > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753500#comment-13753500 ] Shai Erera commented on LUCENE-5189: Regarding problem 3, Mike helped me construct a simple test which reproduces the bug - I opened LUCENE-5192 to fix. > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753482#comment-13753482 ] Shai Erera commented on LUCENE-5189: BTW, this may generally not be a bad idea, to let the Codec serialize some stuff which is later given to it in Codec.init(BytesRef). E.g. if a Codec is initialized with some parameters that are also important during search (e.g FacetsCodec can be initialized with FacetIndexingParams, which get lost during search because the Codec is initialized with default ctor), this could be a way for it to serialize/deserialize itself. The name will be used for the newInstance(), the rest to initialize the Codec. > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753476#comment-13753476 ] Shai Erera commented on LUCENE-5189: Regarding problem 1, I don't know if it's a valid solution, but maybe if we recorded a per-field-format map for each SegInfo, Lucene45Codec could initialize its dvFormat accordingly? This is not generic though .. it's like we need to have a Codec.serialize() method which dumps stuff to SegInfo (or returns a BytesRef/String from which it can later initialize itself). We'd then not need the attributes on FieldInfo. We have to somehow employ the same logic as we do in PerFieldDVF.FieldsReader, in PerFieldDVF.FieldsWriter for updating existing segments. Whatever solution we'll do here, will help us when we come to implement field updates for postings. > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753391#comment-13753391 ] Robert Muir commented on LUCENE-5189: - In the case of old codecs: what we do is pretty tricky for testing: * we make them read-only officially for the user (so that new segments are written in the latest format, but old segments can still be read). * this has the additional caveat they are not "purely" read-only, because actually we allow liveDocs updates (deletes) against the old formats. so they are "mostly read-only". * tests have read-write versions (like in branch4x: PreFlexRWCodec). These allow in tests for us to override the read-only-ness, and write like the old formats did and read them in transparently in tests. * Of course they cannot support the newest features with this "impersonator" testing we do, but in general we get a lot more test coverage than if we relied solely upon TestBackwardsCompatibility. > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753312#comment-13753312 ] Shai Erera commented on LUCENE-5189: Thanks Rob. I forgot about SuppressCodecs :). I guess I was confused by why Lucene40 was picked in the first place, as I thought we don't test writing indexes with old Codecs? > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752977#comment-13752977 ] Robert Muir commented on LUCENE-5189: - You can add \@SuppressCodecs(\{"Lucene40", "SomethingElse", ...\}) annotation to the top of your test for this. > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752462#comment-13752462 ] Robert Muir commented on LUCENE-5189: - I think its true we can tackle this in a separate issue: but I think i'd rather have SR/SCR just pass the correct directory always in the segmentreadstate/segmentwritestate to the different codec components (e.g. segmentreadstate.dir is always the 'correct' directory the codec component should use, and even when CFS is enabled, livedocsformat always gets the inner one and so on). Its ok if we want to have the 'inner dir' accessible in SegmentInfo for SR/SCR to do this: like we could make it package private and everything would then just work? This would greatly reduce the possibility of mistakes. I think having CFS fall back on its inner directory on FileNotFoundException is less desirable. > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189.patch, LUCENE-5189.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752456#comment-13752456 ] Shai Erera commented on LUCENE-5189: The problem is that there are two Directories and the logic of where the file is read from depends if it's gen'd or not (so far it has been only livedocs). Maybe what we can do is have CFS revert to directory.openInput if file does not exist? We can do that in a separate issue? If we fix that, then I think we might really be able to "hide" the gen from the Codec cleanly. Actually, if the fix is that simple (CFD.openInput reverting to dir.openInput), I can do it as part of this issue, it's small enough? > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189.patch, LUCENE-5189.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752438#comment-13752438 ] Robert Muir commented on LUCENE-5189: - by the way if we want to fix the state.directory vs state.segmentInfo.dir i would love to help with this, its bothered me forever. > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752436#comment-13752436 ] Robert Muir commented on LUCENE-5189: - {quote} Maybe for writing it can work, but the producer needs to know from which Directory to read the file, e.g. if it's CFS, the gen'd files are written outside. {quote} Wait... we shouldnt design around this bug though (and it is an api bug). this problem you point out is definitely existing bogusness: I think we should fix this instead so a codec gets a single "directory" and doesnt need to know or care what its impl is, and whether its got TrackingDirectoryWrapper or CFS or whatever around it. > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752424#comment-13752424 ] Shai Erera commented on LUCENE-5189: bq. Can we refer to this consistently as docValuesGen Yes, I think that makes sense. At some point I supported this by gen'ing FieldInfos hence the name, but things have changed since. I'll rename. bq. Maybe we shouldnt pass this parameter to the codec at all. Instead IndexWriter can just put this into the segment suffix and the codec can be blissfully unaware? Maybe for writing it can work, but the producer needs to know from which Directory to read the file, e.g. if it's CFS, the gen'd files are written outside. I have this code in Lucene45DVProducer: {code} final Directory dir; if (fieldInfosGen != -1) { dir = state.segmentInfo.dir; // gen'd files are written outside CFS, so use SegInfo directory } else { dir = state.directory; } {code} I think that if we want to mask that away from the Codec entirely, we should somehow tell the Codec the segmentSuffix and the Directory from which to read the file. Would another Directory parameter be confusing (since we also have it in SegReadState)? bq. I hope we can do this in a cleaner way than 3.x did it for setNorm, that was really crazy Well ... I don't really know how setNorm worked in 3.x, so I'll do what I think and you tell me if it's crazy or not? :) > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752419#comment-13752419 ] Robert Muir commented on LUCENE-5189: - {quote} That way you can end up with e.g. _0_Lucene45_0_1.dvd and *.dvm for field 'f' ... I put a nocommit in DVFormat.fieldsConsumer/Producer by adding another variant which takes fieldInfosGen. ... I want to have only one variant of that method, thereby breaking the API. This is important IMO cause we need to ensure that whatever custom DVFormats out there pay attention to the new fieldInfosGen parameter, or otherwise they might overwrite previously created files. {quote} Maybe we shouldnt pass this parameter to the codec at all. Instead IndexWriter can just put this into the segment suffix and the codec can be blissfully unaware? {quote} SegmentCoreReaders no longer has a single DVConsumer it uses, but rather per field it uses the appropriate DVConsumer (depends on the 'gen'). I put a nocommit to remove DVConsumers from SegCoreReaders into a RefCount'd object in SegmentReader so that we can keep SegCoreReaders manage the 'readers' that are shared between all SegReaders, and also make sure to reuse same DVConsumer by multiple SegReaders. I'll do that later. {quote} I hope we can do this in a cleaner way than 3.x did it for setNorm, that was really crazy :) {quote} I put a nocommit in DVFormat.fieldsConsumer/Producer by adding another variant which takes fieldInfosGen. {quote} Can we refer to this consistently as docValuesGen or something else (I see the patch already does this in some places, but other places its called fieldInfosGen). I dont think this should ever be referred to as fieldInfosGen because, its not a generation for the fieldinfos data, and that would be horribly scary! > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752409#comment-13752409 ] Shai Erera commented on LUCENE-5189: I forgot to mention -- this work started from a simple patch Rob sent me few weeks, so thanks Rob! And also, thanks Mike for helping me get around Lucene core code! At times I felt like I'm literally hammering through the code to get the thing working ;). > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org