[jira] [Updated] (LUCENE-8310) Relax IWs check on pending deletes
[ https://issues.apache.org/jira/browse/LUCENE-8310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-8310:
    Attachment: LUCENE-8310.patch

> Relax IWs check on pending deletes
> ----------------------------------
> Key: LUCENE-8310
> URL: https://issues.apache.org/jira/browse/LUCENE-8310
> Project: Lucene - Core
> Issue Type: Improvement
> Affects Versions: 7.4, master (8.0)
> Reporter: Simon Willnauer
> Priority: Major
> Fix For: 7.4, master (8.0)
> Attachments: LUCENE-8310.patch, LUCENE-8310.patch
>
> I recently fixed the check in IW to fail if there are pending deletes. After
> upgrading to a snapshot I realized the consequences of this check: it made
> most of our uses of IW, for instance preparing commit metadata or rolling
> back to safe commit points, impossible, since we now have to busy-wait on
> top of the directory until all deletes are actually gone, even though we can
> guarantee that our history always goes forward, i.e. we are truly
> append-only in the sense of never reusing segment generations. The fix I
> made was basically to return false from _checkPendingDeletions_ in a
> directory wrapper to work around it.
> I expect this to happen to a lot of Lucene users even if they use IW
> correctly. My proposal is to make the check in IW a bit more sophisticated
> and only fail if there are pending deletes that are in the future from a
> generation perspective. We really don't care about files from the past. My
> patch checks the segment generation of each pending file, which is safe
> since that is the same procedure we apply in IndexFileDeleter to increment
> references etc., and only fails if the pending delete is in the future.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
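The generation check proposed above can be sketched in isolation. This is a hedged, standalone illustration, not the actual patch: the real code reuses IndexFileDeleter's file-name parsing, and the class and method names here (PendingDeletesCheck, generationOf, hasFuturePendingDeletes) are hypothetical. It only understands "segments_N" style names, where N is the generation in base 36.

```java
import java.util.List;

final class PendingDeletesCheck {

    // Parses the generation out of a "segments_N" style file name (N is base 36),
    // returning -1 for names this simplified sketch doesn't understand.
    static long generationOf(String fileName) {
        String prefix = "segments_";
        if (fileName.startsWith(prefix)) {
            return Long.parseLong(fileName.substring(prefix.length()), Character.MAX_RADIX);
        }
        return -1;
    }

    // Only fail (return true) if some pending delete has a generation strictly
    // greater than the current one; pending deletes from past generations are
    // harmless for an append-only history that never reuses generations.
    static boolean hasFuturePendingDeletes(List<String> pendingDeletes, long currentGeneration) {
        for (String file : pendingDeletes) {
            if (generationOf(file) > currentGeneration) {
                return true;
            }
        }
        return false;
    }
}
```

Under this rule, opening an IW at generation 2 with "segments_1" still pending deletion succeeds, while a pending "segments_3" would still fail.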
[jira] [Updated] (LUCENE-8310) Relax IWs check on pending deletes
[ https://issues.apache.org/jira/browse/LUCENE-8310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-8310:
    Attachment: LUCENE-8310.patch
[jira] [Commented] (LUCENE-8310) Relax IWs check on pending deletes
[ https://issues.apache.org/jira/browse/LUCENE-8310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16475640#comment-16475640 ]

Simon Willnauer commented on LUCENE-8310:
-----------------------------------------
Here is a patch.
[jira] [Created] (LUCENE-8310) Relax IWs check on pending deletes
Simon Willnauer created LUCENE-8310:
    Summary: Relax IWs check on pending deletes
    Key: LUCENE-8310
    URL: https://issues.apache.org/jira/browse/LUCENE-8310
    Project: Lucene - Core
    Issue Type: Improvement
    Affects Versions: 7.4, master (8.0)
    Reporter: Simon Willnauer
    Fix For: 7.4, master (8.0)
[jira] [Comment Edited] (LUCENE-8264) Allow an option to rewrite all segments
[ https://issues.apache.org/jira/browse/LUCENE-8264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16474131#comment-16474131 ]

Simon Willnauer edited comment on LUCENE-8264 at 5/14/18 12:46 PM:
-------------------------------------------------------------------
[~erickerickson]
# For N-1 -> N we have _org.apache.lucene.index.UpgradeIndexMergePolicy_?
# In order to add DV I think this should be done by wrapping a codec reader. I personally think this is quite an edge case and should be done in the higher-level application, i.e. Solr itself. You can do this quite easily with _org.apache.lucene.index.OneMergeWrappingMergePolicy_, similar to what we do in the soft-delete case in _SoftDeletesRetentionMergePolicy_. Do I miss something?

> Allow an option to rewrite all segments
> ---------------------------------------
> Key: LUCENE-8264
> URL: https://issues.apache.org/jira/browse/LUCENE-8264
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Erick Erickson
> Assignee: Erick Erickson
> Priority: Major
>
> For the background, see SOLR-12259.
> There are several use cases that would be much easier, especially during
> upgrades, if we could specify that all segments get rewritten.
> One example: upgrading 5x -> 6x -> 7x. When segments are merged, they're
> rewritten into the current format. However, there's no guarantee that a
> particular segment _ever_ gets merged, so the 6x -> 7x upgrade won't
> necessarily be successful.
> How many merge policies support this is an open question. I propose to start
> with TMP and raise other JIRAs as necessary for other merge policies.
> So far the usual response has been "re-index from scratch", but that's
> increasingly difficult as systems get larger.
[jira] [Commented] (LUCENE-8264) Allow an option to rewrite all segments
[ https://issues.apache.org/jira/browse/LUCENE-8264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16474131#comment-16474131 ]

Simon Willnauer commented on LUCENE-8264:
-----------------------------------------
[~erickerickson]
# For N-1 -> N we have _org.apache.lucene.index.UpgradeIndexMergePolicy_?
# In order to add DV I think this should be done by wrapping a codec reader. I personally think this is quite an edge case and should be done in the higher-level application, i.e. Solr itself. You can do this quite easily with _org.apache.lucene.index.OneMergeWrappingMergePolicy_, similar to what we do in the soft-delete case in _SoftDeletesRetentionMergePolicy_. Do I miss something?
[jira] [Commented] (LUCENE-8307) FileSwitchDirectory.checkPendingDeletions is backward
[ https://issues.apache.org/jira/browse/LUCENE-8307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16474017#comment-16474017 ]

Simon Willnauer commented on LUCENE-8307:
-----------------------------------------
LGTM

> FileSwitchDirectory.checkPendingDeletions is backward
> -----------------------------------------------------
> Key: LUCENE-8307
> URL: https://issues.apache.org/jira/browse/LUCENE-8307
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Adrien Grand
> Priority: Minor
> Attachments: LUCENE-8307.patch
>
> It checks that both directories have pending deletions, while this method
> should return true if there are any files pending deletion.
[jira] [Resolved] (LUCENE-8298) Allow DocValues updates to reset a value
[ https://issues.apache.org/jira/browse/LUCENE-8298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer resolved LUCENE-8298.
    Resolution: Fixed

> Allow DocValues updates to reset a value
> ----------------------------------------
> Key: LUCENE-8298
> URL: https://issues.apache.org/jira/browse/LUCENE-8298
> Project: Lucene - Core
> Issue Type: Improvement
> Affects Versions: 7.4, master (8.0)
> Reporter: Simon Willnauer
> Priority: Major
> Fix For: 7.4, master (8.0)
> Attachments: LUCENE-8298.patch, LUCENE-8298.patch, LUCENE-8298.patch, LUCENE-8298.patch, LUCENE-8298.patch
>
> Today, once a document has a value in a certain DV field, this value can
> only be changed but not removed. While resetting / removing a value from a
> field is certainly a corner case, it can be used to undelete a soft-deleted
> document unless it's merged away.
> This allows rolling back changes without rolling back to another commit
> point or trashing all uncommitted changes. In certain scenarios it can be
> used to "repair" the history of documents in distributed systems.
[jira] [Commented] (LUCENE-8298) Allow DocValues updates to reset a value
[ https://issues.apache.org/jira/browse/LUCENE-8298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16469027#comment-16469027 ]

Simon Willnauer commented on LUCENE-8298:
-----------------------------------------
> I'd rather like not to add {{Bits#getMutableCopy}} and keep the {{Bits}} API
> minimal. Otherwise +1.

Fair enough, I agree, let's keep it clean. I used a static method on FixedBitSet instead.
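The design point above (keep the read-only Bits interface minimal, put the mutable copy on the concrete class as a static factory) can be sketched without Lucene. This is a hedched, simplified illustration, not Lucene's actual classes: SimpleFixedBitSet and its copyOf are hypothetical stand-ins.

```java
// Minimal read-only bits view, kept free of mutation or copying concerns.
interface Bits {
    boolean get(int index);
    int length();
}

// Concrete fixed-size bit set; the "mutable copy" lives here as a static factory.
final class SimpleFixedBitSet implements Bits {
    private final long[] words;
    private final int numBits;

    SimpleFixedBitSet(int numBits) {
        this.numBits = numBits;
        this.words = new long[(numBits + 63) >>> 6];
    }

    // Static factory: copy any read-only Bits into an independent mutable bit set.
    static SimpleFixedBitSet copyOf(Bits bits) {
        SimpleFixedBitSet copy = new SimpleFixedBitSet(bits.length());
        for (int i = 0; i < bits.length(); i++) {
            if (bits.get(i)) {
                copy.set(i);
            }
        }
        return copy;
    }

    void set(int index)   { words[index >>> 6] |=  1L << (index & 63); }
    void clear(int index) { words[index >>> 6] &= ~(1L << (index & 63)); }

    @Override public boolean get(int index) { return (words[index >>> 6] & (1L << (index & 63))) != 0; }
    @Override public int length() { return numBits; }
}
```

Mutating the copy leaves the original untouched, which is the property the discussion cares about: callers that only see Bits cannot obtain a mutable view by accident.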
[jira] [Updated] (LUCENE-8298) Allow DocValues updates to reset a value
[ https://issues.apache.org/jira/browse/LUCENE-8298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-8298:
    Attachment: LUCENE-8298.patch
[jira] [Commented] (LUCENE-8298) Allow DocValues updates to reset a value
[ https://issues.apache.org/jira/browse/LUCENE-8298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16468879#comment-16468879 ]

Simon Willnauer commented on LUCENE-8298:
-----------------------------------------
[~jpountz] I integrated with your latest changes.
[jira] [Updated] (LUCENE-8298) Allow DocValues updates to reset a value
[ https://issues.apache.org/jira/browse/LUCENE-8298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-8298:
    Attachment: LUCENE-8298.patch
[jira] [Commented] (LUCENE-8303) Make LiveDocsFormat only responsible for (de)serialization of live docs
[ https://issues.apache.org/jira/browse/LUCENE-8303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16468844#comment-16468844 ]

Simon Willnauer commented on LUCENE-8303:
-----------------------------------------
+1, this looks great. I am unsure if we should deprecate MutableBits in 7.x, but other than that go ahead and push.

> Make LiveDocsFormat only responsible for (de)serialization of live docs
> -----------------------------------------------------------------------
> Key: LUCENE-8303
> URL: https://issues.apache.org/jira/browse/LUCENE-8303
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Adrien Grand
> Priority: Minor
> Attachments: LUCENE-8303.patch
>
> We could simplify live docs by making the format only responsible for
> reading and writing a Bits instance that represents live docs, while today
> the format is also involved in deleting documents since it needs to be able
> to provide mutable bits.
[jira] [Updated] (LUCENE-8298) Allow DocValues updates to reset a value
[ https://issues.apache.org/jira/browse/LUCENE-8298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-8298:
    Attachment: LUCENE-8298.patch
[jira] [Commented] (LUCENE-8298) Allow DocValues updates to reset a value
[ https://issues.apache.org/jira/browse/LUCENE-8298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16468781#comment-16468781 ]

Simon Willnauer commented on LUCENE-8298:
-----------------------------------------
I updated the patch, [~jpountz].
[jira] [Commented] (LUCENE-8296) PendingDeletes shouldn't write to live docs that it shared
[ https://issues.apache.org/jira/browse/LUCENE-8296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16468516#comment-16468516 ]

Simon Willnauer commented on LUCENE-8296:
-----------------------------------------
Cool, LGTM. +1 to commit.

> PendingDeletes shouldn't write to live docs that it shared
> ----------------------------------------------------------
> Key: LUCENE-8296
> URL: https://issues.apache.org/jira/browse/LUCENE-8296
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Adrien Grand
> Priority: Minor
> Attachments: LUCENE-8296.patch
>
> PendingDeletes has a markAsShared mechanism that allows making sure that the
> latest live docs are not going to receive more updates. But it is not always
> used, and I was able to verify that in some cases we end up with readers
> whose live docs disagree with the number of deletes. Even though this might
> not be causing bugs, it feels dangerous to me, so I think we should consider
> always marking live docs as shared in #getLiveDocs.
[jira] [Commented] (LUCENE-8298) Allow DocValues updates to reset a value
[ https://issues.apache.org/jira/browse/LUCENE-8298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16467166#comment-16467166 ]

Simon Willnauer commented on LUCENE-8298:
-----------------------------------------
New patch with added javadocs, API cleanups, and more tests.
[jira] [Updated] (LUCENE-8298) Allow DocValues updates to reset a value
[ https://issues.apache.org/jira/browse/LUCENE-8298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-8298:
    Attachment: LUCENE-8298.patch
[jira] [Commented] (LUCENE-8298) Allow DocValues updates to reset a value
[ https://issues.apache.org/jira/browse/LUCENE-8298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465964#comment-16465964 ]

Simon Willnauer commented on LUCENE-8298:
-----------------------------------------
I attached a patch for discussion. I still need to do some cleanups, add more tests, and clarify javadocs, but it shows the idea.
[jira] [Updated] (LUCENE-8298) Allow DocValues updates to reset a value
[ https://issues.apache.org/jira/browse/LUCENE-8298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-8298:
    Attachment: LUCENE-8298.patch
[jira] [Created] (LUCENE-8298) Allow DocValues updates to reset a value
Simon Willnauer created LUCENE-8298:
    Summary: Allow DocValues updates to reset a value
    Key: LUCENE-8298
    URL: https://issues.apache.org/jira/browse/LUCENE-8298
    Project: Lucene - Core
    Issue Type: Improvement
    Affects Versions: 7.4, master (8.0)
    Reporter: Simon Willnauer
    Fix For: 7.4, master (8.0)
[jira] [Resolved] (LUCENE-8297) Add IW#tryUpdateDocValues(Reader, int, Fields...)
[ https://issues.apache.org/jira/browse/LUCENE-8297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer resolved LUCENE-8297.
    Resolution: Fixed

> Add IW#tryUpdateDocValues(Reader, int, Fields...)
> -------------------------------------------------
> Key: LUCENE-8297
> URL: https://issues.apache.org/jira/browse/LUCENE-8297
> Project: Lucene - Core
> Issue Type: Improvement
> Affects Versions: 7.4, master (8.0)
> Reporter: Simon Willnauer
> Priority: Major
> Fix For: 7.4, master (8.0)
> Attachments: LUCENE-8297.patch
>
> IndexWriter can update doc values for a specific term, but this might affect
> all documents containing the term. With tryUpdateDocValues, users can update
> doc-values fields for individual documents. This allows, for instance,
> soft-deleting individual documents.
> The new method shares most of its code with tryDeleteDocuments.
[jira] [Commented] (LUCENE-7976) Make TieredMergePolicy respect maxSegmentSizeMB and allow singleton merges of very large segments
[ https://issues.apache.org/jira/browse/LUCENE-7976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465852#comment-16465852 ] Simon Willnauer commented on LUCENE-7976: - > [~erickerickson] if you index with a single thread, and `commit()` at the >right times you can build a precise set of segments and then directly test >TMP's behavior. I like approach one since it then gives you full >deterministic control to enumerate the different tricky cases that surface in >real indices? I really think we should start working towards testing this as real unittest. Creating stuff with IW and depending on it is a big issue. We can change the code to be less dependent on IW. I think we should and we should do it before making significant changes to MPs IMO > Make TieredMergePolicy respect maxSegmentSizeMB and allow singleton merges of > very large segments > - > > Key: LUCENE-7976 > URL: https://issues.apache.org/jira/browse/LUCENE-7976 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Erick Erickson >Assignee: Erick Erickson >Priority: Major > Attachments: LUCENE-7976.patch, LUCENE-7976.patch, LUCENE-7976.patch, > LUCENE-7976.patch, LUCENE-7976.patch, LUCENE-7976.patch, LUCENE-7976.patch > > > We're seeing situations "in the wild" where there are very large indexes (on > disk) handled quite easily in a single Lucene index. This is particularly > true as features like docValues move data into MMapDirectory space. The > current TMP algorithm allows on the order of 50% deleted documents as per a > dev list conversation with Mike McCandless (and his blog here: > https://www.elastic.co/blog/lucenes-handling-of-deleted-documents). > Especially in the current era of very large indexes in aggregate, (think many > TB) solutions like "you need to distribute your collection over more shards" > become very costly. 
Additionally, the tempting "optimize" button exacerbates > the issue since once you form, say, a 100G segment (by > optimizing/forceMerging) it is not eligible for merging until 97.5G of the > docs in it are deleted (current default 5G max segment size). > The proposal here would be to add a new parameter to TMP, something like > (no, that's not a serious name, suggestions > welcome) which would default to 100 (or the same behavior we have now). > So if I set this parameter to, say, 20%, and the max segment size stays at > 5G, the following would happen when segments were selected for merging: > > any segment with > 20% deleted documents would be merged or rewritten NO > > MATTER HOW LARGE. There are two cases, > >> the segment has < 5G "live" docs. In that case it would be merged with > >> smaller segments to bring the resulting segment up to 5G. If no smaller > >> segments exist, it would just be rewritten > >> The segment has > 5G "live" docs (the result of a forceMerge or optimize). > >> It would be rewritten into a single segment removing all deleted docs no > >> matter how big it is to start. The 100G example above would be rewritten > >> to an 80G segment for instance. > Of course this would lead to potentially much more I/O which is why the > default would be the same behavior we see now. As it stands now, though, > there's no way to recover from an optimize/forceMerge except to re-index from > scratch. We routinely see 200G-300G Lucene indexes at this point "in the > wild" with 10s of shards replicated 3 or more times. And that doesn't even > include having these over HDFS. > Alternatives welcome! Something like the above seems minimally invasive. A > new merge policy is certainly an alternative. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
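The selection rule proposed above (any segment over the deleted-docs threshold becomes eligible, and an oversized one is rewritten by itself) can be sketched as a toy model. This is not Lucene's TieredMergePolicy code; `SegmentStats`, `deletePctAllowed`, and the hard-coded 5G limit are illustrative assumptions standing in for the real (unnamed) parameter and the `maxMergedSegmentMB` setting.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for a segment's commit metadata: size on disk, doc counts.
final class SegmentStats {
    final String name;
    final long sizeBytes; // total on-disk size, deleted docs included
    final int maxDoc;     // docs including deleted
    final int delDocs;    // deleted docs

    SegmentStats(String name, long sizeBytes, int maxDoc, int delDocs) {
        this.name = name; this.sizeBytes = sizeBytes;
        this.maxDoc = maxDoc; this.delDocs = delDocs;
    }

    double delPct() { return 100.0 * delDocs / maxDoc; }

    // bytes attributable to live docs, assuming uniform per-doc size
    long liveBytes() { return (long) (sizeBytes * (1.0 - delDocs / (double) maxDoc)); }
}

final class ToyMergeSelector {
    static final long MAX_SEGMENT_BYTES = 5L << 30; // 5G default from the proposal

    /** Segments that must be merged or rewritten NO MATTER HOW LARGE:
     *  anything above the allowed deleted-docs percentage. */
    static List<SegmentStats> selectForcedRewrites(List<SegmentStats> segments,
                                                   double deletePctAllowed) {
        List<SegmentStats> out = new ArrayList<>();
        for (SegmentStats s : segments) {
            if (s.delPct() > deletePctAllowed) {
                out.add(s);
            }
        }
        return out;
    }

    /** Case two from the proposal: live docs alone exceed the max segment
     *  size (e.g. a forceMerged 100G segment), so the segment is rewritten
     *  alone into a single smaller segment instead of being merged. */
    static boolean singletonRewrite(SegmentStats s) {
        return s.liveBytes() > MAX_SEGMENT_BYTES;
    }
}
```

With `deletePctAllowed` at its proposed default of 100 nothing ever qualifies, which preserves today's behavior; the 100G-to-80G example in the proposal corresponds to the `singletonRewrite` branch.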
[jira] [Commented] (LUCENE-8297) Add IW#tryUpdateDocValues(Reader, int, Fields...)
[ https://issues.apache.org/jira/browse/LUCENE-8297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465206#comment-16465206 ] Simon Willnauer commented on LUCENE-8297: - [~mikemccand] can you take a look? -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-8297) Add IW#tryUpdateDocValues(Reader, int, Fields...)
[ https://issues.apache.org/jira/browse/LUCENE-8297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-8297: Attachment: LUCENE-8297.patch -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-8297) Add IW#tryUpdateDocValues(Reader, int, Fields...)
Simon Willnauer created LUCENE-8297: --- Summary: Add IW#tryUpdateDocValues(Reader, int, Fields...) Key: LUCENE-8297 URL: https://issues.apache.org/jira/browse/LUCENE-8297 Project: Lucene - Core Issue Type: Improvement Affects Versions: 7.4, master (8.0) Reporter: Simon Willnauer Fix For: 7.4, master (8.0) IndexWriter can update doc values for a specific term but this might affect all documents containing the term. With tryUpdateDocValues users can update doc-values fields for individual documents. This allows, for instance, soft-deleting individual documents. The new method shares most of its code with tryDeleteDocuments. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
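The distinction the issue draws (a term-based update touches every document containing the term, while the new method targets exactly one document) can be illustrated with a toy in-memory model. This is not the Lucene API: `ToyDvIndex` and its maps merely mimic the semantics, with a soft-delete field as the motivating use case.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of an index: docID -> doc-values fields, plus one "term" per doc.
final class ToyDvIndex {
    final Map<Integer, Map<String, Long>> docs = new HashMap<>();
    final Map<Integer, String> terms = new HashMap<>(); // docID -> indexed term

    void add(int docID, String term) {
        terms.put(docID, term);
        docs.put(docID, new HashMap<>());
    }

    /** Mimics a term-based doc-values update: EVERY doc with the term is hit. */
    void updateByTerm(String term, String field, long value) {
        terms.forEach((docID, t) -> {
            if (t.equals(term)) docs.get(docID).put(field, value);
        });
    }

    /** Mimics the per-document update this issue adds: one doc only, e.g.
     *  setting a soft-deletes field to mark just that document deleted.
     *  Returns false in the "try" spirit if the doc is no longer present. */
    boolean tryUpdateDocValues(int docID, String field, long value) {
        Map<String, Long> dv = docs.get(docID);
        if (dv == null) return false;
        dv.put(field, value);
        return true;
    }
}
```

With duplicate terms in the index, the term-based path would soft-delete all copies, while the per-document path can soft-delete a single one.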
[jira] [Commented] (LUCENE-8296) PendingDeletes shouldn't write to live docs that it shared
[ https://issues.apache.org/jira/browse/LUCENE-8296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465190#comment-16465190 ] Simon Willnauer commented on LUCENE-8296: - I think this is mostly a relic from before I started refactoring ReadersAndUpdates. I would love to even go further and down the road make the returned Bits instance immutable. I think we should have a very, very simple base class that FixedBitSet can extend that knows how to read from the array. This way we know nobody ever mutates it. Today you can just cast the liveDocs from an NRT reader and change its private instance. I am going to look into this unless anybody beats me to it. One thing that I feel is missing is an explicit test that the returned bits don't change in subsequent modifications. +1 to the change! > PendingDeletes shouldn't write to live docs that it shared > -- > > Key: LUCENE-8296 > URL: https://issues.apache.org/jira/browse/LUCENE-8296 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > Attachments: LUCENE-8296.patch > > > PendingDeletes has a markAsShared mechanism that allows making sure that the > latest live docs are not going to receive more updates. But it is not always > used, and I was able to verify that in some cases we end up with readers > whose live docs disagree with the number of deletes. Even though this might > not be causing bugs, it feels dangerous to me so I think we should consider > always marking live docs as shared in #getLiveDocs. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-7976) Make TieredMergePolicy respect maxSegmentSizeMB and allow singleton merges of very large segments
[ https://issues.apache.org/jira/browse/LUCENE-7976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463704#comment-16463704 ] Simon Willnauer edited comment on LUCENE-7976 at 5/4/18 11:31 AM: -- Erick thanks for tackling this big issue here! here are a couple of comments: * please remove the commented part that refers to // TODO: See LUCENE-8263 * Can we find a better name for _InfoInfo_, maybe _SegmentSizeAndDocs_? * can you make _SegmentSizeAndDocs_ static and maybe a simple struct, i.e. no getters, and don't pass IW to it * can we assert that _int totalMaxDocs_ is always positive? I know we don't allow that many documents in an index but I think it would be good to have an extra check. * can we rename _maxMergeAtOnceThisMerge_ to _currentMaxMergeAtOnce_ or maybe just _maxMergeAtOnce_? I went down this quite a bit and I am starting to question if we should really try to change the algorithm that we have today or if this class needs cleanup and refactorings first. I am sorry to come in late here but this is a very, very complex piece of code and adding more complexity to it will do more harm than good. That said, I wonder if we can generalize the algorithm here into a single method because in the end they all do the same thing. We can for instance make the selection alg pluggable with a func we pass in and that way differentiate between findMerges and findForceMerge etc. At the end of the day we want them all to work in the same way. I am not saying we should go down all that way but maybe we can extract a common code path that we can share between the places where we filter out the segments that are not eligible. This is just a suggestion, I am happy to help here btw. One thing that concerns me and is in fact a showstopper IMO is that the patch doesn't have a single test that ensures it's correct. I mean, we significantly change the behavior; I think it warrants tests, no? was (Author: simonw): Erick thanks for tackling this big issue here! here are a couple of comments: * please remove the commented part that refers to // TODO: See LUCENE-8263 * Can we find a better name for _InfoInfo_, maybe _SegmentSizeAndDocs_? * can you make _SegmentSizeAndDocs_ static and maybe a simple struct, i.e. no getters, and don't pass IW to it * can we assert that _int totalMaxDocs_ is always positive? I know we don't allow that many documents in an index but I think it would be good to have an extra check. * can we rename _maxMergeAtOnceThisMerge_ to _currentMaxMergeAtOnce_ or maybe just _maxMergeAtOnce_? I went down this quite a bit and I am starting to question if we should really try to change the algorithm that we have today or if this class needs cleanup and refactorings first. I am sorry to come in late here but this is a very, very complex piece of code and adding more complexity to it will do more harm than good. That said, I wonder if we can generalize the algorithm here into a single method because in the end they all do the same thing. We can for instance make the selection alg pluggable with a func we pass in and that way differentiate between findMerges and findForceMerge etc. At the end of the day we want them all to work in the same way. I am not saying we should go down all that way but maybe we can extract a common code path that we can share between the places where we filter out the segments that are not eligible. This is just a suggestion, I am happy to help here btw. One thing that concerns me and is in fact a showstopper IMO is that the patch doesn't have a single test that ensures it's correct. I mean, we significantly change the behavior; I think it warrants tests, no? 
[jira] [Comment Edited] (LUCENE-7976) Make TieredMergePolicy respect maxSegmentSizeMB and allow singleton merges of very large segments
[ https://issues.apache.org/jira/browse/LUCENE-7976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463704#comment-16463704 ] Simon Willnauer edited comment on LUCENE-7976 at 5/4/18 11:30 AM: -- Erick thanks for tackling this big issue here! here are a couple of comments: * please remove the commented part that refers to // TODO: See LUCENE-8263 * Can we find a better name for _InfoInfo_, maybe _SegmentSizeAndDocs_? * can you make _SegmentSizeAndDocs_ static and maybe a simple struct, i.e. no getters, and don't pass IW to it * can we assert that _int totalMaxDocs_ is always positive? I know we don't allow that many documents in an index but I think it would be good to have an extra check. * can we rename _maxMergeAtOnceThisMerge_ to _currentMaxMergeAtOnce_ or maybe just _maxMergeAtOnce_? I went down this quite a bit and I am starting to question if we should really try to change the algorithm that we have today or if this class needs cleanup and refactorings first. I am sorry to come in late here but this is a very, very complex piece of code and adding more complexity to it will do more harm than good. That said, I wonder if we can generalize the algorithm here into a single method because in the end they all do the same thing. We can for instance make the selection alg pluggable with a func we pass in and that way differentiate between findMerges and findForceMerge etc. At the end of the day we want them all to work in the same way. I am not saying we should go down all that way but maybe we can extract a common code path that we can share between the places where we filter out the segments that are not eligible. This is just a suggestion, I am happy to help here btw. One thing that concerns me and is in fact a showstopper IMO is that the patch doesn't have a single test that ensures it's correct. I mean, we significantly change the behavior; I think it warrants tests, no? 
was (Author: simonw): here are a couple of comments: * please remove the commented part that refers to // TODO: See LUCENE-8263 * Can we find a better name for _InfoInfo_, maybe _SegmentSizeAndDocs_? * can you make _SegmentSizeAndDocs_ static and maybe a simple struct, i.e. no getters, and don't pass IW to it * can we assert that _int totalMaxDocs_ is always positive? I know we don't allow that many documents in an index but I think it would be good to have an extra check. * can we rename _maxMergeAtOnceThisMerge_ to _currentMaxMergeAtOnce_ or maybe just _maxMergeAtOnce_? ~~ work in progress ~~ I fat-fingered the save button
[jira] [Comment Edited] (LUCENE-7976) Make TieredMergePolicy respect maxSegmentSizeMB and allow singleton merges of very large segments
[ https://issues.apache.org/jira/browse/LUCENE-7976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463704#comment-16463704 ] Simon Willnauer edited comment on LUCENE-7976 at 5/4/18 10:54 AM: -- here are a couple of comments: * please remove the commented part that refers to // TODO: See LUCENE-8263 * Can we find a better name for _InfoInfo_, maybe _SegmentSizeAndDocs_? * can you make _SegmentSizeAndDocs_ static and maybe a simple struct, i.e. no getters, and don't pass IW to it * can we assert that _int totalMaxDocs_ is always positive? I know we don't allow that many documents in an index but I think it would be good to have an extra check. * can we rename _maxMergeAtOnceThisMerge_ to _currentMaxMergeAtOnce_ or maybe just _maxMergeAtOnce_? ~~ work in progress ~~ I fat-fingered the save button was (Author: simonw): here are a couple of comments: * please remove the commented part that refers to // TODO: See LUCENE-8263 * Can we find a better name for _InfoInfo_, maybe _SegmentSizeAndDocs_? * can you make _SegmentSizeAndDocs_ static and maybe a simple struct, i.e. no getters, and don't pass IW to it * can we assert that _int totalMaxDocs_ is always positive? I know we don't allow that many documents in an index but I think it would be good to have an extra check. * can we rename _maxMergeAtOnceThisMerge_ to _currentMaxMergeAtOnce_ or maybe just _maxMergeAtOnce_? -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-7976) Make TieredMergePolicy respect maxSegmentSizeMB and allow singleton merges of very large segments
[ https://issues.apache.org/jira/browse/LUCENE-7976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463704#comment-16463704 ] Simon Willnauer commented on LUCENE-7976: - here are a couple of comments: * please remove the commented part that refers to // TODO: See LUCENE-8263 * Can we find a better name for _InfoInfo_, maybe _SegmentSizeAndDocs_? * can you make _SegmentSizeAndDocs_ static and maybe a simple struct, i.e. no getters, and don't pass IW to it * can we assert that _int totalMaxDocs_ is always positive? I know we don't allow that many documents in an index but I think it would be good to have an extra check. * can we rename _maxMergeAtOnceThisMerge_ to _currentMaxMergeAtOnce_ or maybe just _maxMergeAtOnce_? -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-8293) Ensure only hard deletes are carried over in a merge
[ https://issues.apache.org/jira/browse/LUCENE-8293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved LUCENE-8293. - Resolution: Fixed > Ensure only hard deletes are carried over in a merge > > > Key: LUCENE-8293 > URL: https://issues.apache.org/jira/browse/LUCENE-8293 > Project: Lucene - Core > Issue Type: Bug >Affects Versions: 7.4, master (8.0) > Reporter: Simon Willnauer >Priority: Major > Fix For: 7.4, master (8.0) > > Attachments: LUCENE-8293.patch, LUCENE-8293.patch > > > Today we carry over hard deletes based on the SegmentReaders liveDocs. > This is not correct if soft-deletes are used especially with retention > policies. If a soft delete is added while a segment is merged the document > might end up hard deleted in the target segment. This isn't necessarily a > correctness issue but causes unnecessary writes of hard-deletes. The > biggest > issue here is that we assert that previously deleted documents are still > deleted > in the live-docs we apply and that might be violated by the retention > policy. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8295) Remove ReadersAndUpdates.liveDocsSharedPending
[ https://issues.apache.org/jira/browse/LUCENE-8295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463532#comment-16463532 ] Simon Willnauer commented on LUCENE-8295: - {noformat} Not that I fully understand it, but looking at the patch alone wouldn't it miss calling pendingDeletes.liveDocsShared() (and this in turn does have further consequences in that other class)? Ping Simon, he'll know. {noformat} I looked at the history and I agree with [~jpountz] that this is unnecessary to do outside of the places where we call it explicitly, i.e. in getReaderForMerge and getReadOnlyClone. Patch LGTM. > Remove ReadersAndUpdates.liveDocsSharedPending > -- > > Key: LUCENE-8295 > URL: https://issues.apache.org/jira/browse/LUCENE-8295 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > Attachments: LUCENE-8295.patch > > > I have been trying to understand PendingDeletes and ReadersAndUpdates, and it > looks to me that the liveDocsSharedPending flag doesn't buy anything. I ran > tests 10 times after removing it and got no failures. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-8293) Ensure only hard deletes are carried over in a merge
[ https://issues.apache.org/jira/browse/LUCENE-8293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-8293: Attachment: LUCENE-8293.patch -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8293) Ensure only hard deletes are carried over in a merge
[ https://issues.apache.org/jira/browse/LUCENE-8293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16462609#comment-16462609 ] Simon Willnauer commented on LUCENE-8293: - [~mikemccand] I added another test and fixed some corner cases with soft-deletes. Can you take another look? -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8293) Ensure only hard deletes are carried over in a merge
[ https://issues.apache.org/jira/browse/LUCENE-8293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16462527#comment-16462527 ] Simon Willnauer commented on LUCENE-8293: - [~erickerickson] no, it doesn't. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-8290) Keep soft deletes in sync with on-disk DocValues
[ https://issues.apache.org/jira/browse/LUCENE-8290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved LUCENE-8290. - Resolution: Fixed > Keep soft deletes in sync with on-disk DocValues > > > Key: LUCENE-8290 > URL: https://issues.apache.org/jira/browse/LUCENE-8290 > Project: Lucene - Core > Issue Type: Bug >Affects Versions: 7.4, master (8.0) > Reporter: Simon Willnauer >Priority: Major > Fix For: 7.4, master (8.0) > > Attachments: LUCENE-8290.patch > > > Today we pass on the doc values update to the PendingDeletes > when it's applied. This might cause issues with a retention > policy merge policy that will see a deleted document but not its value on > disk. > This change moves back the PendingDeletes callback to flush time > in order to be consistent with what is actually updated on disk. > > This change also makes sure we write values to disk on flush that > are in the reader pool as well as extra best effort checks to drop > fully deleted segments on flush, commit and getReader. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
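The flush-time callback described in LUCENE-8290 can be sketched as a toy model: doc-values updates are buffered, and the deletes accounting (standing in for PendingDeletes) only learns about a soft-delete once the buffered update is "flushed", i.e. once it would be visible on disk. All names here are illustrative, not Lucene's internals.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model: soft-delete doc-values updates are buffered in RAM; the deletes
// accounting is only told about them at flush time, so what the accounting
// reports never runs ahead of what is on disk.
final class ToySoftDeletes {
    private final List<Integer> bufferedSoftDeletes = new ArrayList<>();
    private int acknowledgedSoftDeletes = 0; // what "PendingDeletes" has seen

    /** Buffer a soft-delete doc-values update; nothing is acknowledged yet. */
    void updateSoftDeleteDocValue(int docID) {
        bufferedSoftDeletes.add(docID);
    }

    int acknowledgedSoftDeletes() { return acknowledgedSoftDeletes; }

    /** Writing the doc-values update "to disk" and informing the accounting
     *  happen together, so a retention merge policy never observes a deleted
     *  document whose soft-deletes value is missing on disk. */
    void flush() {
        acknowledgedSoftDeletes += bufferedSoftDeletes.size();
        bufferedSoftDeletes.clear();
    }
}
```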
[jira] [Updated] (LUCENE-8293) Ensure only hard deletes are carried over in a merge
[ https://issues.apache.org/jira/browse/LUCENE-8293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-8293: Attachment: LUCENE-8293.patch > Ensure only hard deletes are carried over in a merge > > > Key: LUCENE-8293 > URL: https://issues.apache.org/jira/browse/LUCENE-8293 > Project: Lucene - Core > Issue Type: Bug >Affects Versions: 7.4, master (8.0) > Reporter: Simon Willnauer >Priority: Major > Fix For: 7.4, master (8.0) > > Attachments: LUCENE-8293.patch > > > Today we carry over hard deletes based on the SegmentReader's liveDocs. > This is not correct if soft deletes are used, especially with retention > policies. If a soft delete is added while a segment is merged, the document > might end up hard deleted in the target segment. This isn't necessarily a > correctness issue but causes unnecessary writes of hard deletes. The biggest > issue is that we assert that previously deleted documents are still deleted > in the live-docs we apply, and that assertion might be violated by the > retention policy. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-8293) Ensure only hard deletes are carried over in a merge
Simon Willnauer created LUCENE-8293: --- Summary: Ensure only hard deletes are carried over in a merge Key: LUCENE-8293 URL: https://issues.apache.org/jira/browse/LUCENE-8293 Project: Lucene - Core Issue Type: Bug Affects Versions: 7.4, master (8.0) Reporter: Simon Willnauer Fix For: 7.4, master (8.0) Attachments: LUCENE-8293.patch Today we carry over hard deletes based on the SegmentReader's liveDocs. This is not correct if soft deletes are used, especially with retention policies. If a soft delete is added while a segment is merged, the document might end up hard deleted in the target segment. This isn't necessarily a correctness issue but causes unnecessary writes of hard deletes. The biggest issue is that we assert that previously deleted documents are still deleted in the live-docs we apply, and that assertion might be violated by the retention policy. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
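The carry-over rule the issue above describes can be sketched in plain Java. This is a hypothetical illustration, not Lucene's actual API: `hardLive`/`softLive`, `combinedLiveDocs`, and `liveDocsToCarryOver` are invented names standing in for the real `PendingDeletes`/`SegmentReader` machinery. The point is that a searcher sees hard and soft deletes combined, while a merge must persist only the hard deletes, otherwise a soft-deleted document becomes irreversibly hard-deleted in the target segment.

```java
// Hypothetical sketch of hard vs. soft deletes during a merge.
// In each array, true means "live". Names are illustrative only.
final class MergeDeletesSketch {

    // What a searcher sees: a doc is live only if it is neither
    // hard-deleted nor soft-deleted.
    static boolean[] combinedLiveDocs(boolean[] hardLive, boolean[] softLive) {
        boolean[] out = new boolean[hardLive.length];
        for (int i = 0; i < out.length; i++) {
            out[i] = hardLive[i] && softLive[i];
        }
        return out;
    }

    // What the merge should persist into the target segment: only the
    // hard deletes. Persisting combinedLiveDocs() instead would turn every
    // soft delete into a hard delete, so a retention policy could no
    // longer keep the document alive.
    static boolean[] liveDocsToCarryOver(boolean[] hardLive, boolean[] softLive) {
        return hardLive.clone();
    }
}
```

A merge that carried over `combinedLiveDocs` would still produce a searchable index, which is why the issue calls this a write-amplification problem rather than a strict correctness bug, until the retention assertion trips.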
[jira] [Resolved] (LUCENE-8289) Share logic between Numeric and Binary DocValuesFieldUpdates
[ https://issues.apache.org/jira/browse/LUCENE-8289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved LUCENE-8289. - Resolution: Fixed > Share logic between Numeric and Binary DocValuesFieldUpdates > > > Key: LUCENE-8289 > URL: https://issues.apache.org/jira/browse/LUCENE-8289 > Project: Lucene - Core > Issue Type: Improvement >Affects Versions: 7.4, master (8.0) > Reporter: Simon Willnauer >Priority: Major > Fix For: 7.4, master (8.0) > > Attachments: LUCENE-8289.patch > > > NumericDocValuesFieldUpdates and BinaryDocValuesFieldUpdates duplicate > a significant amount of logic that can all be pushed into the base class. > This change moves all the logic that is independent of the type to the > base > class. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8290) Keep soft deletes in sync with on-disk DocValues
[ https://issues.apache.org/jira/browse/LUCENE-8290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461106#comment-16461106 ] Simon Willnauer commented on LUCENE-8290: - [~mikemccand] can you take a look? > Keep soft deletes in sync with on-disk DocValues > > > Key: LUCENE-8290 > URL: https://issues.apache.org/jira/browse/LUCENE-8290 > Project: Lucene - Core > Issue Type: Bug >Affects Versions: 7.4, master (8.0) > Reporter: Simon Willnauer >Priority: Major > Fix For: 7.4, master (8.0) > > Attachments: LUCENE-8290.patch > > > Today we pass on the doc values update to the PendingDeletes > when it's applied. This might cause issues with a retention > merge policy that will see a deleted document but not its value on > disk. > This change moves the PendingDeletes callback back to flush time > in order to be consistent with what is actually updated on disk. > > This change also makes sure we write values from the reader pool to > disk on flush, and adds extra best-effort checks to drop > fully deleted segments on flush, commit and getReader. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-8290) Keep soft deletes in sync with on-disk DocValues
[ https://issues.apache.org/jira/browse/LUCENE-8290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-8290: Attachment: LUCENE-8290.patch > Keep soft deletes in sync with on-disk DocValues > > > Key: LUCENE-8290 > URL: https://issues.apache.org/jira/browse/LUCENE-8290 > Project: Lucene - Core > Issue Type: Bug >Affects Versions: 7.4, master (8.0) > Reporter: Simon Willnauer >Priority: Major > Fix For: 7.4, master (8.0) > > Attachments: LUCENE-8290.patch > > > Today we pass on the doc values update to the PendingDeletes > when it's applied. This might cause issues with a retention > merge policy that will see a deleted document but not its value on > disk. > This change moves the PendingDeletes callback back to flush time > in order to be consistent with what is actually updated on disk. > > This change also makes sure we write values from the reader pool to > disk on flush, and adds extra best-effort checks to drop > fully deleted segments on flush, commit and getReader. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-8290) Keep soft deletes in sync with on-disk DocValues
Simon Willnauer created LUCENE-8290: --- Summary: Keep soft deletes in sync with on-disk DocValues Key: LUCENE-8290 URL: https://issues.apache.org/jira/browse/LUCENE-8290 Project: Lucene - Core Issue Type: Bug Affects Versions: 7.4, master (8.0) Reporter: Simon Willnauer Fix For: 7.4, master (8.0) Today we pass on the doc values update to the PendingDeletes when it's applied. This might cause issues with a retention merge policy that will see a deleted document but not its value on disk. This change moves the PendingDeletes callback back to flush time in order to be consistent with what is actually updated on disk. This change also makes sure we write values from the reader pool to disk on flush, and adds extra best-effort checks to drop fully deleted segments on flush, commit and getReader. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
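The timing change described above can be sketched in a few lines of plain Java. This is a hypothetical illustration, not Lucene's `PendingDeletes` class: the idea is simply that a buffered doc-values update only becomes visible in the delete accounting once it has been flushed, so a retention merge policy never sees a deleted document whose value is not yet on disk.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of deferring the soft-deletes callback to flush time.
// Names are illustrative; Lucene's real flush pipeline is far more involved.
final class SoftDeletesSyncSketch {
    private final List<Integer> bufferedSoftDeletes = new ArrayList<>();
    private int pendingDeleteCount; // what a retention merge policy would see

    void onDocValuesUpdate(int docId) {
        // Per the issue: do NOT bump pendingDeleteCount here. The updated
        // value is not on disk yet, so counting it now would make the
        // in-memory delete state disagree with what readers find on disk.
        bufferedSoftDeletes.add(docId);
    }

    void flush() {
        // Only once the values are (notionally) written to disk do the
        // buffered updates count as deletes.
        pendingDeleteCount += bufferedSoftDeletes.size();
        bufferedSoftDeletes.clear();
    }

    int pendingDeleteCount() {
        return pendingDeleteCount;
    }
}
```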
[jira] [Created] (LUCENE-8289) Share logic between Numeric and Binary DocValuesFieldUpdates
Simon Willnauer created LUCENE-8289: --- Summary: Share logic between Numeric and Binary DocValuesFieldUpdates Key: LUCENE-8289 URL: https://issues.apache.org/jira/browse/LUCENE-8289 Project: Lucene - Core Issue Type: Improvement Affects Versions: 7.4, master (8.0) Reporter: Simon Willnauer Fix For: 7.4, master (8.0) Attachments: LUCENE-8289.patch NumericDocValuesFieldUpdates and BinaryDocValuesFieldUpdates duplicate a significant amount of logic that can all be pushed into the base class. This change moves all the logic that is independent of the type to the base class. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
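The refactoring the issue proposes has a familiar shape, sketched below in plain Java. All names (`FieldUpdatesSketch`, `NumericUpdatesSketch`, `BinaryUpdatesSketch`) are invented for illustration and are not Lucene's actual classes: the type-independent bookkeeping (which docs were updated, how many updates exist) lives in the base class, and subclasses only differ in how they store the typed value.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of pushing type-independent logic into a base class.
abstract class FieldUpdatesSketch {
    private final List<Integer> docs = new ArrayList<>(); // shared bookkeeping

    final void add(int doc, Object value) {
        docs.add(doc);     // identical for numeric and binary updates
        storeValue(value); // the only type-specific step
    }

    final int numUpdates() {
        return docs.size();
    }

    abstract void storeValue(Object value);
}

final class NumericUpdatesSketch extends FieldUpdatesSketch {
    final List<Long> values = new ArrayList<>();
    @Override void storeValue(Object value) { values.add((Long) value); }
}

final class BinaryUpdatesSketch extends FieldUpdatesSketch {
    final List<byte[]> values = new ArrayList<>();
    @Override void storeValue(Object value) { values.add((byte[]) value); }
}
```

Before the change, both concrete classes would each carry their own copy of the doc-tracking logic; afterwards a bug fix in the shared path fixes both at once.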
[jira] [Updated] (LUCENE-8289) Share logic between Numeric and Binary DocValuesFieldUpdates
[ https://issues.apache.org/jira/browse/LUCENE-8289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-8289: Attachment: LUCENE-8289.patch > Share logic between Numeric and Binary DocValuesFieldUpdates > > > Key: LUCENE-8289 > URL: https://issues.apache.org/jira/browse/LUCENE-8289 > Project: Lucene - Core > Issue Type: Improvement >Affects Versions: 7.4, master (8.0) > Reporter: Simon Willnauer >Priority: Major > Fix For: 7.4, master (8.0) > > Attachments: LUCENE-8289.patch > > > NumericDocValuesFieldUpdates and BinaryDocValuesFieldUpdates duplicate > a significant amount of logic that can all be pushed into the base class. > This change moves all the logic that is independent of the type to the > base > class. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [JENKINS] Lucene-Solr-master-Windows (64bit/jdk1.8.0_144) - Build # 7296 - Still Unstable!
pushed a fix to master and 7x On Mon, Apr 30, 2018 at 11:46 AM, Simon Willnauer <simon.willna...@gmail.com> wrote: > I am looking into the TestDirectoryTaxonomyWriter#testRecreateAndRefresh > failure > > On Mon, Apr 30, 2018 at 9:59 AM, Policeman Jenkins Server > <jenk...@thetaphi.de> wrote: >> Build: https://jenkins.thetaphi.de/job/Lucene-Solr-master-Windows/7296/ >> Java: 64bit/jdk1.8.0_144 -XX:-UseCompressedOops -XX:+UseConcMarkSweepGC >> >> 29 tests failed. >> FAILED: >> org.apache.lucene.facet.taxonomy.directory.TestDirectoryTaxonomyWriter.testRecreateAndRefresh >> >> Error Message: >> Directory >> MockDirectoryWrapper(NIOFSDirectory@C:\Users\jenkins\workspace\Lucene-Solr-master-Windows\lucene\build\facet\test\J1\temp\lucene.facet.taxonomy.directory.TestDirectoryTaxonomyWriter_F392C2FFDA61922B-001\index-NIOFSDirectory-001 >> lockFactory=org.apache.lucene.store.NativeFSLockFactory@5e6a7788) still has >> pending deleted files; cannot initialize IndexWriter >> >> Stack Trace: >> java.lang.IllegalArgumentException: Directory >> MockDirectoryWrapper(NIOFSDirectory@C:\Users\jenkins\workspace\Lucene-Solr-master-Windows\lucene\build\facet\test\J1\temp\lucene.facet.taxonomy.directory.TestDirectoryTaxonomyWriter_F392C2FFDA61922B-001\index-NIOFSDirectory-001 >> lockFactory=org.apache.lucene.store.NativeFSLockFactory@5e6a7788) still has >> pending deleted files; cannot initialize IndexWriter >> at >> __randomizedtesting.SeedInfo.seed([F392C2FFDA61922B:BD08DA5FF63FC513]:0) >> at org.apache.lucene.index.IndexWriter.(IndexWriter.java:707) >> at >> org.apache.lucene.facet.taxonomy.directory.DirectoryTaxonomyWriter.openIndexWriter(DirectoryTaxonomyWriter.java:240) >> at >> org.apache.lucene.facet.taxonomy.directory.DirectoryTaxonomyWriter.(DirectoryTaxonomyWriter.java:167) >> at >> org.apache.lucene.facet.taxonomy.directory.TestDirectoryTaxonomyWriter.testRecreateAndRefresh(TestDirectoryTaxonomyWriter.java:214) >> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
>> at >> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) >> at >> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) >> at java.lang.reflect.Method.invoke(Method.java:498) >> at >> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1737) >> at >> com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:934) >> at >> com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:970) >> at >> com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:984) >> at >> org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49) >> at >> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) >> at >> org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48) >> at >> org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64) >> at >> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47) >> at >> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) >> at >> com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368) >> at >> com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817) >> at >> com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468) >> at >> com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:943) >> at >> com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:829) >> at >> com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:879) >> at >> com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:890) >> at >> 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) >> at >> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) >> at >> org.apache.lucene.util.TestRuleStor
[jira] [Resolved] (LUCENE-8282) Reduce boxing and unnecessary object creation in DV updates
[ https://issues.apache.org/jira/browse/LUCENE-8282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved LUCENE-8282. - Resolution: Fixed > Reduce boxing and unnecessary object creation in DV updates > --- > > Key: LUCENE-8282 > URL: https://issues.apache.org/jira/browse/LUCENE-8282 > Project: Lucene - Core > Issue Type: Improvement >Affects Versions: 7.4, master (8.0) > Reporter: Simon Willnauer >Priority: Major > Fix For: 7.4, master (8.0) > > Attachments: LUCENE-8282.patch > > > DV updates used the boxed type Long to keep the API generic. Yet the missing > type information caused a lot of code duplication, boxing and unnecessary object > creation. This change cuts over to type-safe APIs using BytesRef and long (the > primitive). In this change, most of the code that is almost identical between > binary and numeric is now shared, reducing the maintenance overhead and the > likelihood of introducing bugs. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [JENKINS] Lucene-Solr-master-Windows (64bit/jdk1.8.0_144) - Build # 7296 - Still Unstable!
I am looking into the TestDirectoryTaxonomyWriter#testRecreateAndRefresh failure On Mon, Apr 30, 2018 at 9:59 AM, Policeman Jenkins Serverwrote: > Build: https://jenkins.thetaphi.de/job/Lucene-Solr-master-Windows/7296/ > Java: 64bit/jdk1.8.0_144 -XX:-UseCompressedOops -XX:+UseConcMarkSweepGC > > 29 tests failed. > FAILED: > org.apache.lucene.facet.taxonomy.directory.TestDirectoryTaxonomyWriter.testRecreateAndRefresh > > Error Message: > Directory > MockDirectoryWrapper(NIOFSDirectory@C:\Users\jenkins\workspace\Lucene-Solr-master-Windows\lucene\build\facet\test\J1\temp\lucene.facet.taxonomy.directory.TestDirectoryTaxonomyWriter_F392C2FFDA61922B-001\index-NIOFSDirectory-001 > lockFactory=org.apache.lucene.store.NativeFSLockFactory@5e6a7788) still has > pending deleted files; cannot initialize IndexWriter > > Stack Trace: > java.lang.IllegalArgumentException: Directory > MockDirectoryWrapper(NIOFSDirectory@C:\Users\jenkins\workspace\Lucene-Solr-master-Windows\lucene\build\facet\test\J1\temp\lucene.facet.taxonomy.directory.TestDirectoryTaxonomyWriter_F392C2FFDA61922B-001\index-NIOFSDirectory-001 > lockFactory=org.apache.lucene.store.NativeFSLockFactory@5e6a7788) still has > pending deleted files; cannot initialize IndexWriter > at > __randomizedtesting.SeedInfo.seed([F392C2FFDA61922B:BD08DA5FF63FC513]:0) > at org.apache.lucene.index.IndexWriter.(IndexWriter.java:707) > at > org.apache.lucene.facet.taxonomy.directory.DirectoryTaxonomyWriter.openIndexWriter(DirectoryTaxonomyWriter.java:240) > at > org.apache.lucene.facet.taxonomy.directory.DirectoryTaxonomyWriter.(DirectoryTaxonomyWriter.java:167) > at > org.apache.lucene.facet.taxonomy.directory.TestDirectoryTaxonomyWriter.testRecreateAndRefresh(TestDirectoryTaxonomyWriter.java:214) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1737) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:934) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:970) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:984) > at > org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49) > at > org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) > at > org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48) > at > org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64) > at > org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47) > at > com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) > at > com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368) > at > com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817) > at > com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468) > at > com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:943) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:829) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:879) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:890) > at > org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) > at > 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) > at > org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41) > at > com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40) > at > com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40) > at > com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) > at > com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) > at > com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) > at >
[jira] [Commented] (LUCENE-8282) Reduce boxing and unnecessary object creation in DV updates
[ https://issues.apache.org/jira/browse/LUCENE-8282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16456093#comment-16456093 ] Simon Willnauer commented on LUCENE-8282: - https://github.com/s1monw/lucene-solr/pull/16 /cc [~mikemccand] [~shaie] [~dweiss] > Reduce boxing and unnecessary object creation in DV updates > --- > > Key: LUCENE-8282 > URL: https://issues.apache.org/jira/browse/LUCENE-8282 > Project: Lucene - Core > Issue Type: Improvement >Affects Versions: 7.4, master (8.0) > Reporter: Simon Willnauer >Priority: Major > Fix For: 7.4, master (8.0) > > Attachments: LUCENE-8282.patch > > > DV updates used the boxed type Long to keep the API generic. Yet the missing > type information caused a lot of code duplication, boxing and unnecessary object > creation. This change cuts over to type-safe APIs using BytesRef and long (the > primitive). In this change, most of the code that is almost identical between > binary and numeric is now shared, reducing the maintenance overhead and the > likelihood of introducing bugs. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-8282) Reduce boxing and unnecessary object creation in DV updates
[ https://issues.apache.org/jira/browse/LUCENE-8282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-8282: Fix Version/s: master (8.0) 7.4 > Reduce boxing and unnecessary object creation in DV updates > --- > > Key: LUCENE-8282 > URL: https://issues.apache.org/jira/browse/LUCENE-8282 > Project: Lucene - Core > Issue Type: Improvement >Affects Versions: 7.4, master (8.0) > Reporter: Simon Willnauer >Priority: Major > Fix For: 7.4, master (8.0) > > Attachments: LUCENE-8282.patch > > > DV updates used the boxed type Long to keep the API generic. Yet the missing > type information caused a lot of code duplication, boxing and unnecessary object > creation. This change cuts over to type-safe APIs using BytesRef and long (the > primitive). In this change, most of the code that is almost identical between > binary and numeric is now shared, reducing the maintenance overhead and the > likelihood of introducing bugs. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-8282) Reduce boxing and unnecessary object creation in DV updates
[ https://issues.apache.org/jira/browse/LUCENE-8282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-8282: Affects Version/s: master (8.0) 7.4 > Reduce boxing and unnecessary object creation in DV updates > --- > > Key: LUCENE-8282 > URL: https://issues.apache.org/jira/browse/LUCENE-8282 > Project: Lucene - Core > Issue Type: Improvement >Affects Versions: 7.4, master (8.0) > Reporter: Simon Willnauer >Priority: Major > Fix For: 7.4, master (8.0) > > Attachments: LUCENE-8282.patch > > > DV updates used the boxed type Long to keep the API generic. Yet the missing > type information caused a lot of code duplication, boxing and unnecessary object > creation. This change cuts over to type-safe APIs using BytesRef and long (the > primitive). In this change, most of the code that is almost identical between > binary and numeric is now shared, reducing the maintenance overhead and the > likelihood of introducing bugs. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-8282) Reduce boxing and unnecessary object creation in DV updates
[ https://issues.apache.org/jira/browse/LUCENE-8282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-8282: Attachment: LUCENE-8282.patch > Reduce boxing and unnecessary object creation in DV updates > --- > > Key: LUCENE-8282 > URL: https://issues.apache.org/jira/browse/LUCENE-8282 > Project: Lucene - Core > Issue Type: Improvement > Reporter: Simon Willnauer >Priority: Major > Attachments: LUCENE-8282.patch > > > DV updates used the boxed type Long to keep the API generic. Yet the missing > type information caused a lot of code duplication, boxing and unnecessary object > creation. This change cuts over to type-safe APIs using BytesRef and long (the > primitive). In this change, most of the code that is almost identical between > binary and numeric is now shared, reducing the maintenance overhead and the > likelihood of introducing bugs. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-8282) Reduce boxing and unnecessary object creation in DV updates
Simon Willnauer created LUCENE-8282: --- Summary: Reduce boxing and unnecessary object creation in DV updates Key: LUCENE-8282 URL: https://issues.apache.org/jira/browse/LUCENE-8282 Project: Lucene - Core Issue Type: Improvement Reporter: Simon Willnauer DV updates used the boxed type Long to keep the API generic. Yet the missing type information caused a lot of code duplication, boxing and unnecessary object creation. This change cuts over to type-safe APIs using BytesRef and long (the primitive). In this change, most of the code that is almost identical between binary and numeric is now shared, reducing the maintenance overhead and the likelihood of introducing bugs. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
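The before/after shape of the API change above can be illustrated with a minimal sketch. The method and class names are invented for illustration and are not Lucene's actual API: the "before" shape forces every numeric update value through a boxed `Long` (one wrapper object or cache lookup per value, plus an unboxing on every read), while the "after" shape works on the primitive directly.

```java
// Hypothetical sketch of the boxed-vs-primitive API difference in DV updates.
final class DvUpdateApiSketch {

    // Before: generic API, every value arrives as a boxed Long.
    static long applyBoxed(java.util.List<Long> updates) {
        long sum = 0;
        for (Long v : updates) {
            sum += v; // unboxes on every access
        }
        return sum;
    }

    // After: type-safe primitive API, no wrapper objects at all.
    static long applyPrimitive(long[] updates) {
        long sum = 0;
        for (long v : updates) {
            sum += v;
        }
        return sum;
    }
}
```

Both variants compute the same result; the win is allocation pressure and GC behavior on hot update paths, which is exactly what the issue targets.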
Re: Friendly reminder: please run precommit
precommit: BUILD SUCCESSFUL Total time: 10 minutes 48 seconds on my, like, 2-year-old macbook pro. I think that is reasonable? On Thu, Apr 26, 2018 at 3:23 PM, Karl Wright <daddy...@gmail.com> wrote: > :-) > > 25 minutes is an eternity these days, Robert. This is especially true when > others are collaborating with what you are doing, as was the case here. The > other approach would be to create a branch, but I've been avoiding that on > git. > > "ant documentation-lint" is what I'm looking for, thanks. > > Karl > > > On Thu, Apr 26, 2018 at 8:21 AM, Robert Muir <rcm...@gmail.com> wrote: >> >> I don't understand the turnaround issue, why do the commits need to be >> rushed in? >> There is patch validation recently hooked in to avoid keeping your >> computer busy for 25 minutes. >> If you are not changing third party dependencies or anything "heavy" >> like that you should at least run "ant documentation-lint" from >> lucene/ >> >> >> On Thu, Apr 26, 2018 at 8:02 AM, Karl Wright <daddy...@gmail.com> wrote: >> > How long does precommit take you to run? For me, it's a good 25 >> > minutes. >> > That really impacts turnaround, which is why I'd love a precommit that >> > looked only at certain things in the local package I'm dealing with. >> > >> > Karl >> > >> > On Thu, Apr 26, 2018 at 6:14 AM, Simon Willnauer >> > <simon.willna...@gmail.com> >> > wrote: >> >> >> >> Hey folks, >> >> >> >> I had to fix several glitches lately that are caught by running >> >> precommit. It's a simple step please take the time running `ant clean >> >> precommit` on top-level. 
>> >> >> >> Thanks, >> >> >> >> Simon >> >> >> >> - >> >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org >> >> For additional commands, e-mail: dev-h...@lucene.apache.org >> >> >> > >> >> - >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org >> For additional commands, e-mail: dev-h...@lucene.apache.org >> > - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [JENKINS] Lucene-Solr-master-Linux (64bit/jdk1.8.0_162) - Build # 21910 - Failure!
this is fixed On Thu, Apr 26, 2018 at 12:33 PM, Policeman Jenkins Serverwrote: > Build: https://jenkins.thetaphi.de/job/Lucene-Solr-master-Linux/21910/ > Java: 64bit/jdk1.8.0_162 -XX:+UseCompressedOops -XX:+UseSerialGC > > All tests passed > > Build Log: > [...truncated 53959 lines...] > -ecj-javadoc-lint-src: > [mkdir] Created dir: /tmp/ecj1818329761 > [ecj-lint] Compiling 103 source files to /tmp/ecj1818329761 > [ecj-lint] -- > [ecj-lint] 1. ERROR in > /home/jenkins/workspace/Lucene-Solr-master-Linux/lucene/spatial3d/src/java/org/apache/lucene/spatial3d/geom/GeoComplexPolygon.java > (at line 22) > [ecj-lint] import java.util.Set; > [ecj-lint]^ > [ecj-lint] The import java.util.Set is never used > [ecj-lint] -- > [ecj-lint] 2. ERROR in > /home/jenkins/workspace/Lucene-Solr-master-Linux/lucene/spatial3d/src/java/org/apache/lucene/spatial3d/geom/GeoComplexPolygon.java > (at line 23) > [ecj-lint] import java.util.HashSet; > [ecj-lint]^ > [ecj-lint] The import java.util.HashSet is never used > [ecj-lint] -- > [ecj-lint] 2 problems (2 errors) > > BUILD FAILED > /home/jenkins/workspace/Lucene-Solr-master-Linux/build.xml:633: The following > error occurred while executing this line: > /home/jenkins/workspace/Lucene-Solr-master-Linux/build.xml:101: The following > error occurred while executing this line: > /home/jenkins/workspace/Lucene-Solr-master-Linux/lucene/build.xml:208: The > following error occurred while executing this line: > /home/jenkins/workspace/Lucene-Solr-master-Linux/lucene/common-build.xml:2264: > The following error occurred while executing this line: > /home/jenkins/workspace/Lucene-Solr-master-Linux/lucene/common-build.xml:2089: > The following error occurred while executing this line: > /home/jenkins/workspace/Lucene-Solr-master-Linux/lucene/common-build.xml:2128: > Compile failed; see the compiler error output for details. 
> > Total time: 76 minutes 50 seconds > Build step 'Invoke Ant' marked build as failure > Archiving artifacts > Setting > ANT_1_8_2_HOME=/var/lib/jenkins/tools/hudson.tasks.Ant_AntInstallation/ANT_1.8.2 > [WARNINGS] Skipping publisher since build result is FAILURE > Recording test results > Setting > ANT_1_8_2_HOME=/var/lib/jenkins/tools/hudson.tasks.Ant_AntInstallation/ANT_1.8.2 > Email was triggered for: Failure - Any > Sending email for trigger: Failure - Any > Setting > ANT_1_8_2_HOME=/var/lib/jenkins/tools/hudson.tasks.Ant_AntInstallation/ANT_1.8.2 > Setting > ANT_1_8_2_HOME=/var/lib/jenkins/tools/hudson.tasks.Ant_AntInstallation/ANT_1.8.2 > Setting > ANT_1_8_2_HOME=/var/lib/jenkins/tools/hudson.tasks.Ant_AntInstallation/ANT_1.8.2 > Setting > ANT_1_8_2_HOME=/var/lib/jenkins/tools/hudson.tasks.Ant_AntInstallation/ANT_1.8.2 > > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Friendly reminder: please run precommit
Hey folks, I had to fix several glitches lately that are caught by running precommit. It's a simple step; please take the time to run `ant clean precommit` at the top level. Thanks, Simon - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [JENKINS] Lucene-Solr-7.x-Windows (64bit/jdk-9.0.4) - Build # 565 - Still Unstable!
pushed a fix for this On Wed, Apr 25, 2018 at 11:24 PM, Policeman Jenkins Serverwrote: > Build: https://jenkins.thetaphi.de/job/Lucene-Solr-7.x-Windows/565/ > Java: 64bit/jdk-9.0.4 -XX:-UseCompressedOops -XX:+UseParallelGC > > 42 tests failed. > FAILED: org.apache.lucene.store.TestNativeFSLockFactory.testStressLocks > > Error Message: > IndexWriter hit unexpected exceptions > > Stack Trace: > java.lang.AssertionError: IndexWriter hit unexpected exceptions > at > __randomizedtesting.SeedInfo.seed([72F563BD958B028E:2CC42D408927CAE8]:0) > at org.junit.Assert.fail(Assert.java:93) > at org.junit.Assert.assertTrue(Assert.java:43) > at > org.apache.lucene.store.BaseLockFactoryTestCase.testStressLocks(BaseLockFactoryTestCase.java:180) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:564) > at > com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1737) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:934) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:970) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:984) > at > org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49) > at > org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) > at > org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48) > at > org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64) > at > org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47) 
> at > com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) > at > com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368) > at > com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817) > at > com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468) > at > com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:943) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:829) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:879) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:890) > at > org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) > at > com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) > at > org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41) > at > com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40) > at > com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40) > at > com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) > at > com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) > at > org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53) > at > org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47) > at > org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64) > at > org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54) > at > 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) > at > com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368) > at java.base/java.lang.Thread.run(Thread.java:844) > > > FAILED: org.apache.lucene.store.TestNativeFSLockFactory.testStressLocks > > Error Message: > IndexWriter hit unexpected exceptions > > Stack Trace: > java.lang.AssertionError: IndexWriter hit unexpected exceptions > at > __randomizedtesting.SeedInfo.seed([72F563BD958B028E:2CC42D408927CAE8]:0) > at org.junit.Assert.fail(Assert.java:93) > at
Re: [JENKINS] Lucene-Solr-7.x-Windows (64bit/jdk1.8.0_144) - Build # 566 - Still Unstable!
pushed a fix for this On Thu, Apr 26, 2018 at 10:48 AM, Policeman Jenkins Serverwrote: > Build: https://jenkins.thetaphi.de/job/Lucene-Solr-7.x-Windows/566/ > Java: 64bit/jdk1.8.0_144 -XX:-UseCompressedOops -XX:+UseG1GC > > 28 tests failed. > FAILED: org.apache.lucene.index.TestIndexWriterOutOfFileDescriptors.test > > Error Message: > Directory > MockDirectoryWrapper(SimpleFSDirectory@C:\Users\jenkins\workspace\Lucene-Solr-7.x-Windows\lucene\build\core\test\J1\temp\lucene.index.TestIndexWriterOutOfFileDescriptors_ABAD3B75FD1956FA-002\TestIndexWriterOutOfFileDescriptors-001 > lockFactory=org.apache.lucene.store.NativeFSLockFactory@3d28f2c0) still has > pending deleted files; cannot initialize IndexWriter > > Stack Trace: > java.lang.IllegalArgumentException: Directory > MockDirectoryWrapper(SimpleFSDirectory@C:\Users\jenkins\workspace\Lucene-Solr-7.x-Windows\lucene\build\core\test\J1\temp\lucene.index.TestIndexWriterOutOfFileDescriptors_ABAD3B75FD1956FA-002\TestIndexWriterOutOfFileDescriptors-001 > lockFactory=org.apache.lucene.store.NativeFSLockFactory@3d28f2c0) still has > pending deleted files; cannot initialize IndexWriter > at > __randomizedtesting.SeedInfo.seed([ABAD3B75FD1956FA:23F904AF53E53B02]:0) > at org.apache.lucene.index.IndexWriter.(IndexWriter.java:707) > at > org.apache.lucene.index.TestIndexWriterOutOfFileDescriptors.test(TestIndexWriterOutOfFileDescriptors.java:68) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1737) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:934) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:970) > at > 
com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:984) > at > org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49) > at > org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) > at > org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48) > at > org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64) > at > org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47) > at > com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) > at > com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368) > at > com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817) > at > com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468) > at > com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:943) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:829) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:879) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:890) > at > org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) > at > com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) > at > org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41) > at > com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40) > at > com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40) > at > 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) > at > com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) > at > com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) > at > org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53) > at > org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47) > at > org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64) > at >
Re: [JENKINS] Lucene-Solr-Tests-7.x - Build # 584 - Failure
I pushed a fix for this as well On Thu, Apr 26, 2018 at 11:55 AM, Apache Jenkins Serverwrote: > Build: https://builds.apache.org/job/Lucene-Solr-Tests-7.x/584/ > > All tests passed > > Build Log: > [...truncated 54019 lines...] > -ecj-javadoc-lint-src: > [mkdir] Created dir: /tmp/ecj270125761 > [ecj-lint] Compiling 103 source files to /tmp/ecj270125761 > [ecj-lint] -- > [ecj-lint] 1. ERROR in > /x1/jenkins/jenkins-slave/workspace/Lucene-Solr-Tests-7.x/lucene/spatial3d/src/java/org/apache/lucene/spatial3d/geom/GeoComplexPolygon.java > (at line 22) > [ecj-lint] import java.util.Set; > [ecj-lint]^ > [ecj-lint] The import java.util.Set is never used > [ecj-lint] -- > [ecj-lint] 2. ERROR in > /x1/jenkins/jenkins-slave/workspace/Lucene-Solr-Tests-7.x/lucene/spatial3d/src/java/org/apache/lucene/spatial3d/geom/GeoComplexPolygon.java > (at line 23) > [ecj-lint] import java.util.HashSet; > [ecj-lint]^ > [ecj-lint] The import java.util.HashSet is never used > [ecj-lint] -- > [ecj-lint] 2 problems (2 errors) > > BUILD FAILED > /x1/jenkins/jenkins-slave/workspace/Lucene-Solr-Tests-7.x/build.xml:633: The > following error occurred while executing this line: > /x1/jenkins/jenkins-slave/workspace/Lucene-Solr-Tests-7.x/build.xml:101: The > following error occurred while executing this line: > /x1/jenkins/jenkins-slave/workspace/Lucene-Solr-Tests-7.x/lucene/build.xml:208: > The following error occurred while executing this line: > /x1/jenkins/jenkins-slave/workspace/Lucene-Solr-Tests-7.x/lucene/common-build.xml:2264: > The following error occurred while executing this line: > /x1/jenkins/jenkins-slave/workspace/Lucene-Solr-Tests-7.x/lucene/common-build.xml:2089: > The following error occurred while executing this line: > /x1/jenkins/jenkins-slave/workspace/Lucene-Solr-Tests-7.x/lucene/common-build.xml:2128: > Compile failed; see the compiler error output for details. 
> > Total time: 85 minutes 5 seconds > Build step 'Invoke Ant' marked build as failure > Archiving artifacts > Recording test results > Email was triggered for: Failure - Any > Sending email for trigger: Failure - Any > > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
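For readers skimming the build log above: the two ecj-lint errors are simply unused imports in GeoComplexPolygon.java, and the pushed fix amounts to deleting the two import lines. A minimal self-contained illustration (the class name LintFixDemo is made up for this sketch, not part of the Lucene source):

```java
// Minimal illustration of the failure mode: ecj-javadoc-lint treats unused
// imports as errors, so importing java.util.Set / java.util.HashSet without
// ever referencing them fails the build. The fix is to delete those lines.
//
// Before (fails ecj-lint):
//   import java.util.Set;      // "The import java.util.Set is never used"
//   import java.util.HashSet;  // "The import java.util.HashSet is never used"
public class LintFixDemo {
  public static void main(String[] args) {
    // With the unused imports removed, the file compiles lint-clean.
    System.out.println("lint-clean");
  }
}
```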
Re: [JENKINS] Lucene-Solr-master-Windows (64bit/jdk-9.0.4) - Build # 7289 - Still Unstable!
I will push a fix for this soon! On Thu, Apr 26, 2018 at 8:48 AM, Policeman Jenkins Serverwrote: > Build: https://jenkins.thetaphi.de/job/Lucene-Solr-master-Windows/7289/ > Java: 64bit/jdk-9.0.4 -XX:-UseCompressedOops -XX:+UseConcMarkSweepGC > > 57 tests failed. > FAILED: org.apache.lucene.store.TestSleepingLockWrapper.testStressLocks > > Error Message: > IndexWriter hit unexpected exceptions > > Stack Trace: > java.lang.AssertionError: IndexWriter hit unexpected exceptions > at > __randomizedtesting.SeedInfo.seed([A16D532BE1A04815:FF5C1DD6FD0C8073]:0) > at org.junit.Assert.fail(Assert.java:93) > at org.junit.Assert.assertTrue(Assert.java:43) > at > org.apache.lucene.store.BaseLockFactoryTestCase.testStressLocks(BaseLockFactoryTestCase.java:180) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:564) > at > com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1737) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:934) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:970) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:984) > at > org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49) > at > org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) > at > org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48) > at > org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64) > at > 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47) > at > com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) > at > com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368) > at > com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817) > at > com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468) > at > com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:943) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:829) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:879) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:890) > at > org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) > at > com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) > at > org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41) > at > com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40) > at > com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40) > at > com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) > at > com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) > at > org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53) > at > org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47) > at > org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64) > at > 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54) > at > com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) > at > com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368) > at java.base/java.lang.Thread.run(Thread.java:844) > > > FAILED: org.apache.lucene.store.TestSleepingLockWrapper.testStressLocks > > Error Message: > IndexWriter hit unexpected exceptions > > Stack Trace: > java.lang.AssertionError: IndexWriter hit unexpected exceptions > at > __randomizedtesting.SeedInfo.seed([A16D532BE1A04815:FF5C1DD6FD0C8073]:0) > at org.junit.Assert.fail(Assert.java:93) >
[jira] [Commented] (LUCENE-8277) Better validate CodecReaders in addIndexes
[ https://issues.apache.org/jira/browse/LUCENE-8277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16452995#comment-16452995 ] Simon Willnauer commented on LUCENE-8277: - +1 > Better validate CodecReaders in addIndexes > -- > > Key: LUCENE-8277 > URL: https://issues.apache.org/jira/browse/LUCENE-8277 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > > The discussion at LUCENE-8264 made me wonder that we should apply the same > checks to addIndexes(CodecReader) that we apply at index time if the input > reader is not a SegmentReader such as: > - positions are less than the maximum position > - offsets are going forward > And maybe also check that the API is implemented correctly, eg. terms, doc > ids and positions are returned in order? -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
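The checks proposed in the issue above are mechanical to state. As a toy illustration of the "offsets are going forward" check (invented names, not Lucene's actual addIndexes or CheckIndex code):

```java
// Toy sketch of the proposed validation: while consuming a term's postings,
// each startOffset must not go backwards relative to the previous one, and
// endOffset must never precede startOffset.
public class OffsetOrderCheck {
  private int lastStart = -1;

  void accept(int startOffset, int endOffset) {
    if (startOffset < lastStart) {
      throw new IllegalStateException(
          "offsets went backwards: " + startOffset + " < " + lastStart);
    }
    if (endOffset < startOffset) {
      throw new IllegalStateException("endOffset < startOffset");
    }
    lastStart = startOffset;
  }

  public static void main(String[] args) {
    OffsetOrderCheck check = new OffsetOrderCheck();
    check.accept(0, 3);   // ok
    check.accept(4, 8);   // ok, moving forward
    boolean rejected = false;
    try {
      check.accept(2, 5); // goes backwards -> must be rejected
    } catch (IllegalStateException e) {
      rejected = true;
    }
    System.out.println(rejected);
  }
}
```

A position check ("positions are less than the maximum position") would follow the same pattern with a running position counter.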
[jira] [Commented] (LUCENE-8275) Push up #checkPendingDeletes to Directory
[ https://issues.apache.org/jira/browse/LUCENE-8275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16452974#comment-16452974 ] Simon Willnauer commented on LUCENE-8275: - {quote} Curious, what distinction is there between Directory.checkPendingDeletes returning true vs it throwing an IOException? Maybe it should be just one or the other – i.e. return boolean but never throw an exception, or return void but possibly throw an IOException? {quote} The return type of a method doesn't have anything to do with the exceptions it can throw. As a side effect, this method retries deleting the pending deletes, and that retry can throw an IOException. Unless there are underlying FS issues, it simply signals whether there are any pending deletions. I think the signature makes sense as it is. > Push up #checkPendingDeletes to Directory > - > > Key: LUCENE-8275 > URL: https://issues.apache.org/jira/browse/LUCENE-8275 > Project: Lucene - Core > Issue Type: Bug >Affects Versions: 7.4, master (8.0) > Reporter: Simon Willnauer >Priority: Major > Fix For: 7.4, master (8.0) > > Attachments: LUCENE-8275.patch, LUCENE-8275.patch > > > IndexWriter checks in it's ctor if the incoming directory is an > FSDirectory. If that is the case it ensures that the directory retries > deleting it's pending deletes and if there are pending deletes it will > fail creating the writer. Yet, this check didn't unwrap filter directories > or subclasses like FileSwitchDirectory such that in the case of MDW we > never checked for pending deletes. > > There are also two places in FSDirectory that first removed the file > that was supposed to be created / renamed to from the pending deletes set > and then tried to clean up pending deletes which excluded the file. These > places now remove the file from the set after the pending deletes are > checked. 
> > This caused some test failures lately unfortunately very timing dependent: > > {noformat} > FAILED: > junit.framework.TestSuite.org.apache.lucene.search.TestSearcherManager > Error Message: > Captured an uncaught exception in thread: Thread[id=1567, name=Thread-1363, > state=RUNNABLE, group=TGRP-TestSearcherManager] > Stack Trace: > com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an > uncaught exception in thread: Thread[id=1567, name=Thread-1363, > state=RUNNABLE, group=TGRP-TestSearcherManager] > Caused by: java.lang.RuntimeException: > java.nio.file.FileAlreadyExistsException: > /home/jenkins/workspace/Lucene-Solr-master-Linux/lucene/build/core/test/J1/temp/lucene.search.TestSearcherManager_BA998C838D219DA9-001/tempDir-001/_0.fdt > at __randomizedtesting.SeedInfo.seed([BA998C838D219DA9]:0) > at > org.apache.lucene.search.TestSearcherManager$8.run(TestSearcherManager.java:590) > Caused by: java.nio.file.FileAlreadyExistsException: > /home/jenkins/workspace/Lucene-Solr-master-Linux/lucene/build/core/test/J1/temp/lucene.search.TestSearcherManager_BA998C838D219DA9-001/tempDir-001/_0.fdt > at > java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:94) > at > java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111) > at > java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116) > at > java.base/sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:215) > at > java.base/java.nio.file.spi.FileSystemProvider.newOutputStream(FileSystemProvider.java:434) > at > org.apache.lucene.mockfile.FilterFileSystemProvider.newOutputStream(FilterFileSystemProvider.java:197) > at > org.apache.lucene.mockfile.FilterFileSystemProvider.newOutputStream(FilterFileSystemProvider.java:197) > at > org.apache.lucene.mockfile.HandleTrackingFS.newOutputStream(HandleTrackingFS.java:129) > at > org.apache.lucene.mockfile.HandleTrackingFS.newOutputStream(HandleTrackingFS.java:129) > at 
> org.apache.lucene.mockfile.HandleTrackingFS.newOutputStream(HandleTrackingFS.java:129) > at > org.apache.lucene.mockfile.FilterFileSystemProvider.newOutputStream(FilterFileSystemProvider.java:197) > at java.base/java.nio.file.Files.newOutputStream(Files.java:218) > at > org.apache.lucene.store.FSDirectory$FSIndexOutput.(FSDirectory.java:413) > at > org.apache.lucene.store.FSDirectory$FSIndexOutput.(FSDirectory.java:409) > at > org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:2
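To make the point about the signature concrete, here is a toy model (names and behavior invented for illustration; this is not Lucene's FSDirectory) of why the boolean return and the IOException are orthogonal: the method retries the deletes as a side effect, and the boolean only reports what is still pending afterwards.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model of the discussed signature: checkPendingDeletions() retries the
// pending deletes as a side effect (in real code that retry can throw an
// IOException on filesystem trouble) and returns whether anything is still
// pending afterwards. Return value and exception are independent concerns.
public class ToyDirectory {
  private final Set<String> pendingDeletes = new HashSet<>();
  private final Set<String> heldByOs = new HashSet<>(); // e.g. open Windows handles

  void deletePending(String file) { pendingDeletes.add(file); }

  boolean checkPendingDeletions() {
    // Retry: every file no longer held by the OS can now really be deleted.
    pendingDeletes.removeIf(f -> !heldByOs.contains(f));
    return !pendingDeletes.isEmpty();
  }

  public static void main(String[] args) {
    ToyDirectory dir = new ToyDirectory();
    dir.heldByOs.add("_1.cfs");
    dir.deletePending("_0.cfs");
    dir.deletePending("_1.cfs");
    System.out.println(dir.checkPendingDeletions()); // _1.cfs still held -> true
    dir.heldByOs.clear();                            // handle released
    System.out.println(dir.checkPendingDeletions()); // retry succeeds -> false
  }
}
```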
[jira] [Resolved] (LUCENE-8275) Push up #checkPendingDeletes to Directory
[ https://issues.apache.org/jira/browse/LUCENE-8275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved LUCENE-8275. - Resolution: Fixed > Push up #checkPendingDeletes to Directory > - > > Key: LUCENE-8275 > URL: https://issues.apache.org/jira/browse/LUCENE-8275 > Project: Lucene - Core > Issue Type: Bug >Affects Versions: 7.4, master (8.0) > Reporter: Simon Willnauer >Priority: Major > Fix For: 7.4, master (8.0) > > Attachments: LUCENE-8275.patch, LUCENE-8275.patch > > > IndexWriter checks in it's ctor if the incoming directory is an > FSDirectory. If that is the case it ensures that the directory retries > deleting it's pending deletes and if there are pending deletes it will > fail creating the writer. Yet, this check didn't unwrap filter directories > or subclasses like FileSwitchDirectory such that in the case of MDW we > never checked for pending deletes. > > There are also two places in FSDirectory that first removed the file > that was supposed to be created / renamed to from the pending deletes set > and then tried to clean up pending deletes which excluded the file. These > places now remove the file from the set after the pending deletes are > checked. 
> > This caused some test failures lately unfortunately very timing dependent: > > {noformat} > FAILED: > junit.framework.TestSuite.org.apache.lucene.search.TestSearcherManager > Error Message: > Captured an uncaught exception in thread: Thread[id=1567, name=Thread-1363, > state=RUNNABLE, group=TGRP-TestSearcherManager] > Stack Trace: > com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an > uncaught exception in thread: Thread[id=1567, name=Thread-1363, > state=RUNNABLE, group=TGRP-TestSearcherManager] > Caused by: java.lang.RuntimeException: > java.nio.file.FileAlreadyExistsException: > /home/jenkins/workspace/Lucene-Solr-master-Linux/lucene/build/core/test/J1/temp/lucene.search.TestSearcherManager_BA998C838D219DA9-001/tempDir-001/_0.fdt > at __randomizedtesting.SeedInfo.seed([BA998C838D219DA9]:0) > at > org.apache.lucene.search.TestSearcherManager$8.run(TestSearcherManager.java:590) > Caused by: java.nio.file.FileAlreadyExistsException: > /home/jenkins/workspace/Lucene-Solr-master-Linux/lucene/build/core/test/J1/temp/lucene.search.TestSearcherManager_BA998C838D219DA9-001/tempDir-001/_0.fdt > at > java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:94) > at > java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111) > at > java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116) > at > java.base/sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:215) > at > java.base/java.nio.file.spi.FileSystemProvider.newOutputStream(FileSystemProvider.java:434) > at > org.apache.lucene.mockfile.FilterFileSystemProvider.newOutputStream(FilterFileSystemProvider.java:197) > at > org.apache.lucene.mockfile.FilterFileSystemProvider.newOutputStream(FilterFileSystemProvider.java:197) > at > org.apache.lucene.mockfile.HandleTrackingFS.newOutputStream(HandleTrackingFS.java:129) > at > org.apache.lucene.mockfile.HandleTrackingFS.newOutputStream(HandleTrackingFS.java:129) > at 
> org.apache.lucene.mockfile.HandleTrackingFS.newOutputStream(HandleTrackingFS.java:129) > at > org.apache.lucene.mockfile.FilterFileSystemProvider.newOutputStream(FilterFileSystemProvider.java:197) > at java.base/java.nio.file.Files.newOutputStream(Files.java:218) > at > org.apache.lucene.store.FSDirectory$FSIndexOutput.(FSDirectory.java:413) > at > org.apache.lucene.store.FSDirectory$FSIndexOutput.(FSDirectory.java:409) > at > org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:253) > at > org.apache.lucene.store.MockDirectoryWrapper.createOutput(MockDirectoryWrapper.java:665) > at > org.apache.lucene.store.LockValidatingDirectoryWrapper.createOutput(LockValidatingDirectoryWrapper.java:44) > at > org.apache.lucene.store.TrackingDirectoryWrapper.createOutput(TrackingDirectoryWrapper.java:43) > at > org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.(CompressingStoredFieldsWriter.java:116) > at > org.apache.lucene.codecs.compressing.CompressingStoredFieldsFormat.fieldsWriter(CompressingStoredFieldsFormat.jav
[jira] [Updated] (LUCENE-8275) Push up #checkPendingDeletes to Directory
[ https://issues.apache.org/jira/browse/LUCENE-8275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-8275: Summary: Push up #checkPendingDeletes to Directory (was: Unwrap directory to check for FSDirectory) > Push up #checkPendingDeletes to Directory > - > > Key: LUCENE-8275 > URL: https://issues.apache.org/jira/browse/LUCENE-8275 > Project: Lucene - Core > Issue Type: Bug >Affects Versions: 7.4, master (8.0) > Reporter: Simon Willnauer >Priority: Major > Fix For: 7.4, master (8.0) > > Attachments: LUCENE-8275.patch, LUCENE-8275.patch > > > IndexWriter checks in it's ctor if the incoming directory is an > FSDirectory. If that is the case it ensures that the directory retries > deleting it's pending deletes and if there are pending deletes it will > fail creating the writer. Yet, this check didn't unwrap filter directories > such that in the case of MDW we never checked for pending deletes. > There are also two places in FSDirectory that first removed the file > that was supposed to be created / renamed to from the pending deletes set > and then tried to clean up pending deletes which excluded the file. These > places now remove the file from the set after the pending deletes are > checked. 
> > This caused some test failures lately unfortunately very timing dependent: > > {noformat} > FAILED: > junit.framework.TestSuite.org.apache.lucene.search.TestSearcherManager > Error Message: > Captured an uncaught exception in thread: Thread[id=1567, name=Thread-1363, > state=RUNNABLE, group=TGRP-TestSearcherManager] > Stack Trace: > com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an > uncaught exception in thread: Thread[id=1567, name=Thread-1363, > state=RUNNABLE, group=TGRP-TestSearcherManager] > Caused by: java.lang.RuntimeException: > java.nio.file.FileAlreadyExistsException: > /home/jenkins/workspace/Lucene-Solr-master-Linux/lucene/build/core/test/J1/temp/lucene.search.TestSearcherManager_BA998C838D219DA9-001/tempDir-001/_0.fdt > at __randomizedtesting.SeedInfo.seed([BA998C838D219DA9]:0) > at > org.apache.lucene.search.TestSearcherManager$8.run(TestSearcherManager.java:590) > Caused by: java.nio.file.FileAlreadyExistsException: > /home/jenkins/workspace/Lucene-Solr-master-Linux/lucene/build/core/test/J1/temp/lucene.search.TestSearcherManager_BA998C838D219DA9-001/tempDir-001/_0.fdt > at > java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:94) > at > java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111) > at > java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116) > at > java.base/sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:215) > at > java.base/java.nio.file.spi.FileSystemProvider.newOutputStream(FileSystemProvider.java:434) > at > org.apache.lucene.mockfile.FilterFileSystemProvider.newOutputStream(FilterFileSystemProvider.java:197) > at > org.apache.lucene.mockfile.FilterFileSystemProvider.newOutputStream(FilterFileSystemProvider.java:197) > at > org.apache.lucene.mockfile.HandleTrackingFS.newOutputStream(HandleTrackingFS.java:129) > at > org.apache.lucene.mockfile.HandleTrackingFS.newOutputStream(HandleTrackingFS.java:129) > at 
> org.apache.lucene.mockfile.HandleTrackingFS.newOutputStream(HandleTrackingFS.java:129) > at > org.apache.lucene.mockfile.FilterFileSystemProvider.newOutputStream(FilterFileSystemProvider.java:197) > at java.base/java.nio.file.Files.newOutputStream(Files.java:218) > at > org.apache.lucene.store.FSDirectory$FSIndexOutput.(FSDirectory.java:413) > at > org.apache.lucene.store.FSDirectory$FSIndexOutput.(FSDirectory.java:409) > at > org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:253) > at > org.apache.lucene.store.MockDirectoryWrapper.createOutput(MockDirectoryWrapper.java:665) > at > org.apache.lucene.store.LockValidatingDirectoryWrapper.createOutput(LockValidatingDirectoryWrapper.java:44) > at > org.apache.lucene.store.TrackingDirectoryWrapper.createOutput(TrackingDirectoryWrapper.java:43) > at > org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.(CompressingStoredFieldsWriter.java:116) > at > org.apache.lucene.codecs.compressing.CompressingStoredFieldsFormat.fieldsWriter(CompressingStoredFieldsFormat.java:128)
[jira] [Updated] (LUCENE-8275) Push up #checkPendingDeletes to Directory
[ https://issues.apache.org/jira/browse/LUCENE-8275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-8275: Description: IndexWriter checks in it's ctor if the incoming directory is an FSDirectory. If that is the case it ensures that the directory retries deleting it's pending deletes and if there are pending deletes it will fail creating the writer. Yet, this check didn't unwrap filter directories or subclasses like FileSwitchDirectory such that in the case of MDW we never checked for pending deletes. There are also two places in FSDirectory that first removed the file that was supposed to be created / renamed to from the pending deletes set and then tried to clean up pending deletes which excluded the file. These places now remove the file from the set after the pending deletes are checked. This caused some test failures lately unfortunately very timing dependent: {noformat} FAILED: junit.framework.TestSuite.org.apache.lucene.search.TestSearcherManager Error Message: Captured an uncaught exception in thread: Thread[id=1567, name=Thread-1363, state=RUNNABLE, group=TGRP-TestSearcherManager] Stack Trace: com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=1567, name=Thread-1363, state=RUNNABLE, group=TGRP-TestSearcherManager] Caused by: java.lang.RuntimeException: java.nio.file.FileAlreadyExistsException: /home/jenkins/workspace/Lucene-Solr-master-Linux/lucene/build/core/test/J1/temp/lucene.search.TestSearcherManager_BA998C838D219DA9-001/tempDir-001/_0.fdt at __randomizedtesting.SeedInfo.seed([BA998C838D219DA9]:0) at org.apache.lucene.search.TestSearcherManager$8.run(TestSearcherManager.java:590) Caused by: java.nio.file.FileAlreadyExistsException: /home/jenkins/workspace/Lucene-Solr-master-Linux/lucene/build/core/test/J1/temp/lucene.search.TestSearcherManager_BA998C838D219DA9-001/tempDir-001/_0.fdt at 
java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:94) at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111) at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116) at java.base/sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:215) at java.base/java.nio.file.spi.FileSystemProvider.newOutputStream(FileSystemProvider.java:434) at org.apache.lucene.mockfile.FilterFileSystemProvider.newOutputStream(FilterFileSystemProvider.java:197) at org.apache.lucene.mockfile.FilterFileSystemProvider.newOutputStream(FilterFileSystemProvider.java:197) at org.apache.lucene.mockfile.HandleTrackingFS.newOutputStream(HandleTrackingFS.java:129) at org.apache.lucene.mockfile.HandleTrackingFS.newOutputStream(HandleTrackingFS.java:129) at org.apache.lucene.mockfile.HandleTrackingFS.newOutputStream(HandleTrackingFS.java:129) at org.apache.lucene.mockfile.FilterFileSystemProvider.newOutputStream(FilterFileSystemProvider.java:197) at java.base/java.nio.file.Files.newOutputStream(Files.java:218) at org.apache.lucene.store.FSDirectory$FSIndexOutput.(FSDirectory.java:413) at org.apache.lucene.store.FSDirectory$FSIndexOutput.(FSDirectory.java:409) at org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:253) at org.apache.lucene.store.MockDirectoryWrapper.createOutput(MockDirectoryWrapper.java:665) at org.apache.lucene.store.LockValidatingDirectoryWrapper.createOutput(LockValidatingDirectoryWrapper.java:44) at org.apache.lucene.store.TrackingDirectoryWrapper.createOutput(TrackingDirectoryWrapper.java:43) at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.(CompressingStoredFieldsWriter.java:116) at org.apache.lucene.codecs.compressing.CompressingStoredFieldsFormat.fieldsWriter(CompressingStoredFieldsFormat.java:128) at org.apache.lucene.codecs.lucene50.Lucene50StoredFieldsFormat.fieldsWriter(Lucene50StoredFieldsFormat.java:183) at 
org.apache.lucene.codecs.asserting.AssertingStoredFieldsFormat.fieldsWriter(AssertingStoredFieldsFormat.java:48) at org.apache.lucene.index.StoredFieldsConsumer.initStoredFieldsWriter(StoredFieldsConsumer.java:39) at org.apache.lucene.index.StoredFieldsConsumer.startDocument(StoredFieldsConsumer.java:46) at org.apache.lucene.index.DefaultIndexingChain.startStoredFields(DefaultIndexingChain.java:363) at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:399) at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:251) at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:490
[jira] [Updated] (LUCENE-8275) Unwrap directory to check for FSDirectory
[ https://issues.apache.org/jira/browse/LUCENE-8275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-8275: Attachment: LUCENE-8275.patch > Unwrap directory to check for FSDirectory > - > > Key: LUCENE-8275 > URL: https://issues.apache.org/jira/browse/LUCENE-8275 > Project: Lucene - Core > Issue Type: Bug >Affects Versions: 7.4, master (8.0) > Reporter: Simon Willnauer >Priority: Major > Fix For: 7.4, master (8.0) > > Attachments: LUCENE-8275.patch, LUCENE-8275.patch > > > IndexWriter checks in it's ctor if the incoming directory is an > FSDirectory. If that is the case it ensures that the directory retries > deleting it's pending deletes and if there are pending deletes it will > fail creating the writer. Yet, this check didn't unwrap filter directories > such that in the case of MDW we never checked for pending deletes. > There are also two places in FSDirectory that first removed the file > that was supposed to be created / renamed to from the pending deletes set > and then tried to clean up pending deletes which excluded the file. These > places now remove the file from the set after the pending deletes are > checked. 
> > This caused some test failures lately unfortunately very timing dependent: > > {noformat} > FAILED: > junit.framework.TestSuite.org.apache.lucene.search.TestSearcherManager > Error Message: > Captured an uncaught exception in thread: Thread[id=1567, name=Thread-1363, > state=RUNNABLE, group=TGRP-TestSearcherManager] > Stack Trace: > com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an > uncaught exception in thread: Thread[id=1567, name=Thread-1363, > state=RUNNABLE, group=TGRP-TestSearcherManager] > Caused by: java.lang.RuntimeException: > java.nio.file.FileAlreadyExistsException: > /home/jenkins/workspace/Lucene-Solr-master-Linux/lucene/build/core/test/J1/temp/lucene.search.TestSearcherManager_BA998C838D219DA9-001/tempDir-001/_0.fdt > at __randomizedtesting.SeedInfo.seed([BA998C838D219DA9]:0) > at > org.apache.lucene.search.TestSearcherManager$8.run(TestSearcherManager.java:590) > Caused by: java.nio.file.FileAlreadyExistsException: > /home/jenkins/workspace/Lucene-Solr-master-Linux/lucene/build/core/test/J1/temp/lucene.search.TestSearcherManager_BA998C838D219DA9-001/tempDir-001/_0.fdt > at > java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:94) > at > java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111) > at > java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116) > at > java.base/sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:215) > at > java.base/java.nio.file.spi.FileSystemProvider.newOutputStream(FileSystemProvider.java:434) > at > org.apache.lucene.mockfile.FilterFileSystemProvider.newOutputStream(FilterFileSystemProvider.java:197) > at > org.apache.lucene.mockfile.FilterFileSystemProvider.newOutputStream(FilterFileSystemProvider.java:197) > at > org.apache.lucene.mockfile.HandleTrackingFS.newOutputStream(HandleTrackingFS.java:129) > at > org.apache.lucene.mockfile.HandleTrackingFS.newOutputStream(HandleTrackingFS.java:129) > at 
> org.apache.lucene.mockfile.HandleTrackingFS.newOutputStream(HandleTrackingFS.java:129) > at > org.apache.lucene.mockfile.FilterFileSystemProvider.newOutputStream(FilterFileSystemProvider.java:197) > at java.base/java.nio.file.Files.newOutputStream(Files.java:218) > at > org.apache.lucene.store.FSDirectory$FSIndexOutput.(FSDirectory.java:413) > at > org.apache.lucene.store.FSDirectory$FSIndexOutput.(FSDirectory.java:409) > at > org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:253) > at > org.apache.lucene.store.MockDirectoryWrapper.createOutput(MockDirectoryWrapper.java:665) > at > org.apache.lucene.store.LockValidatingDirectoryWrapper.createOutput(LockValidatingDirectoryWrapper.java:44) > at > org.apache.lucene.store.TrackingDirectoryWrapper.createOutput(TrackingDirectoryWrapper.java:43) > at > org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.(CompressingStoredFieldsWriter.java:116) > at > org.apache.lucene.codecs.compressing.CompressingStoredFieldsFormat.fieldsWriter(CompressingStoredFieldsFormat.java:128) > at > org.apache.lucene.codecs.lucene50.Lucene50StoredFieldsFormat.f
[jira] [Resolved] (LUCENE-8272) Share internal DV update code between binary and numeric
[ https://issues.apache.org/jira/browse/LUCENE-8272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved LUCENE-8272. - Resolution: Fixed thanks everybody > Share internal DV update code between binary and numeric > > > Key: LUCENE-8272 > URL: https://issues.apache.org/jira/browse/LUCENE-8272 > Project: Lucene - Core > Issue Type: Improvement >Affects Versions: 7.4, master (8.0) > Reporter: Simon Willnauer >Priority: Major > Fix For: 7.4, master (8.0) > > Attachments: LUCENE-8272.patch > > > Today we duplicate a fair portion of the internal logic to > apply updates of binary and numeric doc values. This change refactors > this non-trivial code to share the same code path and only differ in > if we provide a binary or numeric instance. This also allows us to > iterator over the updates only once rather than twice once for numeric > and once for binary fields. > > This change also subclass DocValuesIterator from > DocValuesFieldUpdates.Iterator > which allows easier consumption down the road since it now shares most of > it's > interface with DocIdSetIterator which is the main interface for this in > Lucene. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8275) Unwrap directory to check for FSDirectory
[ https://issues.apache.org/jira/browse/LUCENE-8275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16452249#comment-16452249 ] Simon Willnauer commented on LUCENE-8275: - [~rcmuir] good point about FSD. I attached a new patch with a minimal solution. > Unwrap directory to check for FSDirectory > - > > Key: LUCENE-8275 > URL: https://issues.apache.org/jira/browse/LUCENE-8275 > Project: Lucene - Core > Issue Type: Bug >Affects Versions: 7.4, master (8.0) > Reporter: Simon Willnauer >Priority: Major > Fix For: 7.4, master (8.0) > > Attachments: LUCENE-8275.patch > > > IndexWriter checks in it's ctor if the incoming directory is an > FSDirectory. If that is the case it ensures that the directory retries > deleting it's pending deletes and if there are pending deletes it will > fail creating the writer. Yet, this check didn't unwrap filter directories > such that in the case of MDW we never checked for pending deletes. > There are also two places in FSDirectory that first removed the file > that was supposed to be created / renamed to from the pending deletes set > and then tried to clean up pending deletes which excluded the file. These > places now remove the file from the set after the pending deletes are > checked. 
> > This caused some test failures lately unfortunately very timing dependent: > > {noformat} > FAILED: > junit.framework.TestSuite.org.apache.lucene.search.TestSearcherManager > Error Message: > Captured an uncaught exception in thread: Thread[id=1567, name=Thread-1363, > state=RUNNABLE, group=TGRP-TestSearcherManager] > Stack Trace: > com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an > uncaught exception in thread: Thread[id=1567, name=Thread-1363, > state=RUNNABLE, group=TGRP-TestSearcherManager] > Caused by: java.lang.RuntimeException: > java.nio.file.FileAlreadyExistsException: > /home/jenkins/workspace/Lucene-Solr-master-Linux/lucene/build/core/test/J1/temp/lucene.search.TestSearcherManager_BA998C838D219DA9-001/tempDir-001/_0.fdt > at __randomizedtesting.SeedInfo.seed([BA998C838D219DA9]:0) > at > org.apache.lucene.search.TestSearcherManager$8.run(TestSearcherManager.java:590) > Caused by: java.nio.file.FileAlreadyExistsException: > /home/jenkins/workspace/Lucene-Solr-master-Linux/lucene/build/core/test/J1/temp/lucene.search.TestSearcherManager_BA998C838D219DA9-001/tempDir-001/_0.fdt > at > java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:94) > at > java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111) > at > java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116) > at > java.base/sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:215) > at > java.base/java.nio.file.spi.FileSystemProvider.newOutputStream(FileSystemProvider.java:434) > at > org.apache.lucene.mockfile.FilterFileSystemProvider.newOutputStream(FilterFileSystemProvider.java:197) > at > org.apache.lucene.mockfile.FilterFileSystemProvider.newOutputStream(FilterFileSystemProvider.java:197) > at > org.apache.lucene.mockfile.HandleTrackingFS.newOutputStream(HandleTrackingFS.java:129) > at > org.apache.lucene.mockfile.HandleTrackingFS.newOutputStream(HandleTrackingFS.java:129) > at 
> org.apache.lucene.mockfile.HandleTrackingFS.newOutputStream(HandleTrackingFS.java:129) > at > org.apache.lucene.mockfile.FilterFileSystemProvider.newOutputStream(FilterFileSystemProvider.java:197) > at java.base/java.nio.file.Files.newOutputStream(Files.java:218) > at > org.apache.lucene.store.FSDirectory$FSIndexOutput.(FSDirectory.java:413) > at > org.apache.lucene.store.FSDirectory$FSIndexOutput.(FSDirectory.java:409) > at > org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:253) > at > org.apache.lucene.store.MockDirectoryWrapper.createOutput(MockDirectoryWrapper.java:665) > at > org.apache.lucene.store.LockValidatingDirectoryWrapper.createOutput(LockValidatingDirectoryWrapper.java:44) > at > org.apache.lucene.store.TrackingDirectoryWrapper.createOutput(TrackingDirectoryWrapper.java:43) > at > org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.(CompressingStoredFieldsWriter.java:116) > at > org.apache.lucene.codecs.compressing.CompressingStoredFieldsFormat.fieldsWriter(CompressingStoredFieldsFormat.java:128)
[jira] [Commented] (LUCENE-8264) Allow an option to rewrite all segments
[ https://issues.apache.org/jira/browse/LUCENE-8264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16451952#comment-16451952 ] Simon Willnauer commented on LUCENE-8264: - I totally agree with robert here, good collection of valid technical points. We can't let lurking corruptions happen. The improvements made to norms here are awesome and we need to move forward with stuff like this. Also after looking at the details, I am convinced the guarantees that this restriction gives us are crucial to the future of lucene. We can't support lurking corruptions for users that come from ancient versions by converting (merging / rewriting segments) from N-X to N in steps that nobody ever tested. Also the points about the database aspect are very much valid. We need raw data to re-create these indices reliably and if you are running on top of a search engine you need to account for reindexing. Btw. we have this restriction in ES since 1.0 implicitly. We always only supported N-1 major versions for ES indices, yet they happen to be corresponding to N-1 Lucene major versions. There is also a lot of work gone into supporting searching across major versions of ES to allow users to stay on older versions for retention policy purposes. Some of these conversations are not easy but necessary for us to prevent support insanity. That said, I think there might be room for N-X at some point as long as the guarantee is only N-1. At some point we might allow the min index created version to be 7 even if you are on 9. But for us to make progress we need to be free to break and only guarantee N-1. Also, what this means is that your indices are supported ~2.5 years that is the major release cadence historically. I think it's important to keep this in mind. 
> Allow an option to rewrite all segments > --- > > Key: LUCENE-8264 > URL: https://issues.apache.org/jira/browse/LUCENE-8264 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Erick Erickson >Assignee: Erick Erickson >Priority: Major > > For the background, see SOLR-12259. > There are several use-cases that would be much easier, especially during > upgrades, if we could specify that all segments get rewritten. > One example: Upgrading 5x->6x->7x. When segments are merged, they're > rewritten into the current format. However, there's no guarantee that a > particular segment _ever_ gets merged so the 6x-7x upgrade won't necessarily > be successful. > How many merge policies support this is an open question. I propose to start > with TMP and raise other JIRAs as necessary for other merge policies. > So far the usual response has been "re-index from scratch", but that's > increasingly difficult as systems get larger. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [JENKINS] Lucene-Solr-master-Linux (64bit/jdk-10) - Build # 21897 - Unstable!
I was able to reproduce this failure by extending the time this test runs. (it's depends on clock time which is terrible enough). The issue doesn't seem to be related to the changes made lately, the only relation I can see is that due to the changes some timing changed and I suspect things got a bit quicker cutting over to a new IW. The issue (afaik) is that there is still a reference to a file open after the IW got rolled back and WindowsFS can't delete the file causing FSDirectory to put it into pendingDeletes. Now we try to open a new IW and it tries to write this file again and the test fails since we potentially never check again for pending files. There is also this N^2 protection in FSDirectory that doesn't help here necessarily. I opened [1] to fix IW and try to delete pending files when they are created new. I still think this test can potentially run into this situation sooner or later. [1] https://issues.apache.org/jira/browse/LUCENE-8275 On Tue, Apr 24, 2018 at 6:06 PM, Simon Willnauer <simon.willna...@gmail.com> wrote: > I am looking into this > > On Tue, Apr 24, 2018 at 5:37 PM, Policeman Jenkins Server > <jenk...@thetaphi.de> wrote: >> Build: https://jenkins.thetaphi.de/job/Lucene-Solr-master-Linux/21897/ >> Java: 64bit/jdk-10 -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC >> >> 6 tests failed. >> FAILED: >> junit.framework.TestSuite.org.apache.lucene.search.TestSearcherManager >> >> Error Message: >> Suite timeout exceeded (>= 720 msec). >> >> Stack Trace: >> java.lang.Exception: Suite timeout exceeded (>= 720 msec). 
>> at __randomizedtesting.SeedInfo.seed([BA998C838D219DA9]:0) >> >> >> FAILED: >> junit.framework.TestSuite.org.apache.lucene.search.TestSearcherManager >> >> Error Message: >> Captured an uncaught exception in thread: Thread[id=17, name=Thread-1, >> state=RUNNABLE, group=TGRP-TestSearcherManager] >> >> Stack Trace: >> com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an >> uncaught exception in thread: Thread[id=17, name=Thread-1, state=RUNNABLE, >> group=TGRP-TestSearcherManager] >> Caused by: java.lang.RuntimeException: >> java.nio.file.FileAlreadyExistsException: >> /home/jenkins/workspace/Lucene-Solr-master-Linux/lucene/build/core/test/J2/temp/lucene.search.TestSearcherManager_BA998C838D219DA9-001/tempDir-001/_0.fdt >> at __randomizedtesting.SeedInfo.seed([BA998C838D219DA9]:0) >> at >> org.apache.lucene.search.TestSearcherManager$8.run(TestSearcherManager.java:590) >> Caused by: java.nio.file.FileAlreadyExistsException: >> /home/jenkins/workspace/Lucene-Solr-master-Linux/lucene/build/core/test/J2/temp/lucene.search.TestSearcherManager_BA998C838D219DA9-001/tempDir-001/_0.fdt >> at >> java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:94) >> at >> java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111) >> at >> java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116) >> at >> java.base/sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:215) >> at >> java.base/java.nio.file.spi.FileSystemProvider.newOutputStream(FileSystemProvider.java:434) >> at >> org.apache.lucene.mockfile.FilterFileSystemProvider.newOutputStream(FilterFileSystemProvider.java:197) >> at >> org.apache.lucene.mockfile.FilterFileSystemProvider.newOutputStream(FilterFileSystemProvider.java:197) >> at >> org.apache.lucene.mockfile.HandleTrackingFS.newOutputStream(HandleTrackingFS.java:129) >> at >> org.apache.lucene.mockfile.HandleTrackingFS.newOutputStream(HandleTrackingFS.java:129) 
>> at >> org.apache.lucene.mockfile.HandleTrackingFS.newOutputStream(HandleTrackingFS.java:129) >> at >> org.apache.lucene.mockfile.FilterFileSystemProvider.newOutputStream(FilterFileSystemProvider.java:197) >> at java.base/java.nio.file.Files.newOutputStream(Files.java:218) >> at >> org.apache.lucene.store.FSDirectory$FSIndexOutput.(FSDirectory.java:413) >> at >> org.apache.lucene.store.FSDirectory$FSIndexOutput.(FSDirectory.java:409) >> at >> org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:253) >> at >> org.apache.lucene.store.MockDirectoryWrapper.createOutput(MockDirectoryWrapper.java:665) >> at >> org.apache.lucene.store.LockValidatingDirectoryWrapper.createOutput(LockValidatingDirectoryWrapper.java:44) >> at >> org.apache.lucene.store.TrackingDirectoryWr
[jira] [Created] (LUCENE-8275) Unwrap directory to check for FSDirectory
Simon Willnauer created LUCENE-8275: --- Summary: Unwrap directory to check for FSDirectory Key: LUCENE-8275 URL: https://issues.apache.org/jira/browse/LUCENE-8275 Project: Lucene - Core Issue Type: Bug Affects Versions: 7.4, master (8.0) Reporter: Simon Willnauer Fix For: 7.4, master (8.0) Attachments: LUCENE-8275.patch IndexWriter checks in it's ctor if the incoming directory is an FSDirectory. If that is the case it ensures that the directory retries deleting it's pending deletes and if there are pending deletes it will fail creating the writer. Yet, this check didn't unwrap filter directories such that in the case of MDW we never checked for pending deletes. There are also two places in FSDirectory that first removed the file that was supposed to be created / renamed to from the pending deletes set and then tried to clean up pending deletes which excluded the file. These places now remove the file from the set after the pending deletes are checked. This caused some test failures lately unfortunately very timing dependent: {noformat} FAILED: junit.framework.TestSuite.org.apache.lucene.search.TestSearcherManager Error Message: Captured an uncaught exception in thread: Thread[id=1567, name=Thread-1363, state=RUNNABLE, group=TGRP-TestSearcherManager] Stack Trace: com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=1567, name=Thread-1363, state=RUNNABLE, group=TGRP-TestSearcherManager] Caused by: java.lang.RuntimeException: java.nio.file.FileAlreadyExistsException: /home/jenkins/workspace/Lucene-Solr-master-Linux/lucene/build/core/test/J1/temp/lucene.search.TestSearcherManager_BA998C838D219DA9-001/tempDir-001/_0.fdt at __randomizedtesting.SeedInfo.seed([BA998C838D219DA9]:0) at org.apache.lucene.search.TestSearcherManager$8.run(TestSearcherManager.java:590) Caused by: java.nio.file.FileAlreadyExistsException: 
/home/jenkins/workspace/Lucene-Solr-master-Linux/lucene/build/core/test/J1/temp/lucene.search.TestSearcherManager_BA998C838D219DA9-001/tempDir-001/_0.fdt at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:94) at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111) at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116) at java.base/sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:215) at java.base/java.nio.file.spi.FileSystemProvider.newOutputStream(FileSystemProvider.java:434) at org.apache.lucene.mockfile.FilterFileSystemProvider.newOutputStream(FilterFileSystemProvider.java:197) at org.apache.lucene.mockfile.FilterFileSystemProvider.newOutputStream(FilterFileSystemProvider.java:197) at org.apache.lucene.mockfile.HandleTrackingFS.newOutputStream(HandleTrackingFS.java:129) at org.apache.lucene.mockfile.HandleTrackingFS.newOutputStream(HandleTrackingFS.java:129) at org.apache.lucene.mockfile.HandleTrackingFS.newOutputStream(HandleTrackingFS.java:129) at org.apache.lucene.mockfile.FilterFileSystemProvider.newOutputStream(FilterFileSystemProvider.java:197) at java.base/java.nio.file.Files.newOutputStream(Files.java:218) at org.apache.lucene.store.FSDirectory$FSIndexOutput.(FSDirectory.java:413) at org.apache.lucene.store.FSDirectory$FSIndexOutput.(FSDirectory.java:409) at org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:253) at org.apache.lucene.store.MockDirectoryWrapper.createOutput(MockDirectoryWrapper.java:665) at org.apache.lucene.store.LockValidatingDirectoryWrapper.createOutput(LockValidatingDirectoryWrapper.java:44) at org.apache.lucene.store.TrackingDirectoryWrapper.createOutput(TrackingDirectoryWrapper.java:43) at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.(CompressingStoredFieldsWriter.java:116) at 
org.apache.lucene.codecs.compressing.CompressingStoredFieldsFormat.fieldsWriter(CompressingStoredFieldsFormat.java:128) at org.apache.lucene.codecs.lucene50.Lucene50StoredFieldsFormat.fieldsWriter(Lucene50StoredFieldsFormat.java:183) at org.apache.lucene.codecs.asserting.AssertingStoredFieldsFormat.fieldsWriter(AssertingStoredFieldsFormat.java:48) at org.apache.lucene.index.StoredFieldsConsumer.initStoredFieldsWriter(StoredFieldsConsumer.java:39) at org.apache.lucene.index.StoredFieldsConsumer.startDocument(StoredFieldsConsumer.java:46) at org.apache.lucene.index.DefaultIndexingChain.startStoredFields(DefaultIndexingChain.java:363) at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:399
[jira] [Updated] (LUCENE-8275) Unwrap directory to check for FSDirectory
[ https://issues.apache.org/jira/browse/LUCENE-8275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-8275: Attachment: LUCENE-8275.patch > Unwrap directory to check for FSDirectory > - > > Key: LUCENE-8275 > URL: https://issues.apache.org/jira/browse/LUCENE-8275 > Project: Lucene - Core > Issue Type: Bug >Affects Versions: 7.4, master (8.0) > Reporter: Simon Willnauer >Priority: Major > Fix For: 7.4, master (8.0) > > Attachments: LUCENE-8275.patch > > > IndexWriter checks in it's ctor if the incoming directory is an > FSDirectory. If that is the case it ensures that the directory retries > deleting it's pending deletes and if there are pending deletes it will > fail creating the writer. Yet, this check didn't unwrap filter directories > such that in the case of MDW we never checked for pending deletes. > There are also two places in FSDirectory that first removed the file > that was supposed to be created / renamed to from the pending deletes set > and then tried to clean up pending deletes which excluded the file. These > places now remove the file from the set after the pending deletes are > checked. 
> > This caused some test failures lately unfortunately very timing dependent: > > {noformat} > FAILED: > junit.framework.TestSuite.org.apache.lucene.search.TestSearcherManager > Error Message: > Captured an uncaught exception in thread: Thread[id=1567, name=Thread-1363, > state=RUNNABLE, group=TGRP-TestSearcherManager] > Stack Trace: > com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an > uncaught exception in thread: Thread[id=1567, name=Thread-1363, > state=RUNNABLE, group=TGRP-TestSearcherManager] > Caused by: java.lang.RuntimeException: > java.nio.file.FileAlreadyExistsException: > /home/jenkins/workspace/Lucene-Solr-master-Linux/lucene/build/core/test/J1/temp/lucene.search.TestSearcherManager_BA998C838D219DA9-001/tempDir-001/_0.fdt > at __randomizedtesting.SeedInfo.seed([BA998C838D219DA9]:0) > at > org.apache.lucene.search.TestSearcherManager$8.run(TestSearcherManager.java:590) > Caused by: java.nio.file.FileAlreadyExistsException: > /home/jenkins/workspace/Lucene-Solr-master-Linux/lucene/build/core/test/J1/temp/lucene.search.TestSearcherManager_BA998C838D219DA9-001/tempDir-001/_0.fdt > at > java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:94) > at > java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111) > at > java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116) > at > java.base/sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:215) > at > java.base/java.nio.file.spi.FileSystemProvider.newOutputStream(FileSystemProvider.java:434) > at > org.apache.lucene.mockfile.FilterFileSystemProvider.newOutputStream(FilterFileSystemProvider.java:197) > at > org.apache.lucene.mockfile.FilterFileSystemProvider.newOutputStream(FilterFileSystemProvider.java:197) > at > org.apache.lucene.mockfile.HandleTrackingFS.newOutputStream(HandleTrackingFS.java:129) > at > org.apache.lucene.mockfile.HandleTrackingFS.newOutputStream(HandleTrackingFS.java:129) > at 
> org.apache.lucene.mockfile.HandleTrackingFS.newOutputStream(HandleTrackingFS.java:129) > at > org.apache.lucene.mockfile.FilterFileSystemProvider.newOutputStream(FilterFileSystemProvider.java:197) > at java.base/java.nio.file.Files.newOutputStream(Files.java:218) > at > org.apache.lucene.store.FSDirectory$FSIndexOutput.(FSDirectory.java:413) > at > org.apache.lucene.store.FSDirectory$FSIndexOutput.(FSDirectory.java:409) > at > org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:253) > at > org.apache.lucene.store.MockDirectoryWrapper.createOutput(MockDirectoryWrapper.java:665) > at > org.apache.lucene.store.LockValidatingDirectoryWrapper.createOutput(LockValidatingDirectoryWrapper.java:44) > at > org.apache.lucene.store.TrackingDirectoryWrapper.createOutput(TrackingDirectoryWrapper.java:43) > at > org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.(CompressingStoredFieldsWriter.java:116) > at > org.apache.lucene.codecs.compressing.CompressingStoredFieldsFormat.fieldsWriter(CompressingStoredFieldsFormat.java:128) > at > org.apache.lucene.codecs.lucene50.Lucene50StoredFieldsFormat.f
Re: [JENKINS] Lucene-Solr-master-Linux (64bit/jdk-10) - Build # 21897 - Unstable!
I am looking into this On Tue, Apr 24, 2018 at 5:37 PM, Policeman Jenkins Serverwrote: > Build: https://jenkins.thetaphi.de/job/Lucene-Solr-master-Linux/21897/ > Java: 64bit/jdk-10 -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC > > 6 tests failed. > FAILED: > junit.framework.TestSuite.org.apache.lucene.search.TestSearcherManager > > Error Message: > Suite timeout exceeded (>= 720 msec). > > Stack Trace: > java.lang.Exception: Suite timeout exceeded (>= 720 msec). > at __randomizedtesting.SeedInfo.seed([BA998C838D219DA9]:0) > > > FAILED: > junit.framework.TestSuite.org.apache.lucene.search.TestSearcherManager > > Error Message: > Captured an uncaught exception in thread: Thread[id=17, name=Thread-1, > state=RUNNABLE, group=TGRP-TestSearcherManager] > > Stack Trace: > com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an > uncaught exception in thread: Thread[id=17, name=Thread-1, state=RUNNABLE, > group=TGRP-TestSearcherManager] > Caused by: java.lang.RuntimeException: > java.nio.file.FileAlreadyExistsException: > /home/jenkins/workspace/Lucene-Solr-master-Linux/lucene/build/core/test/J2/temp/lucene.search.TestSearcherManager_BA998C838D219DA9-001/tempDir-001/_0.fdt > at __randomizedtesting.SeedInfo.seed([BA998C838D219DA9]:0) > at > org.apache.lucene.search.TestSearcherManager$8.run(TestSearcherManager.java:590) > Caused by: java.nio.file.FileAlreadyExistsException: > /home/jenkins/workspace/Lucene-Solr-master-Linux/lucene/build/core/test/J2/temp/lucene.search.TestSearcherManager_BA998C838D219DA9-001/tempDir-001/_0.fdt > at > java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:94) > at > java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111) > at > java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116) > at > java.base/sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:215) > at > 
java.base/java.nio.file.spi.FileSystemProvider.newOutputStream(FileSystemProvider.java:434) > at > org.apache.lucene.mockfile.FilterFileSystemProvider.newOutputStream(FilterFileSystemProvider.java:197) > at > org.apache.lucene.mockfile.FilterFileSystemProvider.newOutputStream(FilterFileSystemProvider.java:197) > at > org.apache.lucene.mockfile.HandleTrackingFS.newOutputStream(HandleTrackingFS.java:129) > at > org.apache.lucene.mockfile.HandleTrackingFS.newOutputStream(HandleTrackingFS.java:129) > at > org.apache.lucene.mockfile.HandleTrackingFS.newOutputStream(HandleTrackingFS.java:129) > at > org.apache.lucene.mockfile.FilterFileSystemProvider.newOutputStream(FilterFileSystemProvider.java:197) > at java.base/java.nio.file.Files.newOutputStream(Files.java:218) > at > org.apache.lucene.store.FSDirectory$FSIndexOutput.(FSDirectory.java:413) > at > org.apache.lucene.store.FSDirectory$FSIndexOutput.(FSDirectory.java:409) > at > org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:253) > at > org.apache.lucene.store.MockDirectoryWrapper.createOutput(MockDirectoryWrapper.java:665) > at > org.apache.lucene.store.LockValidatingDirectoryWrapper.createOutput(LockValidatingDirectoryWrapper.java:44) > at > org.apache.lucene.store.TrackingDirectoryWrapper.createOutput(TrackingDirectoryWrapper.java:43) > at > org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.(CompressingStoredFieldsWriter.java:116) > at > org.apache.lucene.codecs.compressing.CompressingStoredFieldsFormat.fieldsWriter(CompressingStoredFieldsFormat.java:128) > at > org.apache.lucene.codecs.lucene50.Lucene50StoredFieldsFormat.fieldsWriter(Lucene50StoredFieldsFormat.java:183) > at > org.apache.lucene.codecs.asserting.AssertingStoredFieldsFormat.fieldsWriter(AssertingStoredFieldsFormat.java:48) > at > org.apache.lucene.index.StoredFieldsConsumer.initStoredFieldsWriter(StoredFieldsConsumer.java:39) > at > 
org.apache.lucene.index.StoredFieldsConsumer.startDocument(StoredFieldsConsumer.java:46) > at > org.apache.lucene.index.DefaultIndexingChain.startStoredFields(DefaultIndexingChain.java:363) > at > org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:399) > at > org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:251) > at > org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:490) > at > org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1518) > at > org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1210) > at > org.apache.lucene.search.TestSearcherManager$8.run(TestSearcherManager.java:574) > > > FAILED: >
[jira] [Resolved] (LUCENE-8271) Remove IndexWriter from DWFlushQueue
[ https://issues.apache.org/jira/browse/LUCENE-8271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved LUCENE-8271. - Resolution: Fixed > Remove IndexWriter from DWFlushQueue > - > > Key: LUCENE-8271 > URL: https://issues.apache.org/jira/browse/LUCENE-8271 > Project: Lucene - Core > Issue Type: Improvement > Affects Versions: 7.4, master (8.0) > Reporter: Simon Willnauer > Priority: Major > Fix For: 7.4, master (8.0) > > Attachments: LUCENE-8271.patch > > > This simplifies DocumentsWriterFlushQueue by moving all IW-related > code out of it. The DWFQ now only contains logic for taking tickets > off the queue and applying it to a given consumer. The logic now > entirely resides in IW and has private visibility. Locking > is also more contained since IW knows exactly what is called and when. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
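The design described above — a flush queue that only takes tickets off and hands each to a consumer supplied by the caller, with all IndexWriter-specific logic living in IW itself — can be sketched roughly like this. This is a simplified model with hypothetical names (`FlushTicketQueue`, `Ticket`, `forcePurge`), not the actual DocumentsWriterFlushQueue code:

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.Consumer;

/** Simplified model of a flush-ticket queue that knows nothing about IndexWriter. */
public class FlushTicketQueue {
    public static final class Ticket {
        final String segmentName;
        boolean published;
        Ticket(String segmentName) { this.segmentName = segmentName; }
    }

    private final Queue<Ticket> queue = new ArrayDeque<>();

    public synchronized void addTicket(Ticket t) { queue.add(t); }

    /**
     * Drains tickets and hands each one to the consumer. The consumer
     * (IndexWriter in the real design) decides what "publishing" means,
     * so the queue itself carries no IW-specific logic or locking.
     */
    public int forcePurge(Consumer<Ticket> consumer) {
        int purged = 0;
        while (true) {
            Ticket head;
            synchronized (this) {
                head = queue.poll(); // lock held only for the poll, not the callback
            }
            if (head == null) break;
            consumer.accept(head);
            purged++;
        }
        return purged;
    }

    public static int demo() {
        FlushTicketQueue q = new FlushTicketQueue();
        q.addTicket(new Ticket("_0"));
        q.addTicket(new Ticket("_1"));
        // The caller injects the publishing behavior as a consumer.
        return q.forcePurge(t -> t.published = true);
    }
}
```

Note that the callback runs outside the queue's monitor, which is the point of the refactoring: the queue cannot deadlock against whatever lock the consumer holds.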
[jira] [Commented] (LUCENE-8272) Share internal DV update code between binary and numeric
[ https://issues.apache.org/jira/browse/LUCENE-8272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16449856#comment-16449856 ] Simon Willnauer commented on LUCENE-8272: - [https://github.com/s1monw/lucene-solr/pull/15] /cc [~mikemccand] > Share internal DV update code between binary and numeric > > > Key: LUCENE-8272 > URL: https://issues.apache.org/jira/browse/LUCENE-8272 > Project: Lucene - Core > Issue Type: Improvement > Affects Versions: 7.4, master (8.0) > Reporter: Simon Willnauer > Priority: Major > Fix For: 7.4, master (8.0) > > Attachments: LUCENE-8272.patch > > > Today we duplicate a fair portion of the internal logic to > apply updates of binary and numeric doc values. This change refactors > this non-trivial code to share the same code path and differ only in > whether we provide a binary or numeric instance. This also allows us to > iterate over the updates only once rather than twice, once for numeric > and once for binary fields. > > This change also subclasses DocValuesIterator from > DocValuesFieldUpdates.Iterator, > which allows easier consumption down the road since it now shares most of > its > interface with DocIdSetIterator, which is the main interface for this in > Lucene. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-8272) Share internal DV update code between binary and numeric
[ https://issues.apache.org/jira/browse/LUCENE-8272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-8272: Attachment: LUCENE-8272.patch > Share internal DV update code between binary and numeric > > > Key: LUCENE-8272 > URL: https://issues.apache.org/jira/browse/LUCENE-8272 > Project: Lucene - Core > Issue Type: Improvement > Affects Versions: 7.4, master (8.0) > Reporter: Simon Willnauer > Priority: Major > Fix For: 7.4, master (8.0) > > Attachments: LUCENE-8272.patch > > > Today we duplicate a fair portion of the internal logic to > apply updates of binary and numeric doc values. This change refactors > this non-trivial code to share the same code path and differ only in > whether we provide a binary or numeric instance. This also allows us to > iterate over the updates only once rather than twice, once for numeric > and once for binary fields. > > This change also subclasses DocValuesIterator from > DocValuesFieldUpdates.Iterator, > which allows easier consumption down the road since it now shares most of > its > interface with DocIdSetIterator, which is the main interface for this in > Lucene. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-8272) Share internal DV update code between binary and numeric
Simon Willnauer created LUCENE-8272: --- Summary: Share internal DV update code between binary and numeric Key: LUCENE-8272 URL: https://issues.apache.org/jira/browse/LUCENE-8272 Project: Lucene - Core Issue Type: Improvement Affects Versions: 7.4, master (8.0) Reporter: Simon Willnauer Fix For: 7.4, master (8.0) Attachments: LUCENE-8272.patch Today we duplicate a fair portion of the internal logic to apply updates of binary and numeric doc values. This change refactors this non-trivial code to share the same code path and differ only in whether we provide a binary or numeric instance. This also allows us to iterate over the updates only once rather than twice, once for numeric and once for binary fields. This change also subclasses DocValuesIterator from DocValuesFieldUpdates.Iterator, which allows easier consumption down the road since it now shares most of its interface with DocIdSetIterator, which is the main interface for this in Lucene. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
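The refactoring idea in this description — one shared update loop, with binary and numeric differing only in the value instance they supply — can be illustrated with a toy sketch. All names here (`UpdateIterator`, `applyUpdates`) are hypothetical stand-ins, not the real DocValuesFieldUpdates classes:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

/** Sketch: one shared doc-values update loop, parameterized by how a value is written. */
public class SharedDVUpdates {
    /** Stand-in for DocValuesFieldUpdates.Iterator: advances doc by doc, exposes a value. */
    interface UpdateIterator<T> {
        int nextDoc(); // -1 when exhausted, in the spirit of NO_MORE_DOCS
        T value();
    }

    /**
     * The single shared code path. Binary vs. numeric updates differ only in the
     * writer passed in, so the iteration happens once instead of being duplicated.
     */
    static <T> int applyUpdates(UpdateIterator<T> it, BiConsumer<Integer, T> writer) {
        int applied = 0;
        for (int doc = it.nextDoc(); doc != -1; doc = it.nextDoc()) {
            writer.accept(doc, it.value());
            applied++;
        }
        return applied;
    }

    /** A tiny numeric-updates source over consecutive docs, for demonstration. */
    static UpdateIterator<Long> numeric(long... values) {
        return new UpdateIterator<Long>() {
            int i = -1;
            public int nextDoc() { return ++i < values.length ? i : -1; }
            public Long value() { return values[i]; }
        };
    }

    public static int demo() {
        List<String> log = new ArrayList<>();
        // The "writer" lambda is the only numeric-specific piece.
        return applyUpdates(numeric(7, 8, 9), (doc, v) -> log.add(doc + "=" + v));
    }
}
```

A binary variant would reuse `applyUpdates` unchanged and pass a different writer, which is the whole point of sharing the code path.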
[jira] [Commented] (LUCENE-8264) Allow an option to rewrite all segments
[ https://issues.apache.org/jira/browse/LUCENE-8264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16449666#comment-16449666 ] Simon Willnauer commented on LUCENE-8264: - To be absolutely honest, I was surprised by this as well. The reasons behind this change make sense to me, but the implications are big. I am not sure if the strictness here comes only from the broken TermVectors offsets, but if so, can we discuss relaxing it a bit? This change hit a couple of committers by surprise (including myself) and I wonder if we can take a step back and revisit this decision. While there are a bunch of other issues when you go from 3.x to 7.x, for instance that your tokenization / analysis chain isn't supported anymore, there are valid use cases for upgrading your index via background merges rewriting the index format. Issues like unsupported analysis chains should be handled by higher-level apps like Solr or ES. There are tons of people who use Lucene as a retrieval engine doing very simple whitespace tokenization; for them a merge from 3.x to 7.x might be just fine. I think it would be good to have the conversation again even though the changes were communicated very openly. [~jpountz] [~thetaphi] [~rcmuir] [~mikemccand] [~dweiss] WDYT? > Allow an option to rewrite all segments > --- > > Key: LUCENE-8264 > URL: https://issues.apache.org/jira/browse/LUCENE-8264 > Project: Lucene - Core > Issue Type: Improvement > Reporter: Erick Erickson > Assignee: Erick Erickson > Priority: Major > > For the background, see SOLR-12259. > There are several use-cases that would be much easier, especially during > upgrades, if we could specify that all segments get rewritten. > One example: Upgrading 5x->6x->7x. When segments are merged, they're > rewritten into the current format. However, there's no guarantee that a > particular segment _ever_ gets merged so the 6x-7x upgrade won't necessarily > be successful.
> How many merge policies support this is an open question. I propose to start > with TMP and raise other JIRAs as necessary for other merge policies. > So far the usual response has been "re-index from scratch", but that's > increasingly difficult as systems get larger. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
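The failure mode in the issue description — a segment that never gets selected for merging keeps its old format indefinitely — can be illustrated with a toy size-based merge policy. This is a deliberately simplified model (hypothetical `Segment` and `mergeOnce`), not TieredMergePolicy itself:

```java
import java.util.ArrayList;
import java.util.List;

/** Toy illustration: merging rewrites segments, but some segments may never merge. */
public class MergeSimulation {
    static final class Segment {
        final int docCount;
        final int formatVersion; // e.g. 6 for a segment written by 6.x
        Segment(int docCount, int formatVersion) {
            this.docCount = docCount;
            this.formatVersion = formatVersion;
        }
    }

    /** A size-based policy: only groups of small segments are "worth" merging. */
    static List<Segment> mergeOnce(List<Segment> segments, int minDocsToMerge, int currentVersion) {
        List<Segment> small = new ArrayList<>();
        List<Segment> out = new ArrayList<>();
        int smallDocs = 0;
        for (Segment s : segments) {
            if (s.docCount < minDocsToMerge) { small.add(s); smallDocs += s.docCount; }
            else out.add(s); // big segments are left alone
        }
        if (small.size() >= 2) {
            out.add(new Segment(smallDocs, currentVersion)); // merged output gets the current format
        } else {
            out.addAll(small); // a lone small segment survives untouched, old format and all
        }
        return out;
    }

    public static int oldestVersionAfterMerge() {
        List<Segment> index = new ArrayList<>();
        index.add(new Segment(1_000_000, 6)); // a big 6.x segment the policy never selects
        index.add(new Segment(10, 7));
        List<Segment> after = mergeOnce(index, 100, 7);
        int oldest = Integer.MAX_VALUE;
        for (Segment s : after) oldest = Math.min(oldest, s.formatVersion);
        return oldest;
    }
}
```

The big 6.x segment is never chosen, so after the merge pass the index still contains a 6.x-format segment — which is exactly why an explicit "rewrite all segments" option is being proposed.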
[jira] [Commented] (LUCENE-8271) Remove IndexWriter from DWFlushQueue
[ https://issues.apache.org/jira/browse/LUCENE-8271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16449429#comment-16449429 ] Simon Willnauer commented on LUCENE-8271: - /cc [~mikemccand] [~dweiss] https://github.com/s1monw/lucene-solr/pull/14 > Remove IndexWriter from DWFlushQueue > - > > Key: LUCENE-8271 > URL: https://issues.apache.org/jira/browse/LUCENE-8271 > Project: Lucene - Core > Issue Type: Improvement > Affects Versions: 7.4, master (8.0) > Reporter: Simon Willnauer > Priority: Major > Fix For: 7.4, master (8.0) > > Attachments: LUCENE-8271.patch > > > This simplifies DocumentsWriterFlushQueue by moving all IW-related > code out of it. The DWFQ now only contains logic for taking tickets > off the queue and applying it to a given consumer. The logic now > entirely resides in IW and has private visibility. Locking > is also more contained since IW knows exactly what is called and when. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-8271) Remove IndexWriter from DWFlushQueue
[ https://issues.apache.org/jira/browse/LUCENE-8271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-8271: Attachment: LUCENE-8271.patch > Remove IndexWriter from DWFlushQueue > - > > Key: LUCENE-8271 > URL: https://issues.apache.org/jira/browse/LUCENE-8271 > Project: Lucene - Core > Issue Type: Improvement > Affects Versions: 7.4, master (8.0) > Reporter: Simon Willnauer > Priority: Major > Fix For: 7.4, master (8.0) > > Attachments: LUCENE-8271.patch > > > This simplifies DocumentsWriterFlushQueue by moving all IW-related > code out of it. The DWFQ now only contains logic for taking tickets > off the queue and applying it to a given consumer. The logic now > entirely resides in IW and has private visibility. Locking > is also more contained since IW knows exactly what is called and when. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-8271) Remove IndexWriter from DWFlushQueue
Simon Willnauer created LUCENE-8271: --- Summary: Remove IndexWriter from DWFlushQueue Key: LUCENE-8271 URL: https://issues.apache.org/jira/browse/LUCENE-8271 Project: Lucene - Core Issue Type: Improvement Affects Versions: 7.4, master (8.0) Reporter: Simon Willnauer Fix For: 7.4, master (8.0) This simplifies DocumentsWriterFlushQueue by moving all IW-related code out of it. The DWFQ now only contains logic for taking tickets off the queue and applying it to a given consumer. The logic now entirely resides in IW and has private visibility. Locking is also more contained since IW knows exactly what is called and when. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-8269) Detach downstream classes from IndexWriter
[ https://issues.apache.org/jira/browse/LUCENE-8269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved LUCENE-8269. - Resolution: Fixed > Detach downstream classes from IndexWriter > -- > > Key: LUCENE-8269 > URL: https://issues.apache.org/jira/browse/LUCENE-8269 > Project: Lucene - Core > Issue Type: Improvement > Affects Versions: 7.4, master (8.0) > Reporter: Simon Willnauer > Priority: Major > Fix For: 7.4, master (8.0) > > Attachments: LUCENE-8269.patch > > > IndexWriter today is shared with many classes like BufferedUpdateStream, > DocumentsWriter and DocumentsWriterPerThread. Some of them even acquire locks > on the writer instance or assert that the current thread doesn't hold a lock. > This makes it very difficult to have a manageable threading model. > > This change separates out the IndexWriter from those classes and makes > them all > independent of IW. IW now implements a new interface for DocumentsWriter > to communicate > on failed or successful flushes and tragic events. This allows IW to make > its critical > methods private and execute all lock-critical actions on its private > queue, which ensures > that the IW lock is not held. Follow-up changes will try to detach more > code like > publishing flushed segments to ensure we never call back into IW in an > uncontrolled way. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8267) Remove memory codecs from the codebase
[ https://issues.apache.org/jira/browse/LUCENE-8267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448287#comment-16448287 ] Simon Willnauer commented on LUCENE-8267: - +1 to what [~rcmuir] said: there are so many more efficient options. {quote}Do you mean to say I should have said all I said without voting first? Lets have a conversation! (we _are_ having a conversation){quote} I perceive your veto as an aggressive step. To me it's a last resort, for when we can't find a solution that is good for all of us. The conversation already has a tone that is not appropriate and could have been prevented by formulating objections as questions, like _I am using this postings format in X and it's serving well, what are the alternatives?_ I am sure you would have gotten an awesome answer. {quote}I don't understand this point of view; can you please elaborate? Fear of what?{quote} If you can't remove stuff without others jumping in and vetoing, the reaction will be to prevent additions in the same way, due to the _fear_ created by the veto. That is a terrible place to be in; we have seen this in the past and we should prevent it. > Remove memory codecs from the codebase > -- > > Key: LUCENE-8267 > URL: https://issues.apache.org/jira/browse/LUCENE-8267 > Project: Lucene - Core > Issue Type: Task > Reporter: Dawid Weiss > Priority: Major > > Memory codecs (MemoryPostings*, MemoryDocValues*) are part of random > selection of codecs for tests and cause occasional OOMs when a test with huge > data is selected. We don't use those memory codecs anywhere outside of tests, > it has been suggested to just remove them to avoid maintenance costs and OOMs > in tests. [1] > [1] https://apache.markmail.org/thread/mj53os2ekyldsoy3 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-8267) Remove memory codecs from the codebase
[ https://issues.apache.org/jira/browse/LUCENE-8267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448208#comment-16448208 ] Simon Willnauer edited comment on LUCENE-8267 at 4/23/18 2:19 PM: -- {quote}If we are going to make it harder to remove stuff, I have no problem being the one to make it equally harder to add stuff.{quote} I agree this is one of these issues that we have to face. If we put the bar very high to remove stuff that is not mainstream, then we will have a super hard time adding stuff. It creates fear-driven decisions. It sucks; I agree with [~rcmuir] 100% here. {quote}-1 sorry. I've used the MemoryPostingsFormat for a text-tagging use-case where there are intense lookups against the terms dictionary. It's highly beneficial to have the terms dictionary be entirely memory resident, albeit in a compact FST. The issue description mentions "We don't use those memory codecs anywhere outside of tests" – this should be no surprise as it's not the default codec. I'm sure it may be hard to gauge the level of use of something outside of core-Lucene. When we ponder removing something that Lucene doesn't even _need_, I propose we raise the issue more openly to the community. Perhaps the question could be proposed in CHANGES.txt and/or release announcements to solicit community input?{quote} Given that you are using your veto here, we are already in a terrible position to have any conversation. Can you quantify the "it's nice"? Since there are alternatives (the standard codec), can you go and provide some numbers? We should not use vetos based on non-quantifiable arguments IMO. We can go and ask the community, but I don't expect much useful outcome; most of the folks don't know what they are using here and there. Nevertheless, I am happy to send a mail to dev to get this information.
> Remove memory codecs from the codebase > -- > > Key: LUCENE-8267 > URL: https://issues.apache.org/jira/browse/LUCENE-8267 > Project: Lucene - Core > Issue Type: Task >Reporter: Dawid Weiss >Priority: Major > > Memory codecs (MemoryPostings*, MemoryDocValues*) are part of random > selection of codecs for tests and cause occasional OOMs when a test with huge > data is selected. We don't use those memory codecs anywhere outside of tests, > it has been suggested to just remove them to avoid maintenance costs and OOMs > in tests. [1] > [1] https://apache.markmail.org/thread/mj53os2ekyldsoy3 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8267) Remove memory codecs from the codebase
[ https://issues.apache.org/jira/browse/LUCENE-8267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448208#comment-16448208 ] Simon Willnauer commented on LUCENE-8267: - {quote}If we are going to make it harder to remove stuff, I have no problem being the one to make it equally harder to add stuff.{quote} I agree this is one of these issues that we have to face. If we put the bar very high to remove stuff that is not mainstream, then we will have a super hard time adding stuff. It creates fear-driven decisions. It sucks; I agree with [~rcmuir] 100% here. {quote}-1 sorry. I've used the MemoryPostingsFormat for a text-tagging use-case where there are intense lookups against the terms dictionary. It's highly beneficial to have the terms dictionary be entirely memory resident, albeit in a compact FST. The issue description mentions "We don't use those memory codecs anywhere outside of tests" – this should be no surprise as it's not the default codec. I'm sure it may be hard to gauge the level of use of something outside of core-Lucene. When we ponder removing something that Lucene doesn't even _need_, I propose we raise the issue more openly to the community. Perhaps the question could be proposed in CHANGES.txt and/or release announcements to solicit community input?{quote} Given that you are using your veto here, we are already in a terrible position to have any conversation. Can you quantify the "it's nice"? Since there are alternatives (the standard codec), can you go and provide some numbers? We should not use vetos based on non-quantifiable arguments IMO. We can go and ask the community, but I don't expect much useful outcome; most of the folks don't know what they are using here and there. Nevertheless, I am happy to send a mail to dev to get this information.
> Remove memory codecs from the codebase > -- > > Key: LUCENE-8267 > URL: https://issues.apache.org/jira/browse/LUCENE-8267 > Project: Lucene - Core > Issue Type: Task >Reporter: Dawid Weiss >Priority: Major > > Memory codecs (MemoryPostings*, MemoryDocValues*) are part of random > selection of codecs for tests and cause occasional OOMs when a test with huge > data is selected. We don't use those memory codecs anywhere outside of tests, > it has been suggested to just remove them to avoid maintenance costs and OOMs > in tests. [1] > [1] https://apache.markmail.org/thread/mj53os2ekyldsoy3 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8269) Detach downstream classes from IndexWriter
[ https://issues.apache.org/jira/browse/LUCENE-8269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448026#comment-16448026 ] Simon Willnauer commented on LUCENE-8269: - [https://github.com/s1monw/lucene-solr/pull/13/] /cc [~mikemccand] > Detach downstream classes from IndexWriter > -- > > Key: LUCENE-8269 > URL: https://issues.apache.org/jira/browse/LUCENE-8269 > Project: Lucene - Core > Issue Type: Improvement > Affects Versions: 7.4, master (8.0) > Reporter: Simon Willnauer > Priority: Major > Fix For: 7.4, master (8.0) > > Attachments: LUCENE-8269.patch > > > IndexWriter today is shared with many classes like BufferedUpdateStream, > DocumentsWriter and DocumentsWriterPerThread. Some of them even acquire locks > on the writer instance or assert that the current thread doesn't hold a lock. > This makes it very difficult to have a manageable threading model. > > This change separates out the IndexWriter from those classes and makes > them all > independent of IW. IW now implements a new interface for DocumentsWriter > to communicate > on failed or successful flushes and tragic events. This allows IW to make > its critical > methods private and execute all lock-critical actions on its private > queue, which ensures > that the IW lock is not held. Follow-up changes will try to detach more > code like > publishing flushed segments to ensure we never call back into IW in an > uncontrolled way. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-8269) Detach downstream classes from IndexWriter
[ https://issues.apache.org/jira/browse/LUCENE-8269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-8269: Attachment: LUCENE-8269.patch > Detach downstream classes from IndexWriter > -- > > Key: LUCENE-8269 > URL: https://issues.apache.org/jira/browse/LUCENE-8269 > Project: Lucene - Core > Issue Type: Improvement > Affects Versions: 7.4, master (8.0) > Reporter: Simon Willnauer > Priority: Major > Fix For: 7.4, master (8.0) > > Attachments: LUCENE-8269.patch > > > IndexWriter today is shared with many classes like BufferedUpdateStream, > DocumentsWriter and DocumentsWriterPerThread. Some of them even acquire locks > on the writer instance or assert that the current thread doesn't hold a lock. > This makes it very difficult to have a manageable threading model. > > This change separates out the IndexWriter from those classes and makes > them all > independent of IW. IW now implements a new interface for DocumentsWriter > to communicate > on failed or successful flushes and tragic events. This allows IW to make > its critical > methods private and execute all lock-critical actions on its private > queue, which ensures > that the IW lock is not held. Follow-up changes will try to detach more > code like > publishing flushed segments to ensure we never call back into IW in an > uncontrolled way. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-8269) Detach downstream classes from IndexWriter
Simon Willnauer created LUCENE-8269: --- Summary: Detach downstream classes from IndexWriter Key: LUCENE-8269 URL: https://issues.apache.org/jira/browse/LUCENE-8269 Project: Lucene - Core Issue Type: Improvement Affects Versions: 7.4, master (8.0) Reporter: Simon Willnauer Fix For: 7.4, master (8.0) IndexWriter today is shared with many classes like BufferedUpdateStream, DocumentsWriter and DocumentsWriterPerThread. Some of them even acquire locks on the writer instance or assert that the current thread doesn't hold a lock. This makes it very difficult to have a manageable threading model. This change separates out the IndexWriter from those classes and makes them all independent of IW. IW now implements a new interface for DocumentsWriter to communicate on failed or successful flushes and tragic events. This allows IW to make its critical methods private and execute all lock-critical actions on its private queue, which ensures that the IW lock is not held. Follow-up changes will try to detach more code like publishing flushed segments to ensure we never call back into IW in an uncontrolled way. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
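The decoupling described above — DocumentsWriter reporting flush outcomes through a narrow interface, and IndexWriter applying the consequences via its own deferred event queue instead of being called back into while its lock is held — can be sketched like this. Names (`Listener`, `WriterEvents`, `drain`) are illustrative, not the real Lucene interface:

```java
import java.util.ArrayDeque;
import java.util.Queue;

/**
 * Sketch of the decoupling: the writer-side class reports events through a narrow
 * interface instead of calling arbitrary IndexWriter methods directly.
 */
public class FlushNotifications {
    /** Implemented by the IndexWriter side in the real design. */
    interface Listener {
        void onFlushSuccess(String segment);
        void onTragicEvent(Throwable t);
    }

    /** IndexWriter side: events are queued and processed outside any critical lock. */
    static final class WriterEvents implements Listener {
        final Queue<Runnable> eventQueue = new ArrayDeque<>();
        int published;

        public void onFlushSuccess(String segment) {
            // Deferred: the consequence runs later, when the queue is drained.
            eventQueue.add(() -> published++);
        }

        public void onTragicEvent(Throwable t) {
            eventQueue.add(() -> { throw new IllegalStateException("tragedy", t); });
        }

        /** Drain pending events; in the real design this happens without holding the IW lock. */
        int drain() {
            int processed = 0;
            Runnable r;
            while ((r = eventQueue.poll()) != null) { r.run(); processed++; }
            return processed;
        }
    }

    public static int demo() {
        WriterEvents iw = new WriterEvents();
        iw.onFlushSuccess("_0"); // DocumentsWriter reports flushes...
        iw.onFlushSuccess("_1");
        iw.drain();              // ...and IW applies the consequences later
        return iw.published;
    }
}
```

Because the callback only enqueues work, the reporting thread never blocks on, or re-enters, the writer's lock — which is the threading-model simplification the issue is after.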
[jira] [Commented] (LUCENE-8268) MatchesIterator.term() should return an array
[ https://issues.apache.org/jira/browse/LUCENE-8268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16447948#comment-16447948 ] Simon Willnauer commented on LUCENE-8268: - {quote} So at the moment there isn't anything that actually uses this. My reason for adding it was to make it possible to identify the leaf query that returned each position, but maybe it would be a better idea to remove terms() entirely, and add a getLeafQuery() method instead? {quote} hard to tell since I don't know the API well enough. But if this is the purpose, I agree. > MatchesIterator.term() should return an array > - > > Key: LUCENE-8268 > URL: https://issues.apache.org/jira/browse/LUCENE-8268 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Alan Woodward >Assignee: Alan Woodward >Priority: Major > Attachments: LUCENE-8268.patch > > > At the moment, we return a single BytesRef from MatchesIterator.term(), which > works well for the queries that currently implement this. This won't be > enough for queries that operate on more than one term, however, such as > phrase or Span queries. > In preparation for LUCENE-8249, this issue will change the method to return > an array of BytesRef -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8267) Remove memory codecs from the codebase
[ https://issues.apache.org/jira/browse/LUCENE-8267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16447787#comment-16447787 ] Simon Willnauer commented on LUCENE-8267: - +1 > Remove memory codecs from the codebase > -- > > Key: LUCENE-8267 > URL: https://issues.apache.org/jira/browse/LUCENE-8267 > Project: Lucene - Core > Issue Type: Task >Reporter: Dawid Weiss >Priority: Major > > Memory codecs (MemoryPostings*, MemoryDocValues*) are part of random > selection of codecs for tests and cause occasional OOMs when a test with huge > data is selected. We don't use those memory codecs anywhere outside of tests, > it has been suggested to just remove them to avoid maintenance costs and OOMs > in tests. [1] > [1] https://apache.markmail.org/thread/mj53os2ekyldsoy3 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8268) MatchesIterator.term() should return an array
[ https://issues.apache.org/jira/browse/LUCENE-8268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16447779#comment-16447779 ] Simon Willnauer commented on LUCENE-8268: - a couple of questions: * in _compareBytesRefArrays_ how can you tell that comparing each individual term is correct? * is _BytesRefIterator_ an option as a return value, and would it make sense? It's hard to tell without a single user of this. * In the current context there is no gain changing this interface. Can we add a user of multiple terms? > MatchesIterator.term() should return an array > - > > Key: LUCENE-8268 > URL: https://issues.apache.org/jira/browse/LUCENE-8268 > Project: Lucene - Core > Issue Type: Improvement > Reporter: Alan Woodward > Assignee: Alan Woodward > Priority: Major > Attachments: LUCENE-8268.patch > > > At the moment, we return a single BytesRef from MatchesIterator.term(), which > works well for the queries that currently implement this. This won't be > enough for queries that operate on more than one term, however, such as > phrase or Span queries. > In preparation for LUCENE-8249, this issue will change the method to return > an array of BytesRef -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
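The two API shapes being weighed in this thread — `term()` returning an array versus an iterator such as BytesRefIterator — can be sketched side by side. This uses `String` in place of BytesRef and hypothetical interface names, purely to show the trade-off in caller ergonomics:

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Sketch of the two API shapes under discussion, with String standing in for BytesRef. */
public class MatchTermsApi {
    /** Array style: simple, but forces all terms to be materialized up front. */
    interface ArrayStyle { String[] terms(); }

    /** Iterator style: lazier, but clunkier for the common single-term case. */
    interface IteratorStyle { Iterator<String> terms(); }

    static ArrayStyle phraseMatch(String... terms) { return () -> terms; }

    static IteratorStyle lazyPhraseMatch(List<String> terms) { return terms::iterator; }

    public static int demo() {
        // Array style: length and random access come for free.
        ArrayStyle m = phraseMatch("quick", "brown", "fox");
        int viaArray = m.terms().length;

        // Iterator style: the caller must walk the terms to count them.
        int viaIterator = 0;
        Iterator<String> it = lazyPhraseMatch(Arrays.asList("quick", "brown", "fox")).terms();
        while (it.hasNext()) { it.next(); viaIterator++; }

        return viaArray + viaIterator; // both shapes expose the same three terms
    }
}
```

Both shapes carry the same information; the question raised in the comment is which one an actual multi-term consumer (phrase or Span matches) would find easier to use.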
[jira] [Commented] (LUCENE-8264) Allow an option to rewrite all segments
[ https://issues.apache.org/jira/browse/LUCENE-8264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16447767#comment-16447767 ] Simon Willnauer commented on LUCENE-8264: - > It worked at least until 7.x. As I said, you can remove offsets if needed. > And of course a FilterLeafReader together with SlowCodecReaderWrapper is > definitely needed. I am not so sure about this; at least [this|https://github.com/apache/lucene-solr/blob/branch_7x/lucene/core/src/java/org/apache/lucene/index/IndexWriter.java#L2756] will fail, and it has been in there since 7.0. > Allow an option to rewrite all segments > --- > > Key: LUCENE-8264 > URL: https://issues.apache.org/jira/browse/LUCENE-8264 > Project: Lucene - Core > Issue Type: Improvement > Reporter: Erick Erickson > Assignee: Erick Erickson > Priority: Major > > For the background, see SOLR-12259. > There are several use-cases that would be much easier, especially during > upgrades, if we could specify that all segments get rewritten. > One example: Upgrading 5x->6x->7x. When segments are merged, they're > rewritten into the current format. However, there's no guarantee that a > particular segment _ever_ gets merged so the 6x-7x upgrade won't necessarily > be successful. > How many merge policies support this is an open question. I propose to start > with TMP and raise other JIRAs as necessary for other merge policies. > So far the usual response has been "re-index from scratch", but that's > increasingly difficult as systems get larger. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-8260) Extract ReaderPool from IndexWriter
[ https://issues.apache.org/jira/browse/LUCENE-8260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved LUCENE-8260. - Resolution: Fixed thanks everyone! > Extract ReaderPool from IndexWriter > > > Key: LUCENE-8260 > URL: https://issues.apache.org/jira/browse/LUCENE-8260 > Project: Lucene - Core > Issue Type: Improvement > Affects Versions: 7.4, master (8.0) > Reporter: Simon Willnauer > Priority: Major > Fix For: 7.4, master (8.0) > > Attachments: LUCENE-8260.diff > > > ReaderPool plays a central role in the IndexWriter pooling NRT readers and > making sure we write buffered deletes and updates to disk. This class used to > be a non-static inner class accessing many aspects including locks from the > IndexWriter itself. This change moves the class outside of IW and defines > its responsibility in a clear way with respect to locks etc. Now IndexWriter > doesn't need to share ReaderPool anymore and reacts on writes done inside the > pool by checkpointing internally. This also removes acquiring the IW lock > inside the reader pool, which makes reasoning about concurrency difficult. > This change also adds javadocs and dedicated tests for the ReaderPool class. > /cc [~mikemccand] [~dawidweiss] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
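The core idea behind a reader pool as described above — per-segment readers shared between consumers, reference-counted, with the pool owning its own lock instead of borrowing IndexWriter's — can be sketched as follows. This is a simplified model with hypothetical names, not the actual ReaderPool class:

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch of a reader pool: per-segment readers are shared and reference-counted.
 * All synchronization is on the pool itself, never on an external writer lock.
 */
public class ReaderPoolSketch {
    static final class PooledReader {
        final String segment;
        int refCount = 1; // the pool's own reference
        PooledReader(String segment) { this.segment = segment; }
    }

    private final Map<String, PooledReader> readers = new HashMap<>();

    /** Get (or open) the reader for a segment, taking a caller reference on it. */
    public synchronized PooledReader get(String segment) {
        PooledReader r = readers.computeIfAbsent(segment, PooledReader::new);
        r.refCount++;
        return r;
    }

    /** Drop a caller reference; the reader leaves the pool only when nothing holds it. */
    public synchronized void release(PooledReader r) {
        if (--r.refCount == 0) {
            readers.remove(r.segment); // closing the underlying reader would happen here
        }
    }

    public synchronized int pooledCount() { return readers.size(); }

    public static int demo() {
        ReaderPoolSketch pool = new ReaderPoolSketch();
        PooledReader a = pool.get("_0"); // refCount: pool + caller = 2
        pool.release(a);                 // caller done; pool still holds its reference
        return pool.pooledCount();
    }
}
```

Keeping the lock inside the pool is the concurrency win the issue describes: callers never need IndexWriter's monitor to check a reader in or out.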
[jira] [Commented] (LUCENE-8264) Allow an option to rewrite all segments
[ https://issues.apache.org/jira/browse/LUCENE-8264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16447758#comment-16447758 ] Simon Willnauer commented on LUCENE-8264: - [~thetaphi] I don't think this is going to work here. IndexWriter#validateMergeReader will prevent you from doing this unless you add some evil hacks.
[jira] [Commented] (LUCENE-8264) Allow an option to rewrite all segments
[ https://issues.apache.org/jira/browse/LUCENE-8264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16447707#comment-16447707 ] Simon Willnauer commented on LUCENE-8264: - [~dweiss] I think you are not aware of the fact that an index created with version N-2 won't be supported by version N even if you rewrite all segments. The created version is baked into the segments file, and Lucene will not open it even if all segments are on N or N-1. There are several reasons for this, for instance to reject broken offsets in term vectors in Lucene 7. We can never enforce limits like this if we keep upgrading indices behind the scenes that didn't have these protections.
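The key point above is that the version an index was *created* with is recorded once in the segments file and never changes, even if every segment is later rewritten; Lucene only opens indices created by the current or previous major version. A minimal self-contained model of that policy (hypothetical names; the real check lives in Lucene's SegmentInfos/IndexWriter code):

```java
// Minimal model of the "index created version" policy described above.
// Hypothetical class and method names, not Lucene's API.
class CreatedVersionCheck {
    /**
     * The version an index was created with is written once into the
     * segments file and is immutable, so rewriting all segments does not
     * change it. Lucene N opens only indices created by N or N-1.
     */
    static boolean canOpen(int currentMajor, int indexCreatedMajor) {
        return indexCreatedMajor >= currentMajor - 1
                && indexCreatedMajor <= currentMajor;
    }
}
```

Under this model an index created on 5.x stays unopenable on 7.x no matter how many merges have rewritten its segments, which is exactly why "rewrite all segments" cannot substitute for a re-index across two majors.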
[jira] [Commented] (LUCENE-8260) Extract ReaderPool from IndexWriter
[ https://issues.apache.org/jira/browse/LUCENE-8260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16444082#comment-16444082 ] Simon Willnauer commented on LUCENE-8260: - here is also a review PR: https://github.com/s1monw/lucene-solr/pull/12/
[jira] [Updated] (LUCENE-8260) Extract ReaderPool from IndexWriter
[ https://issues.apache.org/jira/browse/LUCENE-8260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-8260: Attachment: LUCENE-8260.diff
[jira] [Created] (LUCENE-8259) Extract ReaderPool from IndexWriter
Simon Willnauer created LUCENE-8259: --- Summary: Extract ReaderPool from IndexWriter Key: LUCENE-8259 URL: https://issues.apache.org/jira/browse/LUCENE-8259 Project: Lucene - Core Issue Type: Improvement Affects Versions: 7.4, master (8.0) Reporter: Simon Willnauer Fix For: 7.4, master (8.0) Attachments: extract_reader_pool.diff
[jira] [Created] (LUCENE-8260) Extract ReaderPool from IndexWriter
Simon Willnauer created LUCENE-8260: --- Summary: Extract ReaderPool from IndexWriter Key: LUCENE-8260 URL: https://issues.apache.org/jira/browse/LUCENE-8260 Project: Lucene - Core Issue Type: Improvement Affects Versions: 7.4, master (8.0) Reporter: Simon Willnauer Fix For: 7.4, master (8.0) Attachments: LUCENE-8260.diff
Re: [JENKINS] Lucene-Solr-7.x-Linux (32bit/jdk1.8.0_162) - Build # 1744 - Failure!
pushed a fix, test bug - sorry for the noise

On Wed, Apr 18, 2018 at 12:47 PM, Policeman Jenkins Server wrote:
> Build: https://jenkins.thetaphi.de/job/Lucene-Solr-7.x-Linux/1744/
> Java: 32bit/jdk1.8.0_162 -server -XX:+UseG1GC
>
> 1 tests failed.
> FAILED: org.apache.lucene.index.TestPendingSoftDeletes.testUpdateAppliedOnlyOnce
>
> Error Message:
> expected:<1> but was:<2>
>
> Stack Trace:
> java.lang.AssertionError: expected:<1> but was:<2>
>   at __randomizedtesting.SeedInfo.seed([534121D77EFDCB0A:7E638ADEA75CBAAB]:0)
>   at org.junit.Assert.fail(Assert.java:93)
>   at org.junit.Assert.failNotEquals(Assert.java:647)
>   at org.junit.Assert.assertEquals(Assert.java:128)
>   at org.junit.Assert.assertEquals(Assert.java:472)
>   at org.junit.Assert.assertEquals(Assert.java:456)
>   at org.apache.lucene.index.TestPendingSoftDeletes.testUpdateAppliedOnlyOnce(TestPendingSoftDeletes.java:170)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1737)
>   at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:934)
>   at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:970)
>   at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:984)
>   at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
>   at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
>   at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
>   at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
>   at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
>   at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>   at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
>   at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
>   at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
>   at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:943)
>   at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:829)
>   at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:879)
>   at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:890)
>   at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
>   at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>   at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
>   at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
>   at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
>   at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>   at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>   at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>   at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
>   at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
>   at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
>   at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
>   at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>   at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
>   at java.lang.Thread.run(Thread.java:748)
>
> Build Log:
> [...truncated 480 lines...]
>    [junit4] Suite: org.apache.lucene.index.TestPendingSoftDeletes
>    [junit4] 2> NOTE: reproduce with: ant test