[jira] [Updated] (LUCENE-8310) Relax IWs check on pending deletes

2018-05-15 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-8310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-8310:

Attachment: LUCENE-8310.patch

> Relax IWs check on pending deletes
> --
>
> Key: LUCENE-8310
> URL: https://issues.apache.org/jira/browse/LUCENE-8310
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: 7.4, master (8.0)
>    Reporter: Simon Willnauer
>Priority: Major
> Fix For: 7.4, master (8.0)
>
> Attachments: LUCENE-8310.patch, LUCENE-8310.patch
>
>
> I recently fixed the check in IW to fail if there are pending deletes. After
> upgrading to a snapshot I realized the consequences of this check: it made
> most of our uses of IW, for instance to prepare commit metadata or roll back
> to safe commit points, impossible, since we now have to busy-wait on top of
> the directory until all deletes are actually gone, even though we can
> guarantee that our history always moves forward, i.e. we are truly
> append-only in the sense of never reusing segment generations. The fix I
> made was basically to return false from _checkPendingDeletions_ in a
> directory wrapper to work around it.
> I expect this to happen to a lot of Lucene users even if they use IW
> correctly. My proposal is to make the check in IW a bit more sophisticated
> and only fail if there are pending deletes that are in the future from a
> generation perspective. We really don't care about files from the past. My
> patch checks the segment generation of each pending file, which is safe
> since that is the same procedure we apply in IndexFileDeleter to increment
> references etc., and it only fails if the pending delete is in the future.
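The generation comparison described above can be sketched as follows. This is a hedged, self-contained illustration, not the actual patch: `generationOf` is a simplified stand-in for Lucene's real file-name parsing (Lucene encodes a base-36 generation suffix after the last `_` in per-segment file names like `_0_1.liv`), and `isFutureDelete` shows the "only fail on future generations" rule.

```java
// Simplified sketch of the proposed pending-delete check. Assumes Lucene-style
// file names of the form "_<segment>_<gen>.<ext>" with a base-36 generation.
final class PendingDeleteCheck {

    // Returns the generation suffix of a file name, or 0 when there is none
    // (e.g. "_0.cfs" has no generation, "_0_1.liv" has generation 1).
    static long generationOf(String fileName) {
        int dot = fileName.indexOf('.');
        String base = dot == -1 ? fileName : fileName.substring(0, dot);
        int underscore = base.lastIndexOf('_');
        if (underscore <= 0) {
            return 0; // no generation suffix present
        }
        try {
            return Long.parseLong(base.substring(underscore + 1), Character.MAX_RADIX);
        } catch (NumberFormatException e) {
            return 0;
        }
    }

    // Only generations at or beyond the writer's next generation are a problem:
    // an append-only history never reuses generations from the past.
    static boolean isFutureDelete(String pendingFile, long nextWriteGeneration) {
        return generationOf(pendingFile) >= nextWriteGeneration;
    }

    public static void main(String[] args) {
        System.out.println(generationOf("_0_1.liv"));      // 1
        System.out.println(isFutureDelete("_0_1.liv", 2)); // false: in the past
        System.out.println(isFutureDelete("_0_2.liv", 2)); // true: in the future
    }
}
```

Under this rule, a pending delete for an old generation is ignored rather than failing IW's check.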



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-8310) Relax IWs check on pending deletes

2018-05-15 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-8310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-8310:

Attachment: LUCENE-8310.patch







[jira] [Commented] (LUCENE-8310) Relax IWs check on pending deletes

2018-05-15 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16475640#comment-16475640
 ] 

Simon Willnauer commented on LUCENE-8310:
-

here is a patch







[jira] [Created] (LUCENE-8310) Relax IWs check on pending deletes

2018-05-15 Thread Simon Willnauer (JIRA)
Simon Willnauer created LUCENE-8310:
---

 Summary: Relax IWs check on pending deletes
 Key: LUCENE-8310
 URL: https://issues.apache.org/jira/browse/LUCENE-8310
 Project: Lucene - Core
  Issue Type: Improvement
Affects Versions: 7.4, master (8.0)
Reporter: Simon Willnauer
 Fix For: 7.4, master (8.0)









[jira] [Comment Edited] (LUCENE-8264) Allow an option to rewrite all segments

2018-05-14 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16474131#comment-16474131
 ] 

Simon Willnauer edited comment on LUCENE-8264 at 5/14/18 12:46 PM:
---

[~erickerickson]
 # For N-1 -> N upgrades we have _org.apache.lucene.index.UpgradeIndexMergePolicy_, right?
 # In order to add DV, I think this should be done by wrapping a codec reader. I 
personally think this is quite an edge case and should be done in the higher-level 
application, i.e. Solr itself. You can do this quite easily with 
_org.apache.lucene.index.OneMergeWrappingMergePolicy_, similar to what we do in 
the soft-delete case in _SoftDeletesRetentionMergePolicy_.

Am I missing something?
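The wrapping idea mentioned above can be sketched with hypothetical stand-in types (`Merge`, `MergeDecider`) rather than Lucene's actual `MergePolicy` / `OneMergeWrappingMergePolicy` APIs: a wrapping policy delegates segment selection to an inner policy and post-processes each merge it produces, e.g. to wrap the readers being merged and add doc-values fields on the fly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hedged sketch of the merge-wrapping pattern; all types are illustrative
// stand-ins, not Lucene classes.
final class MergeWrapping {

    // Stand-in for a single merge: the segments to merge plus a wrapper
    // applied to each segment reader as the merge consumes it.
    static final class Merge {
        final List<String> segments;
        final UnaryOperator<String> readerWrapper;
        Merge(List<String> segments, UnaryOperator<String> readerWrapper) {
            this.segments = segments;
            this.readerWrapper = readerWrapper;
        }
    }

    // Stand-in for a merge policy's segment-selection entry point.
    interface MergeDecider {
        List<Merge> findMerges(List<String> segments);
    }

    // Wrap every merge the inner decider produces with an extra reader wrapper,
    // leaving the selection logic itself untouched.
    static MergeDecider wrapping(MergeDecider inner, UnaryOperator<String> wrapper) {
        return segments -> {
            List<Merge> wrapped = new ArrayList<>();
            for (Merge m : inner.findMerges(segments)) {
                wrapped.add(new Merge(m.segments, wrapper));
            }
            return wrapped;
        };
    }

    public static void main(String[] args) {
        MergeDecider mergeAll = segs -> List.of(new Merge(segs, UnaryOperator.identity()));
        MergeDecider addDv = wrapping(mergeAll, name -> name + "+dv");
        Merge merge = addDv.findMerges(List.of("_0", "_1")).get(0);
        System.out.println(merge.readerWrapper.apply("_0")); // _0+dv
    }
}
```

This mirrors how a higher-level application could inject per-merge reader wrapping without touching the merge policy's selection logic.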

 



> Allow an option to rewrite all segments
> ---
>
> Key: LUCENE-8264
> URL: https://issues.apache.org/jira/browse/LUCENE-8264
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>Priority: Major
>
> For the background, see SOLR-12259.
> There are several use-cases that would be much easier, especially during 
> upgrades, if we could specify that all segments get rewritten. 
> One example: Upgrading 5x->6x->7x. When segments are merged, they're 
> rewritten into the current format. However, there's no guarantee that a 
> particular segment _ever_ gets merged, so the 6x->7x upgrade won't necessarily 
> be successful.
> How many merge policies support this is an open question. I propose to start 
> with TMP and raise other JIRAs as necessary for other merge policies.
> So far the usual response has been "re-index from scratch", but that's 
> increasingly difficult as systems get larger.






[jira] [Commented] (LUCENE-8264) Allow an option to rewrite all segments

2018-05-14 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16474131#comment-16474131
 ] 

Simon Willnauer commented on LUCENE-8264:
-

[~erickerickson]
 # For N-1 -> N upgrades we have _org.apache.lucene.index.UpgradeIndexMergePolicy_, right?
 # In order to add DV, I think this should be done by wrapping a codec reader. I 
personally think this is quite an edge case and should be done in the higher-level 
application, i.e. Solr itself. You can do this quite easily with 
_org.apache.lucene.index.OneMergeWrappingMergePolicy_, similar to what we do in 
the soft-delete case in _SoftDeletesRetentionMergePolicy_.

Am I missing something?







[jira] [Commented] (LUCENE-8307) FileSwitchDirectory.checkPendingDeletions is backward

2018-05-14 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16474017#comment-16474017
 ] 

Simon Willnauer commented on LUCENE-8307:
-

LGTM

> FileSwitchDirectory.checkPendingDeletions is backward
> -
>
> Key: LUCENE-8307
> URL: https://issues.apache.org/jira/browse/LUCENE-8307
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-8307.patch
>
>
> It checks that both directories have pending deletions, while this method 
> should return true if either directory has files pending deletion.
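The backward check can be illustrated with a minimal, self-contained sketch (hypothetical helper methods, not the actual `FileSwitchDirectory` code): a switch directory over two delegates must report pending deletions when either delegate has any (logical OR), while the buggy version only returned true when both did (logical AND).

```java
import java.util.Set;

// Hedged sketch of the bug: pending-deletion sets stand in for the two
// wrapped directories of a FileSwitchDirectory-style class.
final class SwitchDirCheck {

    // The backward version: requires BOTH delegates to have pending deletions.
    static boolean buggyCheck(Set<String> primaryPending, Set<String> secondaryPending) {
        return !primaryPending.isEmpty() && !secondaryPending.isEmpty();
    }

    // The intended behavior: ANY pending deletion in either delegate counts.
    static boolean fixedCheck(Set<String> primaryPending, Set<String> secondaryPending) {
        return !primaryPending.isEmpty() || !secondaryPending.isEmpty();
    }

    public static void main(String[] args) {
        Set<String> none = Set.of();
        Set<String> some = Set.of("_0.cfs");
        System.out.println(buggyCheck(some, none)); // false: misses the deletion
        System.out.println(fixedCheck(some, none)); // true
    }
}
```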






[jira] [Resolved] (LUCENE-8298) Allow DocValues updates to reset a value

2018-05-09 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-8298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer resolved LUCENE-8298.
-
Resolution: Fixed

> Allow DocValues updates to reset a value
> 
>
> Key: LUCENE-8298
> URL: https://issues.apache.org/jira/browse/LUCENE-8298
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: 7.4, master (8.0)
>    Reporter: Simon Willnauer
>Priority: Major
> Fix For: 7.4, master (8.0)
>
> Attachments: LUCENE-8298.patch, LUCENE-8298.patch, LUCENE-8298.patch, 
> LUCENE-8298.patch, LUCENE-8298.patch
>
>
>  Today, once a document has a value in a certain DV field, this value
> can only be changed, not removed. While resetting / removing a value
> from a field is certainly a corner case, it can be used to undelete a
> soft-deleted document unless it has been merged away.
> This allows rolling back changes without rolling back to another
> commit point or trashing all uncommitted changes. In certain scenarios
> it can be used to "repair" the history of documents in distributed systems.






[jira] [Commented] (LUCENE-8298) Allow DocValues updates to reset a value

2018-05-09 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16469027#comment-16469027
 ] 

Simon Willnauer commented on LUCENE-8298:
-

> I'd rather not add {{Bits#getMutableCopy}} and keep the {{Bits}} API 
> minimal. Otherwise +1.

Fair enough; I agree, let's keep it clean. I used a static method on FixedBitSet 
instead.
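The design choice above (a static copy factory instead of a `getMutableCopy` method on the read-only interface) can be sketched with a minimal bit set. This is a hedged stand-in, not Lucene's `FixedBitSet`: the point is that copying is an explicit, separate operation, so the `Bits`-like read interface stays minimal.

```java
// Minimal fixed-size bit set with a static copyOf factory, illustrating the
// "keep the read-only API small, copy via a static method" approach.
final class MiniBitSet {
    final long[] words;
    final int numBits;

    MiniBitSet(int numBits) {
        this.numBits = numBits;
        this.words = new long[(numBits + 63) >>> 6];
    }

    void set(int i)      { words[i >>> 6] |= 1L << (i & 63); }
    boolean get(int i)   { return (words[i >>> 6] & (1L << (i & 63))) != 0; }

    // Static factory: callers get a private, mutable copy without the
    // read-only interface ever exposing a mutation or copy method.
    static MiniBitSet copyOf(MiniBitSet other) {
        MiniBitSet copy = new MiniBitSet(other.numBits);
        System.arraycopy(other.words, 0, copy.words, 0, other.words.length);
        return copy;
    }

    public static void main(String[] args) {
        MiniBitSet a = new MiniBitSet(128);
        a.set(5);
        MiniBitSet b = MiniBitSet.copyOf(a);
        b.set(70);
        System.out.println(a.get(70)); // false: the copy is independent
        System.out.println(b.get(5));  // true: copied bits are preserved
    }
}
```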
 







[jira] [Updated] (LUCENE-8298) Allow DocValues updates to reset a value

2018-05-09 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-8298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-8298:

Attachment: LUCENE-8298.patch







[jira] [Commented] (LUCENE-8298) Allow DocValues updates to reset a value

2018-05-09 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16468879#comment-16468879
 ] 

Simon Willnauer commented on LUCENE-8298:
-

[~jpountz] I integrated your latest changes.







[jira] [Updated] (LUCENE-8298) Allow DocValues updates to reset a value

2018-05-09 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-8298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-8298:

Attachment: LUCENE-8298.patch







[jira] [Commented] (LUCENE-8303) Make LiveDocsFormat only responsible for (de)serialization of live docs

2018-05-09 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16468844#comment-16468844
 ] 

Simon Willnauer commented on LUCENE-8303:
-

+1, this looks great. I am unsure whether we should deprecate MutableBits in 7.x, 
but other than that, go ahead and push.

> Make LiveDocsFormat only responsible for (de)serialization of live docs
> ---
>
> Key: LUCENE-8303
> URL: https://issues.apache.org/jira/browse/LUCENE-8303
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-8303.patch
>
>
> We could simplify live docs by only making the format responsible for 
> reading/writing a Bits instance that represents live docs, while today the 
> format is also involved in deleting documents since it needs to be able to 
> provide mutable bits.
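The proposed split can be sketched as a format that only (de)serializes live docs, with deletion happening elsewhere on a mutable copy. This is a hedged illustration of the idea, not Lucene's actual `LiveDocsFormat` API; the bit packing here is a simple stand-in for a real codec.

```java
// Hedged sketch: a live-docs "format" that is responsible only for
// serialization and deserialization, never for flipping bits.
final class BitPackedLiveDocs {

    // Serialize: pack one bit per document, little-endian within each byte.
    static byte[] write(boolean[] liveDocs) {
        byte[] out = new byte[(liveDocs.length + 7) / 8];
        for (int i = 0; i < liveDocs.length; i++) {
            if (liveDocs[i]) {
                out[i / 8] |= (byte) (1 << (i % 8));
            }
        }
        return out;
    }

    // Deserialize back into a plain array; callers mutate their own copy.
    static boolean[] read(byte[] data, int numDocs) {
        boolean[] liveDocs = new boolean[numDocs];
        for (int i = 0; i < numDocs; i++) {
            liveDocs[i] = (data[i / 8] & (1 << (i % 8))) != 0;
        }
        return liveDocs;
    }

    public static void main(String[] args) {
        boolean[] live = {true, false, true, true};
        boolean[] roundTrip = read(write(live), live.length);
        System.out.println(java.util.Arrays.equals(live, roundTrip)); // true
    }
}
```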






[jira] [Updated] (LUCENE-8298) Allow DocValues updates to reset a value

2018-05-09 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-8298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-8298:

Attachment: LUCENE-8298.patch







[jira] [Commented] (LUCENE-8298) Allow DocValues updates to reset a value

2018-05-09 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16468781#comment-16468781
 ] 

Simon Willnauer commented on LUCENE-8298:
-

I updated the patch, [~jpountz].







[jira] [Commented] (LUCENE-8296) PendingDeletes shouldn't write to live docs that it shared

2018-05-09 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16468516#comment-16468516
 ] 

Simon Willnauer commented on LUCENE-8296:
-

cool LGTM +1 to commit

> PendingDeletes shouldn't write to live docs that it shared
> --
>
> Key: LUCENE-8296
> URL: https://issues.apache.org/jira/browse/LUCENE-8296
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-8296.patch
>
>
> PendingDeletes has a markAsShared mechanism that makes sure that the 
> latest live docs are not going to receive more updates. But it is not always 
> used, and I was able to verify that in some cases we end up with readers 
> whose live docs disagree with the number of deletes. Even though this might 
> not be causing bugs, it feels dangerous to me, so I think we should consider 
> always marking live docs as shared in #getLiveDocs.
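The markAsShared mechanism amounts to copy-on-write. Here is a hedged, self-contained model of the idea (a boolean array stands in for Lucene's live-docs bits; this is not the real `PendingDeletes` class): once live docs have been handed out to a reader they are marked shared, and the next delete clones them first, so readers never observe mutation.

```java
import java.util.Arrays;

// Toy model of copy-on-write live docs, as the issue proposes for
// PendingDeletes#getLiveDocs.
final class PendingDeletesModel {
    private boolean[] liveDocs;
    private boolean liveDocsShared; // true once a reader holds the current array

    PendingDeletesModel(int numDocs) {
        liveDocs = new boolean[numDocs];
        Arrays.fill(liveDocs, true); // all documents start live
    }

    // Always mark as shared on hand-out, so later deletes cannot mutate
    // an array a reader is still using.
    boolean[] getLiveDocs() {
        liveDocsShared = true;
        return liveDocs;
    }

    void delete(int docId) {
        if (liveDocsShared) {
            liveDocs = liveDocs.clone(); // copy before the first write
            liveDocsShared = false;
        }
        liveDocs[docId] = false;
    }

    public static void main(String[] args) {
        PendingDeletesModel deletes = new PendingDeletesModel(4);
        boolean[] readerView = deletes.getLiveDocs();
        deletes.delete(2);
        System.out.println(readerView[2]);            // true: reader unaffected
        System.out.println(deletes.getLiveDocs()[2]); // false: delete applied
    }
}
```

Marking on every hand-out trades an occasional extra array copy for the guarantee that a reader's view never changes under it.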






[jira] [Commented] (LUCENE-8298) Allow DocValues updates to reset a value

2018-05-08 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16467166#comment-16467166
 ] 

Simon Willnauer commented on LUCENE-8298:
-

new patch with added javadocs, API cleanups and more tests







[jira] [Updated] (LUCENE-8298) Allow DocValues updates to reset a value

2018-05-08 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-8298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-8298:

Attachment: LUCENE-8298.patch







[jira] [Commented] (LUCENE-8298) Allow DocValues updates to reset a value

2018-05-07 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465964#comment-16465964
 ] 

Simon Willnauer commented on LUCENE-8298:
-

I attached a patch for discussion. I need to do some cleanups, add more tests, 
and clarify the javadocs, but it shows the idea.







[jira] [Updated] (LUCENE-8298) Allow DocValues updates to reset a value

2018-05-07 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-8298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-8298:

Attachment: LUCENE-8298.patch







[jira] [Created] (LUCENE-8298) Allow DocValues updates to reset a value

2018-05-07 Thread Simon Willnauer (JIRA)
Simon Willnauer created LUCENE-8298:
---

 Summary: Allow DocValues updates to reset a value
 Key: LUCENE-8298
 URL: https://issues.apache.org/jira/browse/LUCENE-8298
 Project: Lucene - Core
  Issue Type: Improvement
Affects Versions: 7.4, master (8.0)
Reporter: Simon Willnauer
 Fix For: 7.4, master (8.0)









[jira] [Resolved] (LUCENE-8297) Add IW#tryUpdateDocValues(Reader, int, Fields...)

2018-05-07 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-8297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer resolved LUCENE-8297.
-
Resolution: Fixed

> Add IW#tryUpdateDocValues(Reader, int, Fields...)
> -
>
> Key: LUCENE-8297
> URL: https://issues.apache.org/jira/browse/LUCENE-8297
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: 7.4, master (8.0)
>    Reporter: Simon Willnauer
>Priority: Major
> Fix For: 7.4, master (8.0)
>
> Attachments: LUCENE-8297.patch
>
>
> IndexWriter can update doc values for a specific term, but this might
> affect all documents containing the term. With tryUpdateDocValues,
> users can update doc-values fields for individual documents. This allows,
> for instance, soft-deleting individual documents.
> The new method shares most of its code with tryDeleteDocuments.






[jira] [Commented] (LUCENE-7976) Make TieredMergePolicy respect maxSegmentSizeMB and allow singleton merges of very large segments

2018-05-07 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465852#comment-16465852
 ] 

Simon Willnauer commented on LUCENE-7976:
-

> [~erickerickson] if you index with a single thread, and `commit()` at the 
>right times you can build a precise set of segments and then directly test 
>TMP's behavior.  I like approach one since it then gives you full 
>deterministic control to enumerate the different tricky cases that surface in 
>real indices?

 

I really think we should start working towards testing this as a real unit test. 
Creating stuff with IW and depending on it is a big issue. We can change the 
code to be less dependent on IW. I think we should, and we should do it before 
making significant changes to merge policies, IMO.

> Make TieredMergePolicy respect maxSegmentSizeMB and allow singleton merges of 
> very large segments
> -
>
> Key: LUCENE-7976
> URL: https://issues.apache.org/jira/browse/LUCENE-7976
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>Priority: Major
> Attachments: LUCENE-7976.patch, LUCENE-7976.patch, LUCENE-7976.patch, 
> LUCENE-7976.patch, LUCENE-7976.patch, LUCENE-7976.patch, LUCENE-7976.patch
>
>
> We're seeing situations "in the wild" where there are very large indexes (on 
> disk) handled quite easily in a single Lucene index. This is particularly 
> true as features like docValues move data into MMapDirectory space. The 
> current TMP algorithm allows on the order of 50% deleted documents as per a 
> dev list conversation with Mike McCandless (and his blog here:  
> https://www.elastic.co/blog/lucenes-handling-of-deleted-documents).
> Especially in the current era of very large indexes in aggregate, (think many 
> TB) solutions like "you need to distribute your collection over more shards" 
> become very costly. Additionally, the tempting "optimize" button exacerbates 
> the issue since once you form, say, a 100G segment (by 
> optimizing/forceMerging) it is not eligible for merging until 97.5G of the 
> docs in it are deleted (current default 5G max segment size).
> The proposal here would be to add a new parameter to TMP, something like 
>  (no, that's not a serious name; suggestions 
> welcome) which would default to 100 (i.e. the same behavior we have now).
> So if I set this parameter to, say, 20%, and the max segment size stays at 
> 5G, the following would happen when segments were selected for merging:
> > any segment with > 20% deleted documents would be merged or rewritten NO 
> > MATTER HOW LARGE. There are two cases,
> >> the segment has < 5G "live" docs. In that case it would be merged with 
> >> smaller segments to bring the resulting segment up to 5G. If no smaller 
> >> segments exist, it would just be rewritten
> >> The segment has > 5G "live" docs (the result of a forceMerge or optimize). 
> >> It would be rewritten into a single segment removing all deleted docs no 
> >> matter how big it is to start. The 100G example above would be rewritten 
> >> to an 80G segment for instance.
> Of course this would lead to potentially much more I/O which is why the 
> default would be the same behavior we see now. As it stands now, though, 
> there's no way to recover from an optimize/forceMerge except to re-index from 
> scratch. We routinely see 200G-300G Lucene indexes at this point "in the 
> wild" with 10s of  shards replicated 3 or more times. And that doesn't even 
> include having these over HDFS.
> Alternatives welcome! Something like the above seems minimally invasive. A 
> new merge policy is certainly an alternative.
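The selection rule described above can be modeled in a few lines. This is an illustrative sketch only: the parameter name `max_deleted_pct` is a placeholder (the real name is deliberately left open in the proposal), and the sizes are simplified.

```python
# Illustrative model of the proposed TieredMergePolicy behavior: any
# segment whose deleted-docs percentage exceeds a configurable threshold
# becomes eligible for merging/rewriting regardless of its on-disk size.
# A threshold of 100 reproduces today's behavior (nothing forced).

MAX_SEGMENT_GB = 5.0  # current default max merged segment size

def classify(segment, max_deleted_pct=20.0):
    total = segment["max_doc"]
    live = total - segment["deleted"]
    pct_deleted = 100.0 * segment["deleted"] / total
    if pct_deleted <= max_deleted_pct:
        return "ineligible"  # below threshold: leave it alone
    live_gb = segment["size_gb"] * live / total
    if live_gb < MAX_SEGMENT_GB:
        # Case 1: merge with smaller segments up to ~5G of live docs.
        return "merge with smaller segments"
    # Case 2: oversized (e.g. forceMerged) segment: rewrite it alone,
    # dropping deletes (the 100G -> ~80G example from the description).
    return "singleton rewrite"

big = {"max_doc": 1000, "deleted": 300, "size_gb": 100.0}
assert classify(big) == "singleton rewrite"
small = {"max_doc": 1000, "deleted": 300, "size_gb": 4.0}
assert classify(small) == "merge with smaller segments"
```

The key point is that eligibility is driven by the deleted-docs percentage, not capped by the max segment size as it is today.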






[jira] [Commented] (LUCENE-8297) Add IW#tryUpdateDocValues(Reader, int, Fields...)

2018-05-06 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465206#comment-16465206
 ] 

Simon Willnauer commented on LUCENE-8297:
-

[~mikemccand] can you take a look?

> Add IW#tryUpdateDocValues(Reader, int, Fields...)
> -
>
> Key: LUCENE-8297
> URL: https://issues.apache.org/jira/browse/LUCENE-8297
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: 7.4, master (8.0)
>    Reporter: Simon Willnauer
>Priority: Major
> Fix For: 7.4, master (8.0)
>
> Attachments: LUCENE-8297.patch
>
>
> IndexWriter can update doc values for a specific term, but this might
> affect all documents containing the term. With tryUpdateDocValues,
> users can update doc-values fields for individual documents. This allows,
> for instance, soft-deleting individual documents.
> The new method shares most of its code with tryDeleteDocuments.






[jira] [Updated] (LUCENE-8297) Add IW#tryUpdateDocValues(Reader, int, Fields...)

2018-05-06 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-8297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-8297:

Attachment: LUCENE-8297.patch

> Add IW#tryUpdateDocValues(Reader, int, Fields...)
> -
>
> Key: LUCENE-8297
> URL: https://issues.apache.org/jira/browse/LUCENE-8297
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: 7.4, master (8.0)
>    Reporter: Simon Willnauer
>Priority: Major
> Fix For: 7.4, master (8.0)
>
> Attachments: LUCENE-8297.patch
>
>
> IndexWriter can update doc values for a specific term, but this might
> affect all documents containing the term. With tryUpdateDocValues,
> users can update doc-values fields for individual documents. This allows,
> for instance, soft-deleting individual documents.
> The new method shares most of its code with tryDeleteDocuments.






[jira] [Created] (LUCENE-8297) Add IW#tryUpdateDocValues(Reader, int, Fields...)

2018-05-06 Thread Simon Willnauer (JIRA)
Simon Willnauer created LUCENE-8297:
---

 Summary: Add IW#tryUpdateDocValues(Reader, int, Fields...)
 Key: LUCENE-8297
 URL: https://issues.apache.org/jira/browse/LUCENE-8297
 Project: Lucene - Core
  Issue Type: Improvement
Affects Versions: 7.4, master (8.0)
Reporter: Simon Willnauer
 Fix For: 7.4, master (8.0)


IndexWriter can update doc values for a specific term, but this might
affect all documents containing the term. With tryUpdateDocValues,
users can update doc-values fields for individual documents. This allows,
for instance, soft-deleting individual documents.
The new method shares most of its code with tryDeleteDocuments.
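The difference between the existing term-based update and the proposed per-document update can be sketched as follows. This is an illustrative Python model, not Lucene code: the index is a plain list, doc ids are list positions, and the "try" semantics (which in Lucene can fail on a stale reader) are reduced to a boolean return.

```python
# Illustrative model: a term-based doc-values update hits every document
# containing the term, while a per-document "try update" targets exactly
# one doc id.

index = [
    {"id": "user1", "group": "a", "soft_delete": None},
    {"id": "user2", "group": "a", "soft_delete": None},
]

def update_doc_values_by_term(field, value, dv_field, dv_value):
    # Term-based update: every matching document is changed.
    for doc in index:
        if doc[field] == value:
            doc[dv_field] = dv_value

def try_update_doc_values(doc_id, dv_field, dv_value):
    # Per-document update: only the addressed document is changed.
    # (In Lucene the "try" part may fail if the reader is stale.)
    index[doc_id][dv_field] = dv_value
    return True

update_doc_values_by_term("group", "a", "soft_delete", 1)
assert all(doc["soft_delete"] == 1 for doc in index)

try_update_doc_values(0, "soft_delete", None)  # target just doc 0
assert index[0]["soft_delete"] is None
assert index[1]["soft_delete"] == 1
```

This per-document addressing is what makes soft-deleting (or un-soft-deleting) a single document possible without touching every document that shares a term.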







[jira] [Commented] (LUCENE-8296) PendingDeletes shouldn't write to live docs that it shared

2018-05-06 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465190#comment-16465190
 ] 

Simon Willnauer commented on LUCENE-8296:
-

 I think this is mostly a relic from before I started refactoring 
ReadersAndUpdates. I would love to go even further and, down the road, make the 
returned Bits instance immutable. I think we should have a very simple 
base class that FixedBitSet can extend that knows how to read from the array. 
This way we know nobody ever mutates it. Today you can just cast the liveDocs 
from an NRT reader and change its private instance. I am going to look into 
this unless anybody beats me to it.

One thing I feel is missing is an explicit test that the returned bits 
don't change on subsequent modifications.

+1 to the change!
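The immutable-view idea above can be sketched like this. It is a Python model under assumed names (`ReadOnlyBits`, `PendingDeletes` here are simplifications, not Lucene's actual classes): live docs are handed out through a read-only view, and once shared, further deletes go to a fresh copy.

```python
# Illustrative sketch: expose live docs through a read-only view so a
# caller can never mutate the writer's private copy, and copy-on-share
# so previously returned views never change underneath readers.

class ReadOnlyBits:
    def __init__(self, bits):
        self._bits = bits  # shared underlying storage

    def get(self, index):
        return self._bits[index]
    # deliberately no setter: the view cannot be used to flip bits

class PendingDeletes:
    def __init__(self, max_doc):
        self._live = [True] * max_doc

    def get_live_docs(self):
        # Mark-as-shared: hand out a read-only view of the current bits,
        # then switch future deletes to a private copy.
        shared = ReadOnlyBits(self._live)
        self._live = list(self._live)
        return shared

    def delete(self, doc):
        self._live[doc] = False

pd = PendingDeletes(4)
snapshot = pd.get_live_docs()
pd.delete(2)
assert snapshot.get(2) is True           # earlier snapshot unaffected
assert pd.get_live_docs().get(2) is False
```

The explicit test asked for in the comment is exactly the last two assertions: bits obtained earlier must not observe later deletes.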

 

 

> PendingDeletes shouldn't write to live docs that it shared
> --
>
> Key: LUCENE-8296
> URL: https://issues.apache.org/jira/browse/LUCENE-8296
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-8296.patch
>
>
> PendingDeletes has a markAsShared mechanism that allows making sure that the 
> latest livedocs are not going to receive more updates. But it is not always 
> used, and I was able to verify that in some cases we end up with readers 
> whose live docs disagree with the number of deletes. Even though this might 
> not be causing bugs, it feels dangerous to me, so I think we should consider 
> always marking live docs as shared in #getLiveDocs.






[jira] [Comment Edited] (LUCENE-7976) Make TieredMergePolicy respect maxSegmentSizeMB and allow singleton merges of very large segments

2018-05-04 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463704#comment-16463704
 ] 

Simon Willnauer edited comment on LUCENE-7976 at 5/4/18 11:31 AM:
--

Erick, thanks for tackling this big issue!

Here are a couple of comments:

* please remove the commented part that refers to // TODO: See LUCENE-8263
* Can we find a better name for _InfoInfo_, maybe _SegmentSizeAndDocs_?
* can you make _SegmentSizeAndDocs_ static and maybe a simple struct, ie. no 
getters, and don't pass IW to it
* can we assert that _int totalMaxDocs_ is always positive? I know we don't 
allow that many documents in an index but I think it would be good to have an 
extra check.
* can we rename _maxMergeAtOnceThisMerge_ to _currentMaxMergeAtOnce_ or maybe just 
_maxMergeAtOnce_

I dug into this quite a bit and I am starting to question whether we should really 
try to change the algorithm that we have today, or whether this class needs cleanup 
and refactoring first. I am sorry to come in late here, but this is a very 
complex piece of code and adding more complexity to it will rather do harm. 
That said, I wonder if we can generalize the algorithm here into a single 
method, because in the end they all do the same thing. We could, for instance, make 
the selection algorithm pluggable with a function we pass in and that way differentiate 
between findMerges and findForceMerge etc. At the end of the day we want them 
all to work in the same way. I am not saying we should go all the way down that path, but 
maybe we can extract a common code path that we can share between the places 
where we filter out the segments that are not eligible. 

This is just a suggestion; I am happy to help here, btw. One thing that concerns 
me, and is in fact a showstopper IMO, is that the patch doesn't have a single 
test that ensures it's correct. We are significantly changing the behavior; I 
think that warrants tests, no?


was (Author: simonw):
Eric thanks for tackling this big issue here!

here are a couple comments:

* please remove the commented part that refers to // TODO: See LUCENE-8263
* Can we find a better name for _InfoInfo_ maybe _SegmentSizeAndDocs_
* can you make  _SegmentSizeAndDocs_ static and maybe a simple struct ie. no 
getters and don't pass IW to it
* can we assert that _int totalMaxDocs_ is always positive. I know we don't 
allow that many documents in an index but I think it would be good to have an 
extra check.
* can we name _ maxMergeAtOnceThisMerge_ _ currentMaxMergeAtOnce_ or maybe just 
_ maxMergeAtOnce_

I got down this quite a bit and I am starting to question if we should really 
try to change the algorithm that we have today or if this class needs cleanup 
and refactorings first. I am sorry to come in late here but this is a very very 
complex piece of code and adding more complexity to it will rather do harm. 
That said, I wonder if we can generalize the algorithm here into a single 
method because in the end they all do the same thing. We can for instance make 
the selection alg pluggable with a func we pass in and that way differentiate 
between findMerges and findForceMerge etc. At the end of the day we want them 
all to work in the same way. I am not saying we should go down all that way but 
maybe we can extract a common code path that we can share between the places 
were we filter out the segments that are not eligible. 

This is just a suggestion, I am happy to help here btw. One thing that concerns 
me and is in-fact a showstopper IMO is that the patch doesn't have a single 
test that ensures it's correct. I mean we significantly change the behavior I 
think it warrants tests no?

> Make TieredMergePolicy respect maxSegmentSizeMB and allow singleton merges of 
> very large segments
> -
>
> Key: LUCENE-7976
> URL: https://issues.apache.org/jira/browse/LUCENE-7976
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>Priority: Major
> Attachments: LUCENE-7976.patch, LUCENE-7976.patch, LUCENE-7976.patch, 
> LUCENE-7976.patch, LUCENE-7976.patch, LUCENE-7976.patch, LUCENE-7976.patch
>
>
> We're seeing situations "in the wild" where there are very large indexes (on 
> disk) handled quite easily in a single Lucene index. This is particularly 
> true as features like docValues move data into MMapDirectory space. The 
> current TMP algorithm allows on the order of 50% deleted documents as per a 
> dev list conversation with Mike McCandless (and his blog here:  
> https://www.elastic.co/blog/lucenes-handling-of-deleted-documents).
> Especially in the current era of very large indexes in aggregate, (think many 
> TB) solutions like "you need to distribute your collection over more shards" 
> become very costly.

[jira] [Comment Edited] (LUCENE-7976) Make TieredMergePolicy respect maxSegmentSizeMB and allow singleton merges of very large segments

2018-05-04 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463704#comment-16463704
 ] 

Simon Willnauer edited comment on LUCENE-7976 at 5/4/18 11:30 AM:
--

Eric thanks for tackling this big issue here!

here are a couple comments:

* please remove the commented part that refers to // TODO: See LUCENE-8263
* Can we find a better name for _InfoInfo_ maybe _SegmentSizeAndDocs_
* can you make  _SegmentSizeAndDocs_ static and maybe a simple struct ie. no 
getters and don't pass IW to it
* can we assert that _int totalMaxDocs_ is always positive. I know we don't 
allow that many documents in an index but I think it would be good to have an 
extra check.
* can we name _ maxMergeAtOnceThisMerge_ _ currentMaxMergeAtOnce_ or maybe just 
_ maxMergeAtOnce_

I got down this quite a bit and I am starting to question if we should really 
try to change the algorithm that we have today or if this class needs cleanup 
and refactorings first. I am sorry to come in late here but this is a very very 
complex piece of code and adding more complexity to it will rather do harm. 
That said, I wonder if we can generalize the algorithm here into a single 
method because in the end they all do the same thing. We can for instance make 
the selection alg pluggable with a func we pass in and that way differentiate 
between findMerges and findForceMerge etc. At the end of the day we want them 
all to work in the same way. I am not saying we should go down all that way but 
maybe we can extract a common code path that we can share between the places 
were we filter out the segments that are not eligible. 

This is just a suggestion, I am happy to help here btw. One thing that concerns 
me and is in-fact a showstopper IMO is that the patch doesn't have a single 
test that ensures it's correct. I mean we significantly change the behavior I 
think it warrants tests no?


was (Author: simonw):
here are a couple comments:

* please remove the commented part that refers to // TODO: See LUCENE-8263
* Can we find a better name for _InfoInfo_ maybe _SegmentSizeAndDocs_
* can you make  _SegmentSizeAndDocs_ static and maybe a simple struct ie. no 
getters and don't pass IW to it
* can we assert that _int totalMaxDocs_ is always positive. I know we don't 
allow that many documents in an index but I think it would be good to have an 
extra check.
* can we name _ maxMergeAtOnceThisMerge_ _ currentMaxMergeAtOnce_ or maybe just 
_ maxMergeAtOnce_

~~ work in progress ~~ I fat-fingered the save button

> Make TieredMergePolicy respect maxSegmentSizeMB and allow singleton merges of 
> very large segments
> -
>
> Key: LUCENE-7976
> URL: https://issues.apache.org/jira/browse/LUCENE-7976
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>Priority: Major
> Attachments: LUCENE-7976.patch, LUCENE-7976.patch, LUCENE-7976.patch, 
> LUCENE-7976.patch, LUCENE-7976.patch, LUCENE-7976.patch, LUCENE-7976.patch
>
>
> We're seeing situations "in the wild" where there are very large indexes (on 
> disk) handled quite easily in a single Lucene index. This is particularly 
> true as features like docValues move data into MMapDirectory space. The 
> current TMP algorithm allows on the order of 50% deleted documents as per a 
> dev list conversation with Mike McCandless (and his blog here:  
> https://www.elastic.co/blog/lucenes-handling-of-deleted-documents).
> Especially in the current era of very large indexes in aggregate, (think many 
> TB) solutions like "you need to distribute your collection over more shards" 
> become very costly. Additionally, the tempting "optimize" button exacerbates 
> the issue since once you form, say, a 100G segment (by 
> optimizing/forceMerging) it is not eligible for merging until 97.5G of the 
> docs in it are deleted (current default 5G max segment size).
> The proposal here would be to add a new parameter to TMP, something like 
>  (no, that's not a serious name; suggestions 
> welcome) which would default to 100 (i.e. the same behavior we have now).
> So if I set this parameter to, say, 20%, and the max segment size stays at 
> 5G, the following would happen when segments were selected for merging:
> > any segment with > 20% deleted documents would be merged or rewritten NO 
> > MATTER HOW LARGE. There are two cases,
> >> the segment has < 5G "live" docs. In that case it would be merged with 
> >> smaller segments to bring the resulting segment up to 5G. If no smaller 
> >> segments exist, it would just be rewritten

[jira] [Comment Edited] (LUCENE-7976) Make TieredMergePolicy respect maxSegmentSizeMB and allow singleton merges of very large segments

2018-05-04 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463704#comment-16463704
 ] 

Simon Willnauer edited comment on LUCENE-7976 at 5/4/18 10:54 AM:
--

here are a couple comments:

* please remove the commented part that refers to // TODO: See LUCENE-8263
* Can we find a better name for _InfoInfo_ maybe _SegmentSizeAndDocs_
* can you make  _SegmentSizeAndDocs_ static and maybe a simple struct ie. no 
getters and don't pass IW to it
* can we assert that _int totalMaxDocs_ is always positive. I know we don't 
allow that many documents in an index but I think it would be good to have an 
extra check.
* can we name _ maxMergeAtOnceThisMerge_ _ currentMaxMergeAtOnce_ or maybe just 
_ maxMergeAtOnce_

~~ work in progress ~~ I fat-fingered the save button


was (Author: simonw):
here are a couple comments:

* please remove the commented part that refers to // TODO: See LUCENE-8263
* Can we find a better name for _InfoInfo_ maybe _SegmentSizeAndDocs_
* can you make  _SegmentSizeAndDocs_ static and maybe a simple struct ie. no 
getters and don't pass IW to it
* can we assert that _int totalMaxDocs_ is always positive. I know we don't 
allow that many documents in an index but I think it would be good to have an 
extra check.
* can we name _ maxMergeAtOnceThisMerge_ _ currentMaxMergeAtOnce_ or maybe just 
_ maxMergeAtOnce_

> Make TieredMergePolicy respect maxSegmentSizeMB and allow singleton merges of 
> very large segments
> -
>
> Key: LUCENE-7976
> URL: https://issues.apache.org/jira/browse/LUCENE-7976
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>Priority: Major
> Attachments: LUCENE-7976.patch, LUCENE-7976.patch, LUCENE-7976.patch, 
> LUCENE-7976.patch, LUCENE-7976.patch, LUCENE-7976.patch, LUCENE-7976.patch
>
>
> We're seeing situations "in the wild" where there are very large indexes (on 
> disk) handled quite easily in a single Lucene index. This is particularly 
> true as features like docValues move data into MMapDirectory space. The 
> current TMP algorithm allows on the order of 50% deleted documents as per a 
> dev list conversation with Mike McCandless (and his blog here:  
> https://www.elastic.co/blog/lucenes-handling-of-deleted-documents).
> Especially in the current era of very large indexes in aggregate, (think many 
> TB) solutions like "you need to distribute your collection over more shards" 
> become very costly. Additionally, the tempting "optimize" button exacerbates 
> the issue since once you form, say, a 100G segment (by 
> optimizing/forceMerging) it is not eligible for merging until 97.5G of the 
> docs in it are deleted (current default 5G max segment size).
> The proposal here would be to add a new parameter to TMP, something like 
>  (no, that's not a serious name; suggestions 
> welcome) which would default to 100 (i.e. the same behavior we have now).
> So if I set this parameter to, say, 20%, and the max segment size stays at 
> 5G, the following would happen when segments were selected for merging:
> > any segment with > 20% deleted documents would be merged or rewritten NO 
> > MATTER HOW LARGE. There are two cases,
> >> the segment has < 5G "live" docs. In that case it would be merged with 
> >> smaller segments to bring the resulting segment up to 5G. If no smaller 
> >> segments exist, it would just be rewritten
> >> The segment has > 5G "live" docs (the result of a forceMerge or optimize). 
> >> It would be rewritten into a single segment removing all deleted docs no 
> >> matter how big it is to start. The 100G example above would be rewritten 
> >> to an 80G segment for instance.
> Of course this would lead to potentially much more I/O which is why the 
> default would be the same behavior we see now. As it stands now, though, 
> there's no way to recover from an optimize/forceMerge except to re-index from 
> scratch. We routinely see 200G-300G Lucene indexes at this point "in the 
> wild" with 10s of  shards replicated 3 or more times. And that doesn't even 
> include having these over HDFS.
> Alternatives welcome! Something like the above seems minimally invasive. A 
> new merge policy is certainly an alternative.






[jira] [Commented] (LUCENE-7976) Make TieredMergePolicy respect maxSegmentSizeMB and allow singleton merges of very large segments

2018-05-04 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463704#comment-16463704
 ] 

Simon Willnauer commented on LUCENE-7976:
-

Here are a couple of comments:

* please remove the commented part that refers to // TODO: See LUCENE-8263
* Can we find a better name for _InfoInfo_, maybe _SegmentSizeAndDocs_?
* can you make _SegmentSizeAndDocs_ static and maybe a simple struct, ie. no 
getters, and don't pass IW to it
* can we assert that _int totalMaxDocs_ is always positive? I know we don't 
allow that many documents in an index but I think it would be good to have an 
extra check.
* can we rename _maxMergeAtOnceThisMerge_ to _currentMaxMergeAtOnce_ or maybe just 
_maxMergeAtOnce_

> Make TieredMergePolicy respect maxSegmentSizeMB and allow singleton merges of 
> very large segments
> -
>
> Key: LUCENE-7976
> URL: https://issues.apache.org/jira/browse/LUCENE-7976
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>Priority: Major
> Attachments: LUCENE-7976.patch, LUCENE-7976.patch, LUCENE-7976.patch, 
> LUCENE-7976.patch, LUCENE-7976.patch, LUCENE-7976.patch, LUCENE-7976.patch
>
>
> We're seeing situations "in the wild" where there are very large indexes (on 
> disk) handled quite easily in a single Lucene index. This is particularly 
> true as features like docValues move data into MMapDirectory space. The 
> current TMP algorithm allows on the order of 50% deleted documents as per a 
> dev list conversation with Mike McCandless (and his blog here:  
> https://www.elastic.co/blog/lucenes-handling-of-deleted-documents).
> Especially in the current era of very large indexes in aggregate, (think many 
> TB) solutions like "you need to distribute your collection over more shards" 
> become very costly. Additionally, the tempting "optimize" button exacerbates 
> the issue since once you form, say, a 100G segment (by 
> optimizing/forceMerging) it is not eligible for merging until 97.5G of the 
> docs in it are deleted (current default 5G max segment size).
> The proposal here would be to add a new parameter to TMP, something like 
>  (no, that's not a serious name; suggestions 
> welcome) which would default to 100 (i.e. the same behavior we have now).
> So if I set this parameter to, say, 20%, and the max segment size stays at 
> 5G, the following would happen when segments were selected for merging:
> > any segment with > 20% deleted documents would be merged or rewritten NO 
> > MATTER HOW LARGE. There are two cases,
> >> the segment has < 5G "live" docs. In that case it would be merged with 
> >> smaller segments to bring the resulting segment up to 5G. If no smaller 
> >> segments exist, it would just be rewritten
> >> The segment has > 5G "live" docs (the result of a forceMerge or optimize). 
> >> It would be rewritten into a single segment removing all deleted docs no 
> >> matter how big it is to start. The 100G example above would be rewritten 
> >> to an 80G segment for instance.
> Of course this would lead to potentially much more I/O which is why the 
> default would be the same behavior we see now. As it stands now, though, 
> there's no way to recover from an optimize/forceMerge except to re-index from 
> scratch. We routinely see 200G-300G Lucene indexes at this point "in the 
> wild" with 10s of  shards replicated 3 or more times. And that doesn't even 
> include having these over HDFS.
> Alternatives welcome! Something like the above seems minimally invasive. A 
> new merge policy is certainly an alternative.






[jira] [Resolved] (LUCENE-8293) Ensure only hard deletes are carried over in a merge

2018-05-04 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-8293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer resolved LUCENE-8293.
-
Resolution: Fixed

> Ensure only hard deletes are carried over in a merge
> 
>
> Key: LUCENE-8293
> URL: https://issues.apache.org/jira/browse/LUCENE-8293
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: 7.4, master (8.0)
>    Reporter: Simon Willnauer
>Priority: Major
> Fix For: 7.4, master (8.0)
>
> Attachments: LUCENE-8293.patch, LUCENE-8293.patch
>
>
> Today we carry over hard deletes based on the SegmentReader's liveDocs.
> This is not correct if soft-deletes are used, especially with retention
> policies. If a soft delete is added while a segment is merged, the document
> might end up hard-deleted in the target segment. This isn't necessarily a
> correctness issue but causes unnecessary writes of hard-deletes. The biggest
> issue here is that we assert that previously deleted documents are still
> deleted in the live-docs we apply, and that might be violated by the
> retention policy.






[jira] [Commented] (LUCENE-8295) Remove ReadersAndUpdates.liveDocsSharedPending

2018-05-04 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463532#comment-16463532
 ] 

Simon Willnauer commented on LUCENE-8295:
-


{noformat}
Not that I fully understand it, but looking at the patch alone wouldn't it miss 
calling pendingDeletes.liveDocsShared() (and this in turn does have further 
consequences in that other class)? Ping Simon, he'll know.
{noformat}

I looked at the history and I agree with [~jpountz] that this is unnecessary to 
do outside of the places where we call it explicitly, i.e. in getReaderForMerge 
and getReadOnlyClone. Patch LGTM.


> Remove ReadersAndUpdates.liveDocsSharedPending
> --
>
> Key: LUCENE-8295
> URL: https://issues.apache.org/jira/browse/LUCENE-8295
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-8295.patch
>
>
> I have been trying to understand PendingDeletes and ReadersAndUpdates, and it 
> looks to me like the liveDocsSharedPending flag doesn't buy anything. I ran 
> the tests 10 times after removing it and got no failures.






[jira] [Updated] (LUCENE-8293) Ensure only hard deletes are carried over in a merge

2018-05-03 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-8293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-8293:

Attachment: LUCENE-8293.patch

> Ensure only hard deletes are carried over in a merge
> 
>
> Key: LUCENE-8293
> URL: https://issues.apache.org/jira/browse/LUCENE-8293
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: 7.4, master (8.0)
>    Reporter: Simon Willnauer
>Priority: Major
> Fix For: 7.4, master (8.0)
>
> Attachments: LUCENE-8293.patch, LUCENE-8293.patch
>
>
> Today we carry over hard deletes based on the SegmentReader's liveDocs.
> This is not correct if soft-deletes are used, especially with retention
> policies. If a soft delete is added while a segment is merged, the document
> might end up hard deleted in the target segment. This isn't necessarily a
> correctness issue, but it causes unnecessary writes of hard-deletes. The
> biggest issue here is that we assert that previously deleted documents are
> still deleted in the live-docs we apply, and that might be violated by the
> retention policy.
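
The fix boils down to carrying over only the deletes that are genuinely *hard*, rather than everything missing from the merged reader's liveDocs (which also reflects soft deletes a retention policy may still want to keep). A rough Python sketch of that distinction, with invented names:

```python
def hard_deletes_to_carry_over(hard_before, hard_now, live_docs):
    """Return doc ids that must be hard-deleted in the merge target.

    hard_before: docs hard-deleted when the merge started
    hard_now:    docs hard-deleted by the time the merge finishes
    live_docs:   docs missing here may only be *soft* deleted, so this
                 set must not be used directly to derive hard deletes.
    """
    # Only genuinely new hard deletes are carried over; soft deletes
    # that merely show up as "not live" are left alone.
    return hard_now - hard_before


live = {0, 1, 3}   # doc 2 is soft-deleted, doc 4 is hard-deleted
carry = hard_deletes_to_carry_over(hard_before={4}, hard_now={4, 1},
                                   live_docs=live)
assert carry == {1}   # doc 2's soft delete is not turned into a hard one
```

Diffing the hard-delete state, instead of reading the combined liveDocs, is what keeps the retention policy's soft deletes out of the target segment's hard deletes.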






[jira] [Commented] (LUCENE-8293) Ensure only hard deletes are carried over in a merge

2018-05-03 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16462609#comment-16462609
 ] 

Simon Willnauer commented on LUCENE-8293:
-

[~mikemccand] I added another test and fixed some corner cases with 
soft-deletes. Can you take another look?

> Ensure only hard deletes are carried over in a merge
> 
>
> Key: LUCENE-8293
> URL: https://issues.apache.org/jira/browse/LUCENE-8293
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: 7.4, master (8.0)
>    Reporter: Simon Willnauer
>Priority: Major
> Fix For: 7.4, master (8.0)
>
> Attachments: LUCENE-8293.patch, LUCENE-8293.patch
>
>
> Today we carry over hard deletes based on the SegmentReader's liveDocs.
> This is not correct if soft-deletes are used, especially with retention
> policies. If a soft delete is added while a segment is merged, the document
> might end up hard deleted in the target segment. This isn't necessarily a
> correctness issue, but it causes unnecessary writes of hard-deletes. The
> biggest issue here is that we assert that previously deleted documents are
> still deleted in the live-docs we apply, and that might be violated by the
> retention policy.






[jira] [Commented] (LUCENE-8293) Ensure only hard deletes are carried over in a merge

2018-05-03 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16462527#comment-16462527
 ] 

Simon Willnauer commented on LUCENE-8293:
-

[~erickerickson] No, it doesn't.

> Ensure only hard deletes are carried over in a merge
> 
>
> Key: LUCENE-8293
> URL: https://issues.apache.org/jira/browse/LUCENE-8293
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: 7.4, master (8.0)
>    Reporter: Simon Willnauer
>Priority: Major
> Fix For: 7.4, master (8.0)
>
> Attachments: LUCENE-8293.patch
>
>
> Today we carry over hard deletes based on the SegmentReader's liveDocs.
> This is not correct if soft-deletes are used, especially with retention
> policies. If a soft delete is added while a segment is merged, the document
> might end up hard deleted in the target segment. This isn't necessarily a
> correctness issue, but it causes unnecessary writes of hard-deletes. The
> biggest issue here is that we assert that previously deleted documents are
> still deleted in the live-docs we apply, and that might be violated by the
> retention policy.






[jira] [Resolved] (LUCENE-8290) Keep soft deletes in sync with on-disk DocValues

2018-05-03 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-8290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer resolved LUCENE-8290.
-
Resolution: Fixed

> Keep soft deletes in sync with on-disk DocValues
> 
>
> Key: LUCENE-8290
> URL: https://issues.apache.org/jira/browse/LUCENE-8290
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: 7.4, master (8.0)
>    Reporter: Simon Willnauer
>Priority: Major
> Fix For: 7.4, master (8.0)
>
> Attachments: LUCENE-8290.patch
>
>
> Today we pass the doc values update on to the PendingDeletes
> when it's applied. This might cause issues with a retention-policy
> merge policy, which will see a deleted document but not its value on
> disk.
> This change moves the PendingDeletes callback back to flush time
> in order to be consistent with what is actually updated on disk.
> 
> This change also makes sure we write values to disk on flush that
> are in the reader pool, and adds extra best-effort checks to drop
> fully deleted segments on flush, commit and getReader.
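
In other words, the soft-delete accounting should only advance once the doc-values update is actually durable. A simplified sketch of buffering updates and firing the PendingDeletes callback at flush time (all names here are hypothetical, not Lucene's real classes):

```python
class PendingDeletes:
    """Tracks which docs count as soft-deleted for merge policies."""

    def __init__(self):
        self.soft_deleted = set()

    def on_doc_values_update(self, doc_id):
        self.soft_deleted.add(doc_id)


class SegmentState:
    """Buffers soft-delete DV updates; the callback fires only on flush."""

    def __init__(self, pending_deletes):
        self.pending = pending_deletes
        self.buffered = []   # updates applied in memory, not yet on disk

    def apply_update(self, doc_id):
        self.buffered.append(doc_id)   # no callback yet

    def flush(self):
        # Now the values are on disk, so the merge policy's view of
        # soft deletes matches the DocValues it can actually read.
        for doc_id in self.buffered:
            self.pending.on_doc_values_update(doc_id)
        self.buffered.clear()


pd = PendingDeletes()
seg = SegmentState(pd)
seg.apply_update(7)
assert pd.soft_deleted == set()   # not visible yet: nothing flushed
seg.flush()
assert pd.soft_deleted == {7}     # visible exactly when the value is durable
```

Deferring the callback this way is what prevents a retention-policy merge policy from observing a soft delete whose doc value it cannot yet read from disk.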






[jira] [Updated] (LUCENE-8293) Ensure only hard deletes are carried over in a merge

2018-05-03 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-8293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-8293:

Attachment: LUCENE-8293.patch

> Ensure only hard deletes are carried over in a merge
> 
>
> Key: LUCENE-8293
> URL: https://issues.apache.org/jira/browse/LUCENE-8293
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: 7.4, master (8.0)
>    Reporter: Simon Willnauer
>Priority: Major
> Fix For: 7.4, master (8.0)
>
> Attachments: LUCENE-8293.patch
>
>
> Today we carry over hard deletes based on the SegmentReader's liveDocs.
> This is not correct if soft-deletes are used, especially with retention
> policies. If a soft delete is added while a segment is merged, the document
> might end up hard deleted in the target segment. This isn't necessarily a
> correctness issue, but it causes unnecessary writes of hard-deletes. The
> biggest issue here is that we assert that previously deleted documents are
> still deleted in the live-docs we apply, and that might be violated by the
> retention policy.






[jira] [Created] (LUCENE-8293) Ensure only hard deletes are carried over in a merge

2018-05-03 Thread Simon Willnauer (JIRA)
Simon Willnauer created LUCENE-8293:
---

 Summary: Ensure only hard deletes are carried over in a merge
 Key: LUCENE-8293
 URL: https://issues.apache.org/jira/browse/LUCENE-8293
 Project: Lucene - Core
  Issue Type: Bug
Affects Versions: 7.4, master (8.0)
Reporter: Simon Willnauer
 Fix For: 7.4, master (8.0)
 Attachments: LUCENE-8293.patch

Today we carry over hard deletes based on the SegmentReader's liveDocs.
This is not correct if soft-deletes are used, especially with retention
policies. If a soft delete is added while a segment is merged, the document
might end up hard deleted in the target segment. This isn't necessarily a
correctness issue, but it causes unnecessary writes of hard-deletes. The
biggest issue here is that we assert that previously deleted documents are
still deleted in the live-docs we apply, and that might be violated by the
retention policy.






[jira] [Resolved] (LUCENE-8289) Share logic between Numeric and Binary DocValuesFieldUpdates

2018-05-02 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-8289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer resolved LUCENE-8289.
-
Resolution: Fixed

> Share logic between Numeric and Binary DocValuesFieldUpdates
> 
>
> Key: LUCENE-8289
> URL: https://issues.apache.org/jira/browse/LUCENE-8289
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: 7.4, master (8.0)
>    Reporter: Simon Willnauer
>Priority: Major
> Fix For: 7.4, master (8.0)
>
> Attachments: LUCENE-8289.patch
>
>
>  NumericDocValuesFieldUpdates and BinaryDocValuesFieldUpdates duplicate
> a significant amount of logic that can all be pushed into the base class.
> This change moves all the logic that is independent of the type to the 
> base
> class.
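
The shape of the refactor is straightforward: the doc-id bookkeeping (which doc, in what order, iteration) is type-independent and can live in the base class, while only value storage differs per type. A hedged Python sketch of that layering (names mirror the Lucene classes, but the code is a simplification):

```python
class DocValuesFieldUpdates:
    """Type-independent part: field name, doc ids, ordering, iteration."""

    def __init__(self, field):
        self.field = field
        self.docs = []

    def add(self, doc, value):
        self.docs.append(doc)
        self._add_value(value)    # only this hook differs per type

    def iterator(self):
        # Updates must be applied in doc-id order.
        order = sorted(range(len(self.docs)), key=lambda i: self.docs[i])
        return [(self.docs[i], self._get_value(i)) for i in order]


class NumericDocValuesFieldUpdates(DocValuesFieldUpdates):
    def __init__(self, field):
        super().__init__(field)
        self.values = []

    def _add_value(self, value):
        self.values.append(int(value))

    def _get_value(self, i):
        return self.values[i]


class BinaryDocValuesFieldUpdates(DocValuesFieldUpdates):
    def __init__(self, field):
        super().__init__(field)
        self.values = []

    def _add_value(self, value):
        self.values.append(bytes(value))

    def _get_value(self, i):
        return self.values[i]


num = NumericDocValuesFieldUpdates("price")
num.add(5, 42)
num.add(1, 7)
assert num.iterator() == [(1, 7), (5, 42)]
```

Everything above the two small hooks is shared, which is exactly the duplication the patch removes.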






[jira] [Commented] (LUCENE-8290) Keep soft deletes in sync with on-disk DocValues

2018-05-02 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461106#comment-16461106
 ] 

Simon Willnauer commented on LUCENE-8290:
-

[~mikemccand] Can you take a look?

> Keep soft deletes in sync with on-disk DocValues
> 
>
> Key: LUCENE-8290
> URL: https://issues.apache.org/jira/browse/LUCENE-8290
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: 7.4, master (8.0)
>    Reporter: Simon Willnauer
>Priority: Major
> Fix For: 7.4, master (8.0)
>
> Attachments: LUCENE-8290.patch
>
>
> Today we pass the doc values update on to the PendingDeletes
> when it's applied. This might cause issues with a retention-policy
> merge policy, which will see a deleted document but not its value on
> disk.
> This change moves the PendingDeletes callback back to flush time
> in order to be consistent with what is actually updated on disk.
> 
> This change also makes sure we write values to disk on flush that
> are in the reader pool, and adds extra best-effort checks to drop
> fully deleted segments on flush, commit and getReader.






[jira] [Updated] (LUCENE-8290) Keep soft deletes in sync with on-disk DocValues

2018-05-02 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-8290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-8290:

Attachment: LUCENE-8290.patch

> Keep soft deletes in sync with on-disk DocValues
> 
>
> Key: LUCENE-8290
> URL: https://issues.apache.org/jira/browse/LUCENE-8290
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: 7.4, master (8.0)
>    Reporter: Simon Willnauer
>Priority: Major
> Fix For: 7.4, master (8.0)
>
> Attachments: LUCENE-8290.patch
>
>
> Today we pass the doc values update on to the PendingDeletes
> when it's applied. This might cause issues with a retention-policy
> merge policy, which will see a deleted document but not its value on
> disk.
> This change moves the PendingDeletes callback back to flush time
> in order to be consistent with what is actually updated on disk.
> 
> This change also makes sure we write values to disk on flush that
> are in the reader pool, and adds extra best-effort checks to drop
> fully deleted segments on flush, commit and getReader.






[jira] [Created] (LUCENE-8290) Keep soft deletes in sync with on-disk DocValues

2018-05-02 Thread Simon Willnauer (JIRA)
Simon Willnauer created LUCENE-8290:
---

 Summary: Keep soft deletes in sync with on-disk DocValues
 Key: LUCENE-8290
 URL: https://issues.apache.org/jira/browse/LUCENE-8290
 Project: Lucene - Core
  Issue Type: Bug
Affects Versions: 7.4, master (8.0)
Reporter: Simon Willnauer
 Fix For: 7.4, master (8.0)


Today we pass the doc values update on to the PendingDeletes
when it's applied. This might cause issues with a retention-policy
merge policy, which will see a deleted document but not its value on
disk.
This change moves the PendingDeletes callback back to flush time
in order to be consistent with what is actually updated on disk.

This change also makes sure we write values to disk on flush that
are in the reader pool, and adds extra best-effort checks to drop
fully deleted segments on flush, commit and getReader.







[jira] [Created] (LUCENE-8289) Share logic between Numeric and Binary DocValuesFieldUpdates

2018-05-02 Thread Simon Willnauer (JIRA)
Simon Willnauer created LUCENE-8289:
---

 Summary: Share logic between Numeric and Binary 
DocValuesFieldUpdates
 Key: LUCENE-8289
 URL: https://issues.apache.org/jira/browse/LUCENE-8289
 Project: Lucene - Core
  Issue Type: Improvement
Affects Versions: 7.4, master (8.0)
Reporter: Simon Willnauer
 Fix For: 7.4, master (8.0)
 Attachments: LUCENE-8289.patch

 NumericDocValuesFieldUpdates and BinaryDocValuesFieldUpdates duplicate
a significant amount of logic that can all be pushed into the base class.
This change moves all the logic that is independent of the type to the base
class.






[jira] [Updated] (LUCENE-8289) Share logic between Numeric and Binary DocValuesFieldUpdates

2018-05-02 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-8289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-8289:

Attachment: LUCENE-8289.patch

> Share logic between Numeric and Binary DocValuesFieldUpdates
> 
>
> Key: LUCENE-8289
> URL: https://issues.apache.org/jira/browse/LUCENE-8289
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: 7.4, master (8.0)
>    Reporter: Simon Willnauer
>Priority: Major
> Fix For: 7.4, master (8.0)
>
> Attachments: LUCENE-8289.patch
>
>
>  NumericDocValuesFieldUpdates and BinaryDocValuesFieldUpdates duplicate
> a significant amount of logic that can all be pushed into the base class.
> This change moves all the logic that is independent of the type to the 
> base
> class.






Re: [JENKINS] Lucene-Solr-master-Windows (64bit/jdk1.8.0_144) - Build # 7296 - Still Unstable!

2018-04-30 Thread Simon Willnauer
Pushed a fix to master and 7.x.

On Mon, Apr 30, 2018 at 11:46 AM, Simon Willnauer
<simon.willna...@gmail.com> wrote:
> I am looking into the TestDirectoryTaxonomyWriter#testRecreateAndRefresh 
> failure
>
> On Mon, Apr 30, 2018 at 9:59 AM, Policeman Jenkins Server
> <jenk...@thetaphi.de> wrote:
>> Build: https://jenkins.thetaphi.de/job/Lucene-Solr-master-Windows/7296/
>> Java: 64bit/jdk1.8.0_144 -XX:-UseCompressedOops -XX:+UseConcMarkSweepGC
>>
>> 29 tests failed.
>> FAILED:  
>> org.apache.lucene.facet.taxonomy.directory.TestDirectoryTaxonomyWriter.testRecreateAndRefresh
>>
>> Error Message:
>> Directory 
>> MockDirectoryWrapper(NIOFSDirectory@C:\Users\jenkins\workspace\Lucene-Solr-master-Windows\lucene\build\facet\test\J1\temp\lucene.facet.taxonomy.directory.TestDirectoryTaxonomyWriter_F392C2FFDA61922B-001\index-NIOFSDirectory-001
>>  lockFactory=org.apache.lucene.store.NativeFSLockFactory@5e6a7788) still has 
>> pending deleted files; cannot initialize IndexWriter
>>
>> Stack Trace:
>> java.lang.IllegalArgumentException: Directory 
>> MockDirectoryWrapper(NIOFSDirectory@C:\Users\jenkins\workspace\Lucene-Solr-master-Windows\lucene\build\facet\test\J1\temp\lucene.facet.taxonomy.directory.TestDirectoryTaxonomyWriter_F392C2FFDA61922B-001\index-NIOFSDirectory-001
>>  lockFactory=org.apache.lucene.store.NativeFSLockFactory@5e6a7788) still has 
>> pending deleted files; cannot initialize IndexWriter
>> at 
>> __randomizedtesting.SeedInfo.seed([F392C2FFDA61922B:BD08DA5FF63FC513]:0)
>> at org.apache.lucene.index.IndexWriter.(IndexWriter.java:707)
>> at 
>> org.apache.lucene.facet.taxonomy.directory.DirectoryTaxonomyWriter.openIndexWriter(DirectoryTaxonomyWriter.java:240)
>> at 
>> org.apache.lucene.facet.taxonomy.directory.DirectoryTaxonomyWriter.(DirectoryTaxonomyWriter.java:167)
>> at 
>> org.apache.lucene.facet.taxonomy.directory.TestDirectoryTaxonomyWriter.testRecreateAndRefresh(TestDirectoryTaxonomyWriter.java:214)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at 
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>> at 
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:498)
>> at 
>> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1737)
>> at 
>> com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:934)
>> at 
>> com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:970)
>> at 
>> com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:984)
>> at 
>> org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
>> at 
>> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
>> at 
>> org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
>> at 
>> org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
>> at 
>> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
>> at 
>> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>> at 
>> com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
>> at 
>> com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
>> at 
>> com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
>> at 
>> com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:943)
>> at 
>> com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:829)
>> at 
>> com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:879)
>> at 
>> com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:890)
>> at 
>> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
>> at 
>> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>> at 
>> org.apache.lucene.util.TestRuleStor
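
This "still has pending deleted files; cannot initialize IndexWriter" failure is exactly what LUCENE-8310 relaxes: instead of refusing whenever any pending delete exists, only fail when a pending file belongs to a current or future segment generation. A sketch of that check on Lucene-style file names (the parsing is deliberately simplified; segment names are base-36 after the leading underscore):

```python
def segment_gen(filename):
    """Extract the segment counter from names like '_5f_1.liv' or '_b.cfs'."""
    base = filename.lstrip("_").split("_")[0].split(".")[0]
    return int(base, 36)


def safe_to_open(pending_files, next_segment_gen):
    # Pending deletes from *past* segments can never conflict, because
    # the writer only ever moves the segment counter forward and never
    # reuses generations.
    return all(segment_gen(f) < next_segment_gen for f in pending_files)


assert segment_gen("_5f_1.liv") == int("5f", 36)
assert safe_to_open(["_1.cfs", "_2_1.liv"], next_segment_gen=3)
assert not safe_to_open(["_3.cfs"], next_segment_gen=3)
```

The real patch applies the same per-file generation reasoning that IndexFileDeleter already uses for reference counting, so only pending deletes "in the future" abort IndexWriter construction.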

[jira] [Resolved] (LUCENE-8282) Reduce boxing and unnecessary object creation in DV updates

2018-04-30 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-8282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer resolved LUCENE-8282.
-
Resolution: Fixed

> Reduce boxing and unnecessary object creation in DV updates
> ---
>
> Key: LUCENE-8282
> URL: https://issues.apache.org/jira/browse/LUCENE-8282
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: 7.4, master (8.0)
>    Reporter: Simon Willnauer
>Priority: Major
> Fix For: 7.4, master (8.0)
>
> Attachments: LUCENE-8282.patch
>
>
> DV updates used the boxed type Long to keep the API generic. Yet the missing
> concrete type caused a lot of code duplication, boxing and unnecessary object
> creation. This change cuts over to type-safe APIs using BytesRef and long
> (the primitive). With this change, most of the code that is almost identical
> between binary and numeric is now shared, reducing the maintenance overhead
> and the likelihood of introducing bugs.
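
Python has an analogous cost to Java's Long-vs-long distinction: a generic container keeps a separate heap object per element, while a primitive-typed container stores raw 8-byte values contiguously. A small illustration (this mirrors the boxing problem only loosely, as an assumption-labeled analogy):

```python
import sys
from array import array

n = 100_000
boxed = [i for i in range(n)]       # list of int objects (pointers + objects)
primitive = array("q", range(n))    # contiguous signed 64-bit values

# Count the list's pointer array plus every int object it references.
boxed_bytes = sys.getsizeof(boxed) + sum(sys.getsizeof(i) for i in boxed)
primitive_bytes = sys.getsizeof(primitive)

# The boxed representation is several times larger per element.
assert primitive_bytes < boxed_bytes
```

The same arithmetic is why cutting the DV-update APIs over to `long` and BytesRef reduces both allocation churn and footprint.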






Re: [JENKINS] Lucene-Solr-master-Windows (64bit/jdk1.8.0_144) - Build # 7296 - Still Unstable!

2018-04-30 Thread Simon Willnauer
I am looking into the TestDirectoryTaxonomyWriter#testRecreateAndRefresh failure

On Mon, Apr 30, 2018 at 9:59 AM, Policeman Jenkins Server
 wrote:
> Build: https://jenkins.thetaphi.de/job/Lucene-Solr-master-Windows/7296/
> Java: 64bit/jdk1.8.0_144 -XX:-UseCompressedOops -XX:+UseConcMarkSweepGC
>
> 29 tests failed.
> FAILED:  
> org.apache.lucene.facet.taxonomy.directory.TestDirectoryTaxonomyWriter.testRecreateAndRefresh
>
> Error Message:
> Directory 
> MockDirectoryWrapper(NIOFSDirectory@C:\Users\jenkins\workspace\Lucene-Solr-master-Windows\lucene\build\facet\test\J1\temp\lucene.facet.taxonomy.directory.TestDirectoryTaxonomyWriter_F392C2FFDA61922B-001\index-NIOFSDirectory-001
>  lockFactory=org.apache.lucene.store.NativeFSLockFactory@5e6a7788) still has 
> pending deleted files; cannot initialize IndexWriter
>
> Stack Trace:
> java.lang.IllegalArgumentException: Directory 
> MockDirectoryWrapper(NIOFSDirectory@C:\Users\jenkins\workspace\Lucene-Solr-master-Windows\lucene\build\facet\test\J1\temp\lucene.facet.taxonomy.directory.TestDirectoryTaxonomyWriter_F392C2FFDA61922B-001\index-NIOFSDirectory-001
>  lockFactory=org.apache.lucene.store.NativeFSLockFactory@5e6a7788) still has 
> pending deleted files; cannot initialize IndexWriter
> at 
> __randomizedtesting.SeedInfo.seed([F392C2FFDA61922B:BD08DA5FF63FC513]:0)
> at org.apache.lucene.index.IndexWriter.(IndexWriter.java:707)
> at 
> org.apache.lucene.facet.taxonomy.directory.DirectoryTaxonomyWriter.openIndexWriter(DirectoryTaxonomyWriter.java:240)
> at 
> org.apache.lucene.facet.taxonomy.directory.DirectoryTaxonomyWriter.(DirectoryTaxonomyWriter.java:167)
> at 
> org.apache.lucene.facet.taxonomy.directory.TestDirectoryTaxonomyWriter.testRecreateAndRefresh(TestDirectoryTaxonomyWriter.java:214)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1737)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:934)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:970)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:984)
> at 
> org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
> at 
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
> at 
> org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
> at 
> org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
> at 
> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
> at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
> at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
> at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:943)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:829)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:879)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:890)
> at 
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
> at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at 
> org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
> at 
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
> at 
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
> at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at 
> 

[jira] [Commented] (LUCENE-8282) Reduce boxing and unnecessary object creation in DV updates

2018-04-27 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16456093#comment-16456093
 ] 

Simon Willnauer commented on LUCENE-8282:
-

https://github.com/s1monw/lucene-solr/pull/16 /cc [~mikemccand] [~shaie] 
[~dweiss]

> Reduce boxing and unnecessary object creation in DV updates
> ---
>
> Key: LUCENE-8282
> URL: https://issues.apache.org/jira/browse/LUCENE-8282
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: 7.4, master (8.0)
>    Reporter: Simon Willnauer
>Priority: Major
> Fix For: 7.4, master (8.0)
>
> Attachments: LUCENE-8282.patch
>
>
> DV updates used the boxed type Long to keep the API generic. Yet the missing
> concrete type caused a lot of code duplication, boxing and unnecessary object
> creation. This change cuts over to type-safe APIs using BytesRef and long
> (the primitive). With this change, most of the code that is almost identical
> between binary and numeric is now shared, reducing the maintenance overhead
> and the likelihood of introducing bugs.






[jira] [Updated] (LUCENE-8282) Reduce boxing and unnecessary object creation in DV updates

2018-04-27 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-8282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-8282:

Fix Version/s: master (8.0)
   7.4

> Reduce boxing and unnecessary object creation in DV updates
> ---
>
> Key: LUCENE-8282
> URL: https://issues.apache.org/jira/browse/LUCENE-8282
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: 7.4, master (8.0)
>    Reporter: Simon Willnauer
>Priority: Major
> Fix For: 7.4, master (8.0)
>
> Attachments: LUCENE-8282.patch
>
>
> DV updates used the boxed type Long to keep the API generic. Yet the missing
> concrete type caused a lot of code duplication, boxing and unnecessary object
> creation. This change cuts over to type-safe APIs using BytesRef and long
> (the primitive). With this change, most of the code that is almost identical
> between binary and numeric is now shared, reducing the maintenance overhead
> and the likelihood of introducing bugs.






[jira] [Updated] (LUCENE-8282) Reduce boxing and unnecessary object creation in DV updates

2018-04-27 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-8282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-8282:

Affects Version/s: master (8.0)
   7.4

> Reduce boxing and unnecessary object creation in DV updates
> ---
>
> Key: LUCENE-8282
> URL: https://issues.apache.org/jira/browse/LUCENE-8282
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: 7.4, master (8.0)
>    Reporter: Simon Willnauer
>Priority: Major
> Fix For: 7.4, master (8.0)
>
> Attachments: LUCENE-8282.patch
>
>
> DV updates used the boxed type Long to keep the API generic. Yet the missing
> concrete type caused a lot of code duplication, boxing and unnecessary object
> creation. This change cuts over to type-safe APIs using BytesRef and long
> (the primitive). With this change, most of the code that is almost identical
> between binary and numeric is now shared, reducing the maintenance overhead
> and the likelihood of introducing bugs.






[jira] [Updated] (LUCENE-8282) Reduce boxing and unnecessary object creation in DV updates

2018-04-27 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-8282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-8282:

Attachment: LUCENE-8282.patch

> Reduce boxing and unnecessary object creation in DV updates
> ---
>
> Key: LUCENE-8282
> URL: https://issues.apache.org/jira/browse/LUCENE-8282
> Project: Lucene - Core
>  Issue Type: Improvement
>    Reporter: Simon Willnauer
>Priority: Major
> Attachments: LUCENE-8282.patch
>
>
> DV updates used the boxed type Long to keep the API generic. Yet the missing
> concrete type caused a lot of code duplication, boxing and unnecessary object
> creation. This change cuts over to type-safe APIs using BytesRef and long
> (the primitive). With this change, most of the code that is almost identical
> between binary and numeric is now shared, reducing the maintenance overhead
> and the likelihood of introducing bugs.






[jira] [Created] (LUCENE-8282) Reduce boxing and unnecessary object creation in DV updates

2018-04-27 Thread Simon Willnauer (JIRA)
Simon Willnauer created LUCENE-8282:
---

 Summary: Reduce boxing and unnecessary object creation in DV 
updates
 Key: LUCENE-8282
 URL: https://issues.apache.org/jira/browse/LUCENE-8282
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Simon Willnauer


DV updates used the boxed type Long to keep the API generic. Yet the missing
concrete type caused a lot of code duplication, boxing and unnecessary object
creation. This change cuts over to type-safe APIs using BytesRef and long
(the primitive).

With this change, most of the code that is almost identical between binary
and numeric is now shared, reducing the maintenance overhead and the
likelihood of introducing bugs.






Re: Friendly reminder: please run precommit

2018-04-26 Thread Simon Willnauer
precommit:
BUILD SUCCESSFUL

Total time: 10 minutes 48 seconds

on my roughly two-year-old MacBook Pro.

I think that is reasonable?

On Thu, Apr 26, 2018 at 3:23 PM, Karl Wright <daddy...@gmail.com> wrote:
> :-)
>
> 25 minutes is an eternity these days, Robert.  This is especially true when
> others are collaborating with what you are doing, as was the case here.  The
> other approach would be to create a branch, but I've been avoiding that on
> git.
>
> "ant documentation-lint" is what I'm looking for, thanks.
>
> Karl
>
>
> On Thu, Apr 26, 2018 at 8:21 AM, Robert Muir <rcm...@gmail.com> wrote:
>>
>> I don't understand the turnaround issue, why do the commits need to be
>> rushed in?
>> There is patch validation recently hooked in to avoid keeping your
>> computer busy for 25 minutes.
>> If you are not changing third party dependencies or anything "heavy"
>> like that you should at least run "ant documentation-lint" from
>> lucene/
>>
>>
>> On Thu, Apr 26, 2018 at 8:02 AM, Karl Wright <daddy...@gmail.com> wrote:
>> > How long does precommit take you to run?  For me, it's a good 25
>> > minutes.
>> > That really impacts turnaround, which is why I'd love a precommit that
>> > looked only at certain things in the local package I'm dealing with.
>> >
>> > Karl
>> >
>> > On Thu, Apr 26, 2018 at 6:14 AM, Simon Willnauer
>> > <simon.willna...@gmail.com>
>> > wrote:
>> >>
>> >> Hey folks,
>> >>
>> >> I had to fix several glitches lately that would have been caught by running
>> >> precommit. It's a simple step, so please take the time to run `ant clean
>> >> precommit` at the top level.
>> >>
>> >> Thanks,
>> >>
>> >> Simon
>> >>
>> >> -
>> >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> >> For additional commands, e-mail: dev-h...@lucene.apache.org
>> >>
>> >
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>
>




Re: [JENKINS] Lucene-Solr-master-Linux (64bit/jdk1.8.0_162) - Build # 21910 - Failure!

2018-04-26 Thread Simon Willnauer
this is fixed

On Thu, Apr 26, 2018 at 12:33 PM, Policeman Jenkins Server
 wrote:
> Build: https://jenkins.thetaphi.de/job/Lucene-Solr-master-Linux/21910/
> Java: 64bit/jdk1.8.0_162 -XX:+UseCompressedOops -XX:+UseSerialGC
>
> All tests passed
>
> Build Log:
> [...truncated 53959 lines...]
> -ecj-javadoc-lint-src:
> [mkdir] Created dir: /tmp/ecj1818329761
>  [ecj-lint] Compiling 103 source files to /tmp/ecj1818329761
>  [ecj-lint] --
>  [ecj-lint] 1. ERROR in 
> /home/jenkins/workspace/Lucene-Solr-master-Linux/lucene/spatial3d/src/java/org/apache/lucene/spatial3d/geom/GeoComplexPolygon.java
>  (at line 22)
>  [ecj-lint] import java.util.Set;
>  [ecj-lint]^
>  [ecj-lint] The import java.util.Set is never used
>  [ecj-lint] --
>  [ecj-lint] 2. ERROR in 
> /home/jenkins/workspace/Lucene-Solr-master-Linux/lucene/spatial3d/src/java/org/apache/lucene/spatial3d/geom/GeoComplexPolygon.java
>  (at line 23)
>  [ecj-lint] import java.util.HashSet;
>  [ecj-lint]^
>  [ecj-lint] The import java.util.HashSet is never used
>  [ecj-lint] --
>  [ecj-lint] 2 problems (2 errors)
>
> BUILD FAILED
> /home/jenkins/workspace/Lucene-Solr-master-Linux/build.xml:633: The following 
> error occurred while executing this line:
> /home/jenkins/workspace/Lucene-Solr-master-Linux/build.xml:101: The following 
> error occurred while executing this line:
> /home/jenkins/workspace/Lucene-Solr-master-Linux/lucene/build.xml:208: The 
> following error occurred while executing this line:
> /home/jenkins/workspace/Lucene-Solr-master-Linux/lucene/common-build.xml:2264:
>  The following error occurred while executing this line:
> /home/jenkins/workspace/Lucene-Solr-master-Linux/lucene/common-build.xml:2089:
>  The following error occurred while executing this line:
> /home/jenkins/workspace/Lucene-Solr-master-Linux/lucene/common-build.xml:2128:
>  Compile failed; see the compiler error output for details.
>
> Total time: 76 minutes 50 seconds
> Build step 'Invoke Ant' marked build as failure
> Archiving artifacts
> Setting 
> ANT_1_8_2_HOME=/var/lib/jenkins/tools/hudson.tasks.Ant_AntInstallation/ANT_1.8.2
> [WARNINGS] Skipping publisher since build result is FAILURE
> Recording test results
> Setting 
> ANT_1_8_2_HOME=/var/lib/jenkins/tools/hudson.tasks.Ant_AntInstallation/ANT_1.8.2
> Email was triggered for: Failure - Any
> Sending email for trigger: Failure - Any
> Setting 
> ANT_1_8_2_HOME=/var/lib/jenkins/tools/hudson.tasks.Ant_AntInstallation/ANT_1.8.2
> Setting 
> ANT_1_8_2_HOME=/var/lib/jenkins/tools/hudson.tasks.Ant_AntInstallation/ANT_1.8.2
> Setting 
> ANT_1_8_2_HOME=/var/lib/jenkins/tools/hudson.tasks.Ant_AntInstallation/ANT_1.8.2
> Setting 
> ANT_1_8_2_HOME=/var/lib/jenkins/tools/hudson.tasks.Ant_AntInstallation/ANT_1.8.2
>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org




Friendly reminder: please run precommit

2018-04-26 Thread Simon Willnauer
Hey folks,

I had to fix several glitches lately that would have been caught by running
precommit. It's a simple step, so please take the time to run `ant clean
precommit` at the top level.

Thanks,

Simon




Re: [JENKINS] Lucene-Solr-7.x-Windows (64bit/jdk-9.0.4) - Build # 565 - Still Unstable!

2018-04-26 Thread Simon Willnauer
pushed a fix for this

On Wed, Apr 25, 2018 at 11:24 PM, Policeman Jenkins Server
 wrote:
> Build: https://jenkins.thetaphi.de/job/Lucene-Solr-7.x-Windows/565/
> Java: 64bit/jdk-9.0.4 -XX:-UseCompressedOops -XX:+UseParallelGC
>
> 42 tests failed.
> FAILED:  org.apache.lucene.store.TestNativeFSLockFactory.testStressLocks
>
> Error Message:
> IndexWriter hit unexpected exceptions
>
> Stack Trace:
> java.lang.AssertionError: IndexWriter hit unexpected exceptions
> at 
> __randomizedtesting.SeedInfo.seed([72F563BD958B028E:2CC42D408927CAE8]:0)
> at org.junit.Assert.fail(Assert.java:93)
> at org.junit.Assert.assertTrue(Assert.java:43)
> at 
> org.apache.lucene.store.BaseLockFactoryTestCase.testStressLocks(BaseLockFactoryTestCase.java:180)
> at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.base/java.lang.reflect.Method.invoke(Method.java:564)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1737)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:934)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:970)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:984)
> at 
> org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
> at 
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
> at 
> org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
> at 
> org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
> at 
> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
> at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
> at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
> at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:943)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:829)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:879)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:890)
> at 
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
> at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at 
> org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
> at 
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
> at 
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
> at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at 
> org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
> at 
> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
> at 
> org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
> at 
> org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
> at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
> at java.base/java.lang.Thread.run(Thread.java:844)
>
>
> FAILED:  org.apache.lucene.store.TestNativeFSLockFactory.testStressLocks
>
> Error Message:
> IndexWriter hit unexpected exceptions
>
> Stack Trace:
> java.lang.AssertionError: IndexWriter hit unexpected exceptions
> at 
> __randomizedtesting.SeedInfo.seed([72F563BD958B028E:2CC42D408927CAE8]:0)
> at org.junit.Assert.fail(Assert.java:93)
> at 

Re: [JENKINS] Lucene-Solr-7.x-Windows (64bit/jdk1.8.0_144) - Build # 566 - Still Unstable!

2018-04-26 Thread Simon Willnauer
pushed a fix for this

On Thu, Apr 26, 2018 at 10:48 AM, Policeman Jenkins Server
 wrote:
> Build: https://jenkins.thetaphi.de/job/Lucene-Solr-7.x-Windows/566/
> Java: 64bit/jdk1.8.0_144 -XX:-UseCompressedOops -XX:+UseG1GC
>
> 28 tests failed.
> FAILED:  org.apache.lucene.index.TestIndexWriterOutOfFileDescriptors.test
>
> Error Message:
> Directory 
> MockDirectoryWrapper(SimpleFSDirectory@C:\Users\jenkins\workspace\Lucene-Solr-7.x-Windows\lucene\build\core\test\J1\temp\lucene.index.TestIndexWriterOutOfFileDescriptors_ABAD3B75FD1956FA-002\TestIndexWriterOutOfFileDescriptors-001
>  lockFactory=org.apache.lucene.store.NativeFSLockFactory@3d28f2c0) still has 
> pending deleted files; cannot initialize IndexWriter
>
> Stack Trace:
> java.lang.IllegalArgumentException: Directory 
> MockDirectoryWrapper(SimpleFSDirectory@C:\Users\jenkins\workspace\Lucene-Solr-7.x-Windows\lucene\build\core\test\J1\temp\lucene.index.TestIndexWriterOutOfFileDescriptors_ABAD3B75FD1956FA-002\TestIndexWriterOutOfFileDescriptors-001
>  lockFactory=org.apache.lucene.store.NativeFSLockFactory@3d28f2c0) still has 
> pending deleted files; cannot initialize IndexWriter
> at 
> __randomizedtesting.SeedInfo.seed([ABAD3B75FD1956FA:23F904AF53E53B02]:0)
> at org.apache.lucene.index.IndexWriter.(IndexWriter.java:707)
> at 
> org.apache.lucene.index.TestIndexWriterOutOfFileDescriptors.test(TestIndexWriterOutOfFileDescriptors.java:68)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1737)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:934)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:970)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:984)
> at 
> org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
> at 
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
> at 
> org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
> at 
> org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
> at 
> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
> at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
> at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
> at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:943)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:829)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:879)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:890)
> at 
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
> at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at 
> org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
> at 
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
> at 
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
> at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at 
> org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
> at 
> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
> at 
> org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
> at 
> 

Re: [JENKINS] Lucene-Solr-Tests-7.x - Build # 584 - Failure

2018-04-26 Thread Simon Willnauer
I pushed a fix for this as well

On Thu, Apr 26, 2018 at 11:55 AM, Apache Jenkins Server
 wrote:
> Build: https://builds.apache.org/job/Lucene-Solr-Tests-7.x/584/
>
> All tests passed
>
> Build Log:
> [...truncated 54019 lines...]
> -ecj-javadoc-lint-src:
> [mkdir] Created dir: /tmp/ecj270125761
>  [ecj-lint] Compiling 103 source files to /tmp/ecj270125761
>  [ecj-lint] --
>  [ecj-lint] 1. ERROR in 
> /x1/jenkins/jenkins-slave/workspace/Lucene-Solr-Tests-7.x/lucene/spatial3d/src/java/org/apache/lucene/spatial3d/geom/GeoComplexPolygon.java
>  (at line 22)
>  [ecj-lint] import java.util.Set;
>  [ecj-lint]^
>  [ecj-lint] The import java.util.Set is never used
>  [ecj-lint] --
>  [ecj-lint] 2. ERROR in 
> /x1/jenkins/jenkins-slave/workspace/Lucene-Solr-Tests-7.x/lucene/spatial3d/src/java/org/apache/lucene/spatial3d/geom/GeoComplexPolygon.java
>  (at line 23)
>  [ecj-lint] import java.util.HashSet;
>  [ecj-lint]^
>  [ecj-lint] The import java.util.HashSet is never used
>  [ecj-lint] --
>  [ecj-lint] 2 problems (2 errors)
>
> BUILD FAILED
> /x1/jenkins/jenkins-slave/workspace/Lucene-Solr-Tests-7.x/build.xml:633: The 
> following error occurred while executing this line:
> /x1/jenkins/jenkins-slave/workspace/Lucene-Solr-Tests-7.x/build.xml:101: The 
> following error occurred while executing this line:
> /x1/jenkins/jenkins-slave/workspace/Lucene-Solr-Tests-7.x/lucene/build.xml:208:
>  The following error occurred while executing this line:
> /x1/jenkins/jenkins-slave/workspace/Lucene-Solr-Tests-7.x/lucene/common-build.xml:2264:
>  The following error occurred while executing this line:
> /x1/jenkins/jenkins-slave/workspace/Lucene-Solr-Tests-7.x/lucene/common-build.xml:2089:
>  The following error occurred while executing this line:
> /x1/jenkins/jenkins-slave/workspace/Lucene-Solr-Tests-7.x/lucene/common-build.xml:2128:
>  Compile failed; see the compiler error output for details.
>
> Total time: 85 minutes 5 seconds
> Build step 'Invoke Ant' marked build as failure
> Archiving artifacts
> Recording test results
> Email was triggered for: Failure - Any
> Sending email for trigger: Failure - Any
>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org




Re: [JENKINS] Lucene-Solr-master-Windows (64bit/jdk-9.0.4) - Build # 7289 - Still Unstable!

2018-04-26 Thread Simon Willnauer
I will push a fix for this soon!

On Thu, Apr 26, 2018 at 8:48 AM, Policeman Jenkins Server
 wrote:
> Build: https://jenkins.thetaphi.de/job/Lucene-Solr-master-Windows/7289/
> Java: 64bit/jdk-9.0.4 -XX:-UseCompressedOops -XX:+UseConcMarkSweepGC
>
> 57 tests failed.
> FAILED:  org.apache.lucene.store.TestSleepingLockWrapper.testStressLocks
>
> Error Message:
> IndexWriter hit unexpected exceptions
>
> Stack Trace:
> java.lang.AssertionError: IndexWriter hit unexpected exceptions
> at 
> __randomizedtesting.SeedInfo.seed([A16D532BE1A04815:FF5C1DD6FD0C8073]:0)
> at org.junit.Assert.fail(Assert.java:93)
> at org.junit.Assert.assertTrue(Assert.java:43)
> at 
> org.apache.lucene.store.BaseLockFactoryTestCase.testStressLocks(BaseLockFactoryTestCase.java:180)
> at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.base/java.lang.reflect.Method.invoke(Method.java:564)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1737)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:934)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:970)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:984)
> at 
> org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
> at 
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
> at 
> org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
> at 
> org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
> at 
> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
> at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
> at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
> at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:943)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:829)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:879)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:890)
> at 
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
> at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at 
> org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
> at 
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
> at 
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
> at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at 
> org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
> at 
> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
> at 
> org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
> at 
> org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
> at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
> at java.base/java.lang.Thread.run(Thread.java:844)
>
>
> FAILED:  org.apache.lucene.store.TestSleepingLockWrapper.testStressLocks
>
> Error Message:
> IndexWriter hit unexpected exceptions
>
> Stack Trace:
> java.lang.AssertionError: IndexWriter hit unexpected exceptions
> at 
> __randomizedtesting.SeedInfo.seed([A16D532BE1A04815:FF5C1DD6FD0C8073]:0)
> at org.junit.Assert.fail(Assert.java:93)
> 

[jira] [Commented] (LUCENE-8277) Better validate CodecReaders in addIndexes

2018-04-25 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16452995#comment-16452995
 ] 

Simon Willnauer commented on LUCENE-8277:
-

+1

> Better validate CodecReaders in addIndexes
> --
>
> Key: LUCENE-8277
> URL: https://issues.apache.org/jira/browse/LUCENE-8277
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
>
> The discussion at LUCENE-8264 made me wonder whether we should apply the same 
> checks to addIndexes(CodecReader) that we apply at index time when the input 
> reader is not a SegmentReader, such as:
>  - positions are less than the maximum position
>  - offsets are going forward
> And maybe also check that the API is implemented correctly, e.g. terms, doc 
> ids, and positions are returned in order?
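The checks listed in the description could look roughly like the following. This is a hypothetical, self-contained sketch of the validation idea, not Lucene's actual addIndexes code; the class and method names are invented:

```java
// Hypothetical sketch of the validations described above: positions must be
// non-negative, non-decreasing, and no greater than a maximum; offsets must
// go forward, and each end offset must not precede its start offset.
public class CodecReaderCheckSketch {

  static boolean positionsValid(int[] positions, int maxPosition) {
    int last = -1;
    for (int p : positions) {
      if (p < last || p > maxPosition) {
        return false; // position went backwards or exceeded the maximum
      }
      last = p;
    }
    return true;
  }

  static boolean offsetsGoForward(int[] startOffsets, int[] endOffsets) {
    int lastStart = -1;
    for (int i = 0; i < startOffsets.length; i++) {
      if (startOffsets[i] < lastStart || endOffsets[i] < startOffsets[i]) {
        return false; // start went backwards, or end precedes start
      }
      lastStart = startOffsets[i];
    }
    return true;
  }
}
```

In a real implementation these checks would run while consuming the incoming reader's terms, postings, and offsets, failing the addIndexes call on the first violation.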







[jira] [Commented] (LUCENE-8275) Push up #checkPendingDeletes to Directory

2018-04-25 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16452974#comment-16452974
 ] 

Simon Willnauer commented on LUCENE-8275:
-

{quote}
Curious, what distinction is there between Directory.checkPendingDeletes 
returning true vs it throwing an IOException?  Maybe it should be just one or 
the other – i.e. return boolean but never throw an exception, or return void 
but possibly throw an IOException?
{quote}

The return type of a method doesn't have anything to do with the exceptions it 
can throw. As a side effect, this method retries deleting the pending deletes, 
and that retry can throw an IOException. Unless there are underlying FS 
issues, it simply signals whether there are any pending deletions. I think the 
signature makes sense as it is? 
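The shape of that signature can be illustrated with a minimal, self-contained sketch. The class and method names below are invented for the example (this is not Lucene's Directory API): a boolean answers "are there still pending deletions?", while the retry side effect is where an IOException could originate on a real file system.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch, not Lucene's Directory: checkPendingDeletions()
// retries the queued deletes as a side effect and returns whether any remain.
public class PendingDeletesSketch {
  private final Set<String> pendingDeletes = new HashSet<>();
  private final Set<String> stillOpen = new HashSet<>(); // simulates files the OS holds open

  void queueDelete(String name) { pendingDeletes.add(name); }
  void markOpen(String name)    { stillOpen.add(name); }
  void markClosed(String name)  { stillOpen.remove(name); }

  // Side effect first: retry deleting everything not held open, which is the
  // step that could fail with an IOException on a real file system. Then
  // report whether anything is still pending.
  boolean checkPendingDeletions() {
    pendingDeletes.removeIf(f -> !stillOpen.contains(f));
    return !pendingDeletes.isEmpty();
  }
}
```

The boolean and the exception answer different questions: the boolean is the normal-path signal, and the exception (in the real API) covers file-system failures during the retry.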

> Push up #checkPendingDeletes to Directory
> -
>
> Key: LUCENE-8275
> URL: https://issues.apache.org/jira/browse/LUCENE-8275
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: 7.4, master (8.0)
>    Reporter: Simon Willnauer
>Priority: Major
> Fix For: 7.4, master (8.0)
>
> Attachments: LUCENE-8275.patch, LUCENE-8275.patch
>
>
>  IndexWriter checks in its ctor if the incoming directory is an
> FSDirectory. If that is the case, it ensures that the directory retries
> deleting its pending deletes, and if there are pending deletes it will
> fail to create the writer. Yet, this check didn't unwrap filter directories
> or subclasses like FileSwitchDirectory, so in the case of MDW we
> never checked for pending deletes.
> 
> There are also two places in FSDirectory that first removed the file
> about to be created / renamed to from the pending deletes set,
> and only then tried to clean up pending deletes, which excluded that file.
> These places now remove the file from the set after the pending deletes are 
> checked.
>  
> This caused some test failures lately that are unfortunately very 
> timing-dependent:
>  
> {noformat}
> FAILED:  
> junit.framework.TestSuite.org.apache.lucene.search.TestSearcherManager
> Error Message:
> Captured an uncaught exception in thread: Thread[id=1567, name=Thread-1363, 
> state=RUNNABLE, group=TGRP-TestSearcherManager]
> Stack Trace:
> com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an 
> uncaught exception in thread: Thread[id=1567, name=Thread-1363, 
> state=RUNNABLE, group=TGRP-TestSearcherManager]
> Caused by: java.lang.RuntimeException: 
> java.nio.file.FileAlreadyExistsException: 
> /home/jenkins/workspace/Lucene-Solr-master-Linux/lucene/build/core/test/J1/temp/lucene.search.TestSearcherManager_BA998C838D219DA9-001/tempDir-001/_0.fdt
> at __randomizedtesting.SeedInfo.seed([BA998C838D219DA9]:0)
> at 
> org.apache.lucene.search.TestSearcherManager$8.run(TestSearcherManager.java:590)
> Caused by: java.nio.file.FileAlreadyExistsException: 
> /home/jenkins/workspace/Lucene-Solr-master-Linux/lucene/build/core/test/J1/temp/lucene.search.TestSearcherManager_BA998C838D219DA9-001/tempDir-001/_0.fdt
> at 
> java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:94)
> at 
> java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
> at 
> java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116)
> at 
> java.base/sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:215)
> at 
> java.base/java.nio.file.spi.FileSystemProvider.newOutputStream(FileSystemProvider.java:434)
> at 
> org.apache.lucene.mockfile.FilterFileSystemProvider.newOutputStream(FilterFileSystemProvider.java:197)
> at 
> org.apache.lucene.mockfile.FilterFileSystemProvider.newOutputStream(FilterFileSystemProvider.java:197)
> at 
> org.apache.lucene.mockfile.HandleTrackingFS.newOutputStream(HandleTrackingFS.java:129)
> at 
> org.apache.lucene.mockfile.HandleTrackingFS.newOutputStream(HandleTrackingFS.java:129)
> at 
> org.apache.lucene.mockfile.HandleTrackingFS.newOutputStream(HandleTrackingFS.java:129)
> at 
> org.apache.lucene.mockfile.FilterFileSystemProvider.newOutputStream(FilterFileSystemProvider.java:197)
> at java.base/java.nio.file.Files.newOutputStream(Files.java:218)
> at 
> org.apache.lucene.store.FSDirectory$FSIndexOutput.(FSDirectory.java:413)
> at 
> org.apache.lucene.store.FSDirectory$FSIndexOutput.(FSDirectory.java:409)
> at 
> org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:2

[jira] [Resolved] (LUCENE-8275) Push up #checkPendingDeletes to Directory

2018-04-25 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-8275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer resolved LUCENE-8275.
-
Resolution: Fixed

> Push up #checkPendingDeletes to Directory
> -
>
> Key: LUCENE-8275
> URL: https://issues.apache.org/jira/browse/LUCENE-8275
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: 7.4, master (8.0)
>    Reporter: Simon Willnauer
>Priority: Major
> Fix For: 7.4, master (8.0)
>
> Attachments: LUCENE-8275.patch, LUCENE-8275.patch
>
>
>  IndexWriter checks in its ctor if the incoming directory is an
> FSDirectory. If that is the case, it ensures that the directory retries
> deleting its pending deletes, and if there are pending deletes it will
> fail to create the writer. Yet, this check didn't unwrap filter directories
> or subclasses like FileSwitchDirectory, so in the case of MDW we
> never checked for pending deletes.
> 
> There are also two places in FSDirectory that first removed the file
> about to be created / renamed to from the pending deletes set,
> and only then tried to clean up pending deletes, which excluded that file.
> These places now remove the file from the set after the pending deletes are 
> checked.
>  
> This caused some test failures lately that are unfortunately very 
> timing-dependent:
>  
> {noformat}
> FAILED:  
> junit.framework.TestSuite.org.apache.lucene.search.TestSearcherManager
> Error Message:
> Captured an uncaught exception in thread: Thread[id=1567, name=Thread-1363, 
> state=RUNNABLE, group=TGRP-TestSearcherManager]
> Stack Trace:
> com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an 
> uncaught exception in thread: Thread[id=1567, name=Thread-1363, 
> state=RUNNABLE, group=TGRP-TestSearcherManager]
> Caused by: java.lang.RuntimeException: 
> java.nio.file.FileAlreadyExistsException: 
> /home/jenkins/workspace/Lucene-Solr-master-Linux/lucene/build/core/test/J1/temp/lucene.search.TestSearcherManager_BA998C838D219DA9-001/tempDir-001/_0.fdt
> at __randomizedtesting.SeedInfo.seed([BA998C838D219DA9]:0)
> at 
> org.apache.lucene.search.TestSearcherManager$8.run(TestSearcherManager.java:590)
> Caused by: java.nio.file.FileAlreadyExistsException: 
> /home/jenkins/workspace/Lucene-Solr-master-Linux/lucene/build/core/test/J1/temp/lucene.search.TestSearcherManager_BA998C838D219DA9-001/tempDir-001/_0.fdt
> at 
> java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:94)
> at 
> java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
> at 
> java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116)
> at 
> java.base/sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:215)
> at 
> java.base/java.nio.file.spi.FileSystemProvider.newOutputStream(FileSystemProvider.java:434)
> at 
> org.apache.lucene.mockfile.FilterFileSystemProvider.newOutputStream(FilterFileSystemProvider.java:197)
> at 
> org.apache.lucene.mockfile.FilterFileSystemProvider.newOutputStream(FilterFileSystemProvider.java:197)
> at 
> org.apache.lucene.mockfile.HandleTrackingFS.newOutputStream(HandleTrackingFS.java:129)
> at 
> org.apache.lucene.mockfile.HandleTrackingFS.newOutputStream(HandleTrackingFS.java:129)
> at 
> org.apache.lucene.mockfile.HandleTrackingFS.newOutputStream(HandleTrackingFS.java:129)
> at 
> org.apache.lucene.mockfile.FilterFileSystemProvider.newOutputStream(FilterFileSystemProvider.java:197)
> at java.base/java.nio.file.Files.newOutputStream(Files.java:218)
> at 
> org.apache.lucene.store.FSDirectory$FSIndexOutput.(FSDirectory.java:413)
> at 
> org.apache.lucene.store.FSDirectory$FSIndexOutput.(FSDirectory.java:409)
> at 
> org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:253)
> at 
> org.apache.lucene.store.MockDirectoryWrapper.createOutput(MockDirectoryWrapper.java:665)
> at 
> org.apache.lucene.store.LockValidatingDirectoryWrapper.createOutput(LockValidatingDirectoryWrapper.java:44)
> at 
> org.apache.lucene.store.TrackingDirectoryWrapper.createOutput(TrackingDirectoryWrapper.java:43)
> at 
> org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.(CompressingStoredFieldsWriter.java:116)
> at 
> org.apache.lucene.codecs.compressing.CompressingStoredFieldsFormat.fieldsWriter(CompressingStoredFieldsFormat.jav

[jira] [Updated] (LUCENE-8275) Push up #checkPendingDeletes to Directory

2018-04-25 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-8275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-8275:

Summary: Push up #checkPendingDeletes to Directory  (was: Unwrap directory 
to check for FSDirectory)

> Push up #checkPendingDeletes to Directory
> -
>
> Key: LUCENE-8275
> URL: https://issues.apache.org/jira/browse/LUCENE-8275
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: 7.4, master (8.0)
>    Reporter: Simon Willnauer
>Priority: Major
> Fix For: 7.4, master (8.0)
>
> Attachments: LUCENE-8275.patch, LUCENE-8275.patch
>
>
> IndexWriter checks in its ctor whether the incoming directory is an
>  FSDirectory. If so, it ensures that the directory retries deleting its
>  pending deletes, and if pending deletes remain it fails to create the
>  writer. Yet this check didn't unwrap filter directories, so in the case
>  of MDW we never checked for pending deletes.
> There are also two places in FSDirectory that first removed the file
>  that was about to be created / renamed to from the pending-deletes set
>  and only then tried to clean up pending deletes, which excluded that
>  file. These places now remove the file from the set after the pending
>  deletes are checked.
>  
> This caused some test failures lately; unfortunately they are very timing dependent:
>  
> {noformat}
> FAILED:  
> junit.framework.TestSuite.org.apache.lucene.search.TestSearcherManager
> Error Message:
> Captured an uncaught exception in thread: Thread[id=1567, name=Thread-1363, 
> state=RUNNABLE, group=TGRP-TestSearcherManager]
> Stack Trace:
> com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an 
> uncaught exception in thread: Thread[id=1567, name=Thread-1363, 
> state=RUNNABLE, group=TGRP-TestSearcherManager]
> Caused by: java.lang.RuntimeException: 
> java.nio.file.FileAlreadyExistsException: 
> /home/jenkins/workspace/Lucene-Solr-master-Linux/lucene/build/core/test/J1/temp/lucene.search.TestSearcherManager_BA998C838D219DA9-001/tempDir-001/_0.fdt
> at __randomizedtesting.SeedInfo.seed([BA998C838D219DA9]:0)
> at 
> org.apache.lucene.search.TestSearcherManager$8.run(TestSearcherManager.java:590)
> Caused by: java.nio.file.FileAlreadyExistsException: 
> /home/jenkins/workspace/Lucene-Solr-master-Linux/lucene/build/core/test/J1/temp/lucene.search.TestSearcherManager_BA998C838D219DA9-001/tempDir-001/_0.fdt
> at 
> java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:94)
> at 
> java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
> at 
> java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116)
> at 
> java.base/sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:215)
> at 
> java.base/java.nio.file.spi.FileSystemProvider.newOutputStream(FileSystemProvider.java:434)
> at 
> org.apache.lucene.mockfile.FilterFileSystemProvider.newOutputStream(FilterFileSystemProvider.java:197)
> at 
> org.apache.lucene.mockfile.FilterFileSystemProvider.newOutputStream(FilterFileSystemProvider.java:197)
> at 
> org.apache.lucene.mockfile.HandleTrackingFS.newOutputStream(HandleTrackingFS.java:129)
> at 
> org.apache.lucene.mockfile.HandleTrackingFS.newOutputStream(HandleTrackingFS.java:129)
> at 
> org.apache.lucene.mockfile.HandleTrackingFS.newOutputStream(HandleTrackingFS.java:129)
> at 
> org.apache.lucene.mockfile.FilterFileSystemProvider.newOutputStream(FilterFileSystemProvider.java:197)
> at java.base/java.nio.file.Files.newOutputStream(Files.java:218)
> at 
> org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:413)
> at 
> org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:409)
> at 
> org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:253)
> at 
> org.apache.lucene.store.MockDirectoryWrapper.createOutput(MockDirectoryWrapper.java:665)
> at 
> org.apache.lucene.store.LockValidatingDirectoryWrapper.createOutput(LockValidatingDirectoryWrapper.java:44)
> at 
> org.apache.lucene.store.TrackingDirectoryWrapper.createOutput(TrackingDirectoryWrapper.java:43)
> at 
> org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.<init>(CompressingStoredFieldsWriter.java:116)
> at 
> org.apache.lucene.codecs.compressing.CompressingStoredFieldsFormat.fieldsWriter(CompressingStoredFieldsFormat.java:128)

[jira] [Updated] (LUCENE-8275) Push up #checkPendingDeletes to Directory

2018-04-25 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-8275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-8275:

Description: 
 IndexWriter checks in its ctor whether the incoming directory is an
FSDirectory. If so, it ensures that the directory retries deleting its
pending deletes, and if pending deletes remain it fails to create the
writer. Yet this check didn't unwrap filter directories or subclasses
like FileSwitchDirectory, so in the case of MDW we never checked for
pending deletes.
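The unwrapping described above can be sketched as follows. This is a minimal, self-contained model, not Lucene's actual code: FilterDir, FsDir, and BaseDir are hypothetical stand-ins for FilterDirectory (which wrappers like MDW extend), FSDirectory, and Directory.

```java
// Hypothetical stand-ins for Lucene's Directory / FSDirectory / FilterDirectory.
class BaseDir {
    boolean checkPendingDeletions() { return false; }
}

class FsDir extends BaseDir {
    @Override
    boolean checkPendingDeletions() { return true; } // pretend deletes are pending
}

class FilterDir extends BaseDir {
    final BaseDir delegate;
    FilterDir(BaseDir delegate) { this.delegate = delegate; }
}

public class UnwrapSketch {
    // Follow the delegate chain until we reach the concrete directory,
    // so wrapper directories no longer hide the pending-deletes check.
    static BaseDir unwrap(BaseDir dir) {
        while (dir instanceof FilterDir) {
            dir = ((FilterDir) dir).delegate;
        }
        return dir;
    }

    public static void main(String[] args) {
        BaseDir wrapped = new FilterDir(new FilterDir(new FsDir()));
        // Without unwrapping the check never reaches the FS-backed directory:
        System.out.println(unwrap(wrapped).checkPendingDeletions()); // true
    }
}
```

The point of the sketch is only the delegate-chasing loop; the real patch pushes the check up so any Directory can answer it.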

There are also two places in FSDirectory that first removed the file
that was about to be created / renamed to from the pending-deletes set
and only then tried to clean up pending deletes, which excluded that
file. These places now remove the file from the set after the pending
deletes are checked.
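The ordering fix in the second paragraph above can be modeled with a toy pending-deletes set. All names here are hypothetical; the real FSDirectory logic deals with the actual file system.

```java
import java.util.HashSet;
import java.util.Set;

public class PendingDeletesOrder {
    // Toy model: "fakeFs" stands in for files on disk, "pending" for
    // FSDirectory's pending-deletes set (hypothetical, not Lucene code).
    static Set<String> fakeFs = new HashSet<>();
    static Set<String> pending = new HashSet<>();

    static void retryPendingDeletes() {
        for (String name : new HashSet<>(pending)) {
            fakeFs.remove(name);   // pretend the OS delete now succeeds
            pending.remove(name);
        }
    }

    // Fixed ordering described above: retry the pending deletes BEFORE
    // removing the target name from the set, so a stale on-disk copy of
    // that very file gets cleared instead of being excluded from cleanup.
    static void createOutput(String name) {
        retryPendingDeletes();
        pending.remove(name);
        if (fakeFs.contains(name)) {
            throw new IllegalStateException("FileAlreadyExists: " + name);
        }
        fakeFs.add(name);
    }

    public static void main(String[] args) {
        fakeFs.add("_0.fdt");      // stale file left behind by a rolled-back writer
        pending.add("_0.fdt");     // its delete is still pending
        createOutput("_0.fdt");    // succeeds: the pending delete is retried first
        System.out.println(fakeFs.contains("_0.fdt")); // true
    }
}
```

With the old ordering (remove from `pending` first), the stale `_0.fdt` would survive the retry and the create would collide, which is exactly the FileAlreadyExistsException in the traces below.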

 

This caused some test failures lately; unfortunately they are very timing dependent:

 
{noformat}
FAILED:  junit.framework.TestSuite.org.apache.lucene.search.TestSearcherManager

Error Message:
Captured an uncaught exception in thread: Thread[id=1567, name=Thread-1363, 
state=RUNNABLE, group=TGRP-TestSearcherManager]

Stack Trace:
com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught 
exception in thread: Thread[id=1567, name=Thread-1363, state=RUNNABLE, 
group=TGRP-TestSearcherManager]
Caused by: java.lang.RuntimeException: 
java.nio.file.FileAlreadyExistsException: 
/home/jenkins/workspace/Lucene-Solr-master-Linux/lucene/build/core/test/J1/temp/lucene.search.TestSearcherManager_BA998C838D219DA9-001/tempDir-001/_0.fdt
at __randomizedtesting.SeedInfo.seed([BA998C838D219DA9]:0)
at 
org.apache.lucene.search.TestSearcherManager$8.run(TestSearcherManager.java:590)
Caused by: java.nio.file.FileAlreadyExistsException: 
/home/jenkins/workspace/Lucene-Solr-master-Linux/lucene/build/core/test/J1/temp/lucene.search.TestSearcherManager_BA998C838D219DA9-001/tempDir-001/_0.fdt
at 
java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:94)
at 
java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
at 
java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116)
at 
java.base/sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:215)
at 
java.base/java.nio.file.spi.FileSystemProvider.newOutputStream(FileSystemProvider.java:434)
at 
org.apache.lucene.mockfile.FilterFileSystemProvider.newOutputStream(FilterFileSystemProvider.java:197)
at 
org.apache.lucene.mockfile.FilterFileSystemProvider.newOutputStream(FilterFileSystemProvider.java:197)
at 
org.apache.lucene.mockfile.HandleTrackingFS.newOutputStream(HandleTrackingFS.java:129)
at 
org.apache.lucene.mockfile.HandleTrackingFS.newOutputStream(HandleTrackingFS.java:129)
at 
org.apache.lucene.mockfile.HandleTrackingFS.newOutputStream(HandleTrackingFS.java:129)
at 
org.apache.lucene.mockfile.FilterFileSystemProvider.newOutputStream(FilterFileSystemProvider.java:197)
at java.base/java.nio.file.Files.newOutputStream(Files.java:218)
at 
org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:413)
at 
org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:409)
at 
org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:253)
at 
org.apache.lucene.store.MockDirectoryWrapper.createOutput(MockDirectoryWrapper.java:665)
at 
org.apache.lucene.store.LockValidatingDirectoryWrapper.createOutput(LockValidatingDirectoryWrapper.java:44)
at 
org.apache.lucene.store.TrackingDirectoryWrapper.createOutput(TrackingDirectoryWrapper.java:43)
at 
org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.<init>(CompressingStoredFieldsWriter.java:116)
at 
org.apache.lucene.codecs.compressing.CompressingStoredFieldsFormat.fieldsWriter(CompressingStoredFieldsFormat.java:128)
at 
org.apache.lucene.codecs.lucene50.Lucene50StoredFieldsFormat.fieldsWriter(Lucene50StoredFieldsFormat.java:183)
at 
org.apache.lucene.codecs.asserting.AssertingStoredFieldsFormat.fieldsWriter(AssertingStoredFieldsFormat.java:48)
at 
org.apache.lucene.index.StoredFieldsConsumer.initStoredFieldsWriter(StoredFieldsConsumer.java:39)
at 
org.apache.lucene.index.StoredFieldsConsumer.startDocument(StoredFieldsConsumer.java:46)
at 
org.apache.lucene.index.DefaultIndexingChain.startStoredFields(DefaultIndexingChain.java:363)
at 
org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:399)
at 
org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:251)
at 
org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:490

[jira] [Updated] (LUCENE-8275) Unwrap directory to check for FSDirectory

2018-04-25 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-8275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-8275:

Attachment: LUCENE-8275.patch

> Unwrap directory to check for FSDirectory
> -
>
> Key: LUCENE-8275
> URL: https://issues.apache.org/jira/browse/LUCENE-8275
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: 7.4, master (8.0)
>    Reporter: Simon Willnauer
>Priority: Major
> Fix For: 7.4, master (8.0)
>
> Attachments: LUCENE-8275.patch, LUCENE-8275.patch
>
>
> IndexWriter checks in its ctor whether the incoming directory is an
>  FSDirectory. If so, it ensures that the directory retries deleting its
>  pending deletes, and if pending deletes remain it fails to create the
>  writer. Yet this check didn't unwrap filter directories, so in the case
>  of MDW we never checked for pending deletes.
> There are also two places in FSDirectory that first removed the file
>  that was about to be created / renamed to from the pending-deletes set
>  and only then tried to clean up pending deletes, which excluded that
>  file. These places now remove the file from the set after the pending
>  deletes are checked.
>  
> This caused some test failures lately; unfortunately they are very timing dependent:
>  
> {noformat}
> FAILED:  
> junit.framework.TestSuite.org.apache.lucene.search.TestSearcherManager
> Error Message:
> Captured an uncaught exception in thread: Thread[id=1567, name=Thread-1363, 
> state=RUNNABLE, group=TGRP-TestSearcherManager]
> Stack Trace:
> com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an 
> uncaught exception in thread: Thread[id=1567, name=Thread-1363, 
> state=RUNNABLE, group=TGRP-TestSearcherManager]
> Caused by: java.lang.RuntimeException: 
> java.nio.file.FileAlreadyExistsException: 
> /home/jenkins/workspace/Lucene-Solr-master-Linux/lucene/build/core/test/J1/temp/lucene.search.TestSearcherManager_BA998C838D219DA9-001/tempDir-001/_0.fdt
> at __randomizedtesting.SeedInfo.seed([BA998C838D219DA9]:0)
> at 
> org.apache.lucene.search.TestSearcherManager$8.run(TestSearcherManager.java:590)
> Caused by: java.nio.file.FileAlreadyExistsException: 
> /home/jenkins/workspace/Lucene-Solr-master-Linux/lucene/build/core/test/J1/temp/lucene.search.TestSearcherManager_BA998C838D219DA9-001/tempDir-001/_0.fdt
> at 
> java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:94)
> at 
> java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
> at 
> java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116)
> at 
> java.base/sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:215)
> at 
> java.base/java.nio.file.spi.FileSystemProvider.newOutputStream(FileSystemProvider.java:434)
> at 
> org.apache.lucene.mockfile.FilterFileSystemProvider.newOutputStream(FilterFileSystemProvider.java:197)
> at 
> org.apache.lucene.mockfile.FilterFileSystemProvider.newOutputStream(FilterFileSystemProvider.java:197)
> at 
> org.apache.lucene.mockfile.HandleTrackingFS.newOutputStream(HandleTrackingFS.java:129)
> at 
> org.apache.lucene.mockfile.HandleTrackingFS.newOutputStream(HandleTrackingFS.java:129)
> at 
> org.apache.lucene.mockfile.HandleTrackingFS.newOutputStream(HandleTrackingFS.java:129)
> at 
> org.apache.lucene.mockfile.FilterFileSystemProvider.newOutputStream(FilterFileSystemProvider.java:197)
> at java.base/java.nio.file.Files.newOutputStream(Files.java:218)
> at 
> org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:413)
> at 
> org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:409)
> at 
> org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:253)
> at 
> org.apache.lucene.store.MockDirectoryWrapper.createOutput(MockDirectoryWrapper.java:665)
> at 
> org.apache.lucene.store.LockValidatingDirectoryWrapper.createOutput(LockValidatingDirectoryWrapper.java:44)
> at 
> org.apache.lucene.store.TrackingDirectoryWrapper.createOutput(TrackingDirectoryWrapper.java:43)
> at 
> org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.<init>(CompressingStoredFieldsWriter.java:116)
> at 
> org.apache.lucene.codecs.compressing.CompressingStoredFieldsFormat.fieldsWriter(CompressingStoredFieldsFormat.java:128)
> at 
> org.apache.lucene.codecs.lucene50.Lucene50StoredFieldsFormat.f

[jira] [Resolved] (LUCENE-8272) Share internal DV update code between binary and numeric

2018-04-25 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-8272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer resolved LUCENE-8272.
-
Resolution: Fixed

thanks everybody

> Share internal DV update code between binary and numeric
> 
>
> Key: LUCENE-8272
> URL: https://issues.apache.org/jira/browse/LUCENE-8272
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: 7.4, master (8.0)
>    Reporter: Simon Willnauer
>Priority: Major
> Fix For: 7.4, master (8.0)
>
> Attachments: LUCENE-8272.patch
>
>
> Today we duplicate a fair portion of the internal logic to apply
> updates of binary and numeric doc values. This change refactors this
> non-trivial code to share the same code path, differing only in
> whether we provide a binary or a numeric instance. This also allows us
> to iterate over the updates only once rather than twice, once for
> numeric and once for binary fields.
> 
> This change also makes DocValuesFieldUpdates.Iterator subclass
> DocValuesIterator, which allows easier consumption down the road since
> it now shares most of its interface with DocIdSetIterator, the main
> interface for this in Lucene.
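The shared code path described above can be sketched roughly like this. The names are illustrative only, not the actual Lucene classes: one generic apply loop iterates the updates once, and the numeric/binary difference is confined to the value-writing callback.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

public class SharedDvApply {
    // One shared loop over (doc, value) updates; the per-type work
    // (numeric vs binary) is passed in, so the loop exists only once.
    static <T> void applyUpdates(List<Integer> docs, List<T> values,
                                 BiConsumer<Integer, T> writeValue) {
        for (int i = 0; i < docs.size(); i++) {
            writeValue.accept(docs.get(i), values.get(i)); // single iteration
        }
    }

    public static void main(String[] args) {
        List<String> applied = new ArrayList<>();
        // Numeric and binary updates share the same code path:
        applyUpdates(List.of(1, 3), List.of(42L, 7L),
                     (doc, v) -> applied.add("num:" + doc + "=" + v));
        applyUpdates(List.of(2), List.of(new byte[] {1}),
                     (doc, v) -> applied.add("bin:" + doc));
        System.out.println(applied); // [num:1=42, num:3=7, bin:2]
    }
}
```

This mirrors the design choice in the issue: deduplicating the non-trivial loop and keeping only the value type as the varying part.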



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8275) Unwrap directory to check for FSDirectory

2018-04-25 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16452249#comment-16452249
 ] 

Simon Willnauer commented on LUCENE-8275:
-

[~rcmuir] good point about FSD. I attached a new patch with a minimal solution.

> Unwrap directory to check for FSDirectory
> -
>
> Key: LUCENE-8275
> URL: https://issues.apache.org/jira/browse/LUCENE-8275
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: 7.4, master (8.0)
>    Reporter: Simon Willnauer
>Priority: Major
> Fix For: 7.4, master (8.0)
>
> Attachments: LUCENE-8275.patch
>
>
> IndexWriter checks in its ctor whether the incoming directory is an
>  FSDirectory. If so, it ensures that the directory retries deleting its
>  pending deletes, and if pending deletes remain it fails to create the
>  writer. Yet this check didn't unwrap filter directories, so in the case
>  of MDW we never checked for pending deletes.
> There are also two places in FSDirectory that first removed the file
>  that was about to be created / renamed to from the pending-deletes set
>  and only then tried to clean up pending deletes, which excluded that
>  file. These places now remove the file from the set after the pending
>  deletes are checked.
>  
> This caused some test failures lately; unfortunately they are very timing dependent:
>  
> {noformat}
> FAILED:  
> junit.framework.TestSuite.org.apache.lucene.search.TestSearcherManager
> Error Message:
> Captured an uncaught exception in thread: Thread[id=1567, name=Thread-1363, 
> state=RUNNABLE, group=TGRP-TestSearcherManager]
> Stack Trace:
> com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an 
> uncaught exception in thread: Thread[id=1567, name=Thread-1363, 
> state=RUNNABLE, group=TGRP-TestSearcherManager]
> Caused by: java.lang.RuntimeException: 
> java.nio.file.FileAlreadyExistsException: 
> /home/jenkins/workspace/Lucene-Solr-master-Linux/lucene/build/core/test/J1/temp/lucene.search.TestSearcherManager_BA998C838D219DA9-001/tempDir-001/_0.fdt
> at __randomizedtesting.SeedInfo.seed([BA998C838D219DA9]:0)
> at 
> org.apache.lucene.search.TestSearcherManager$8.run(TestSearcherManager.java:590)
> Caused by: java.nio.file.FileAlreadyExistsException: 
> /home/jenkins/workspace/Lucene-Solr-master-Linux/lucene/build/core/test/J1/temp/lucene.search.TestSearcherManager_BA998C838D219DA9-001/tempDir-001/_0.fdt
> at 
> java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:94)
> at 
> java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
> at 
> java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116)
> at 
> java.base/sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:215)
> at 
> java.base/java.nio.file.spi.FileSystemProvider.newOutputStream(FileSystemProvider.java:434)
> at 
> org.apache.lucene.mockfile.FilterFileSystemProvider.newOutputStream(FilterFileSystemProvider.java:197)
> at 
> org.apache.lucene.mockfile.FilterFileSystemProvider.newOutputStream(FilterFileSystemProvider.java:197)
> at 
> org.apache.lucene.mockfile.HandleTrackingFS.newOutputStream(HandleTrackingFS.java:129)
> at 
> org.apache.lucene.mockfile.HandleTrackingFS.newOutputStream(HandleTrackingFS.java:129)
> at 
> org.apache.lucene.mockfile.HandleTrackingFS.newOutputStream(HandleTrackingFS.java:129)
> at 
> org.apache.lucene.mockfile.FilterFileSystemProvider.newOutputStream(FilterFileSystemProvider.java:197)
> at java.base/java.nio.file.Files.newOutputStream(Files.java:218)
> at 
> org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:413)
> at 
> org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:409)
> at 
> org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:253)
> at 
> org.apache.lucene.store.MockDirectoryWrapper.createOutput(MockDirectoryWrapper.java:665)
> at 
> org.apache.lucene.store.LockValidatingDirectoryWrapper.createOutput(LockValidatingDirectoryWrapper.java:44)
> at 
> org.apache.lucene.store.TrackingDirectoryWrapper.createOutput(TrackingDirectoryWrapper.java:43)
> at 
> org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.<init>(CompressingStoredFieldsWriter.java:116)
> at 
> org.apache.lucene.codecs.compressing.CompressingStoredFieldsFormat.fieldsWriter(CompressingStoredFieldsFormat.java:128)

[jira] [Commented] (LUCENE-8264) Allow an option to rewrite all segments

2018-04-25 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16451952#comment-16451952
 ] 

Simon Willnauer commented on LUCENE-8264:
-

I totally agree with Robert here; this is a good collection of valid technical 
points. We can't let lurking corruptions happen. The improvements made to norms here are 
awesome and we need to move forward with stuff like this. Also after looking at 
the details, I am convinced the guarantees that this restriction gives us are 
crucial to the future of lucene. We can't support lurking corruptions for users 
that come from ancient versions by converting (merging / rewriting segments) 
from N-X to N in steps that nobody ever tested.

Also the points about the database aspect are very much valid. We need raw data 
to re-create these indices reliably and if you are running on top of a search 
engine you need to account for reindexing.

Btw., we have had this restriction in ES implicitly since 1.0. We have always
only supported N-1 major versions for ES indices, and they happen to
correspond to N-1 Lucene major versions. A lot of work has also gone into
supporting searching across major versions of ES, to allow users to stay on
older versions for retention-policy purposes. Some of these conversations are
not easy, but they are necessary for us to prevent support insanity.

That said, I think there might be room for N-X at some point as long as the 
guarantee is only N-1. At some point we might allow the min index created 
version to be 7 even if you are on 9. But for us to make progress we need to be 
free to break and only guarantee N-1. 

Also, what this means is that your indices are supported for ~2.5 years, which
is the historical major-release cadence. I think it's important to keep this
in mind.

> Allow an option to rewrite all segments
> ---
>
> Key: LUCENE-8264
> URL: https://issues.apache.org/jira/browse/LUCENE-8264
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>Priority: Major
>
> For the background, see SOLR-12259.
> There are several use-cases that would be much easier, especially during 
> upgrades, if we could specify that all segments get rewritten. 
> One example: Upgrading 5x->6x->7x. When segments are merged, they're 
> rewritten into the current format. However, there's no guarantee that a 
> particular segment _ever_ gets merged so the 6x-7x upgrade won't necessarily 
> be successful.
> How many merge policies support this is an open question. I propose to start 
> with TMP and raise other JIRAs as necessary for other merge policies.
> So far the usual response has been "re-index from scratch", but that's 
> increasingly difficult as systems get larger.






Re: [JENKINS] Lucene-Solr-master-Linux (64bit/jdk-10) - Build # 21897 - Unstable!

2018-04-25 Thread Simon Willnauer
I was able to reproduce this failure by extending the time this test
runs (it depends on clock time, which is terrible enough). The issue
doesn't seem to be related to the changes made lately; the only
relation I can see is that the changes shifted some timing, and I
suspect things got a bit quicker cutting over to a new IW. The issue
(afaik) is that a reference to a file is still open after the IW got
rolled back, and WindowsFS can't delete the file, causing FSDirectory
to put it into pendingDeletes. Now we try to open a new IW, it tries
to write this file again, and the test fails since we potentially
never check again for pending files. There is also this N^2 protection
in FSDirectory that doesn't necessarily help here. I opened [1] to fix
IW and try to delete pending files when they are created anew. I still
think this test can run into this situation sooner or later.

[1] https://issues.apache.org/jira/browse/LUCENE-8275
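The failure sequence described above can be shown with a toy model (all names hypothetical; the real WindowsFS/FSDirectory interplay is more involved): a file with an open handle can't be deleted, so its name lands in a pending-deletes set, and a new writer creating the same name then collides.

```java
import java.util.HashSet;
import java.util.Set;

public class OpenHandleSketch {
    // Toy Windows-like file system state (hypothetical, not Lucene code).
    static Set<String> files = new HashSet<>();
    static Set<String> openHandles = new HashSet<>();
    static Set<String> pendingDeletes = new HashSet<>();

    static void delete(String name) {
        if (openHandles.contains(name)) {
            pendingDeletes.add(name); // can't delete while a handle is open
        } else {
            files.remove(name);
        }
    }

    // A create that never re-checks pending deletes collides with the
    // stale file; returns false on a FileAlreadyExists-like clash.
    static boolean create(String name) {
        return files.add(name);
    }

    public static void main(String[] args) {
        files.add("_0.fdt");
        openHandles.add("_0.fdt");            // reader still open after rollback
        delete("_0.fdt");                     // delete deferred to pendingDeletes
        System.out.println(create("_0.fdt")); // false: same clash as in the test
    }
}
```

The fix referenced in [1] amounts to retrying the pending deletes when such a file is created again, rather than never checking.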

On Tue, Apr 24, 2018 at 6:06 PM, Simon Willnauer
<simon.willna...@gmail.com> wrote:
> I am looking into this
>
> On Tue, Apr 24, 2018 at 5:37 PM, Policeman Jenkins Server
> <jenk...@thetaphi.de> wrote:
>> Build: https://jenkins.thetaphi.de/job/Lucene-Solr-master-Linux/21897/
>> Java: 64bit/jdk-10 -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC
>>
>> 6 tests failed.
>> FAILED:  
>> junit.framework.TestSuite.org.apache.lucene.search.TestSearcherManager
>>
>> Error Message:
>> Suite timeout exceeded (>= 720 msec).
>>
>> Stack Trace:
>> java.lang.Exception: Suite timeout exceeded (>= 720 msec).
>> at __randomizedtesting.SeedInfo.seed([BA998C838D219DA9]:0)
>>
>>
>> FAILED:  
>> junit.framework.TestSuite.org.apache.lucene.search.TestSearcherManager
>>
>> Error Message:
>> Captured an uncaught exception in thread: Thread[id=17, name=Thread-1, 
>> state=RUNNABLE, group=TGRP-TestSearcherManager]
>>
>> Stack Trace:
>> com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an 
>> uncaught exception in thread: Thread[id=17, name=Thread-1, state=RUNNABLE, 
>> group=TGRP-TestSearcherManager]
>> Caused by: java.lang.RuntimeException: 
>> java.nio.file.FileAlreadyExistsException: 
>> /home/jenkins/workspace/Lucene-Solr-master-Linux/lucene/build/core/test/J2/temp/lucene.search.TestSearcherManager_BA998C838D219DA9-001/tempDir-001/_0.fdt
>> at __randomizedtesting.SeedInfo.seed([BA998C838D219DA9]:0)
>> at 
>> org.apache.lucene.search.TestSearcherManager$8.run(TestSearcherManager.java:590)
>> Caused by: java.nio.file.FileAlreadyExistsException: 
>> /home/jenkins/workspace/Lucene-Solr-master-Linux/lucene/build/core/test/J2/temp/lucene.search.TestSearcherManager_BA998C838D219DA9-001/tempDir-001/_0.fdt
>> at 
>> java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:94)
>> at 
>> java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
>> at 
>> java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116)
>> at 
>> java.base/sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:215)
>> at 
>> java.base/java.nio.file.spi.FileSystemProvider.newOutputStream(FileSystemProvider.java:434)
>> at 
>> org.apache.lucene.mockfile.FilterFileSystemProvider.newOutputStream(FilterFileSystemProvider.java:197)
>> at 
>> org.apache.lucene.mockfile.FilterFileSystemProvider.newOutputStream(FilterFileSystemProvider.java:197)
>> at 
>> org.apache.lucene.mockfile.HandleTrackingFS.newOutputStream(HandleTrackingFS.java:129)
>> at 
>> org.apache.lucene.mockfile.HandleTrackingFS.newOutputStream(HandleTrackingFS.java:129)
>> at 
>> org.apache.lucene.mockfile.HandleTrackingFS.newOutputStream(HandleTrackingFS.java:129)
>> at 
>> org.apache.lucene.mockfile.FilterFileSystemProvider.newOutputStream(FilterFileSystemProvider.java:197)
>> at java.base/java.nio.file.Files.newOutputStream(Files.java:218)
>> at 
>> org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:413)
>> at 
>> org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:409)
>> at 
>> org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:253)
>> at 
>> org.apache.lucene.store.MockDirectoryWrapper.createOutput(MockDirectoryWrapper.java:665)
>> at 
>> org.apache.lucene.store.LockValidatingDirectoryWrapper.createOutput(LockValidatingDirectoryWrapper.java:44)
>> at 
>> org.apache.lucene.store.TrackingDirectoryWr

[jira] [Created] (LUCENE-8275) Unwrap directory to check for FSDirectory

2018-04-25 Thread Simon Willnauer (JIRA)
Simon Willnauer created LUCENE-8275:
---

 Summary: Unwrap directory to check for FSDirectory
 Key: LUCENE-8275
 URL: https://issues.apache.org/jira/browse/LUCENE-8275
 Project: Lucene - Core
  Issue Type: Bug
Affects Versions: 7.4, master (8.0)
Reporter: Simon Willnauer
 Fix For: 7.4, master (8.0)
 Attachments: LUCENE-8275.patch

IndexWriter checks in its ctor whether the incoming directory is an
 FSDirectory. If so, it ensures that the directory retries deleting its
 pending deletes, and if pending deletes remain it fails to create the
 writer. Yet this check didn't unwrap filter directories, so in the case
 of MDW we never checked for pending deletes.

There are also two places in FSDirectory that first removed the file
 that was about to be created / renamed to from the pending-deletes set
 and only then tried to clean up pending deletes, which excluded that
 file. These places now remove the file from the set after the pending
 deletes are checked.

 

This caused some test failures lately; unfortunately they are very timing dependent:

 
{noformat}
FAILED:  junit.framework.TestSuite.org.apache.lucene.search.TestSearcherManager

Error Message:
Captured an uncaught exception in thread: Thread[id=1567, name=Thread-1363, 
state=RUNNABLE, group=TGRP-TestSearcherManager]

Stack Trace:
com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught 
exception in thread: Thread[id=1567, name=Thread-1363, state=RUNNABLE, 
group=TGRP-TestSearcherManager]
Caused by: java.lang.RuntimeException: 
java.nio.file.FileAlreadyExistsException: 
/home/jenkins/workspace/Lucene-Solr-master-Linux/lucene/build/core/test/J1/temp/lucene.search.TestSearcherManager_BA998C838D219DA9-001/tempDir-001/_0.fdt
at __randomizedtesting.SeedInfo.seed([BA998C838D219DA9]:0)
at 
org.apache.lucene.search.TestSearcherManager$8.run(TestSearcherManager.java:590)
Caused by: java.nio.file.FileAlreadyExistsException: 
/home/jenkins/workspace/Lucene-Solr-master-Linux/lucene/build/core/test/J1/temp/lucene.search.TestSearcherManager_BA998C838D219DA9-001/tempDir-001/_0.fdt
at 
java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:94)
at 
java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
at 
java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116)
at 
java.base/sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:215)
at 
java.base/java.nio.file.spi.FileSystemProvider.newOutputStream(FileSystemProvider.java:434)
at 
org.apache.lucene.mockfile.FilterFileSystemProvider.newOutputStream(FilterFileSystemProvider.java:197)
at 
org.apache.lucene.mockfile.FilterFileSystemProvider.newOutputStream(FilterFileSystemProvider.java:197)
at 
org.apache.lucene.mockfile.HandleTrackingFS.newOutputStream(HandleTrackingFS.java:129)
at 
org.apache.lucene.mockfile.HandleTrackingFS.newOutputStream(HandleTrackingFS.java:129)
at 
org.apache.lucene.mockfile.HandleTrackingFS.newOutputStream(HandleTrackingFS.java:129)
at 
org.apache.lucene.mockfile.FilterFileSystemProvider.newOutputStream(FilterFileSystemProvider.java:197)
at java.base/java.nio.file.Files.newOutputStream(Files.java:218)
at 
org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:413)
at 
org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:409)
at 
org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:253)
at 
org.apache.lucene.store.MockDirectoryWrapper.createOutput(MockDirectoryWrapper.java:665)
at 
org.apache.lucene.store.LockValidatingDirectoryWrapper.createOutput(LockValidatingDirectoryWrapper.java:44)
at 
org.apache.lucene.store.TrackingDirectoryWrapper.createOutput(TrackingDirectoryWrapper.java:43)
at 
org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.<init>(CompressingStoredFieldsWriter.java:116)
at 
org.apache.lucene.codecs.compressing.CompressingStoredFieldsFormat.fieldsWriter(CompressingStoredFieldsFormat.java:128)
at 
org.apache.lucene.codecs.lucene50.Lucene50StoredFieldsFormat.fieldsWriter(Lucene50StoredFieldsFormat.java:183)
at 
org.apache.lucene.codecs.asserting.AssertingStoredFieldsFormat.fieldsWriter(AssertingStoredFieldsFormat.java:48)
at 
org.apache.lucene.index.StoredFieldsConsumer.initStoredFieldsWriter(StoredFieldsConsumer.java:39)
at 
org.apache.lucene.index.StoredFieldsConsumer.startDocument(StoredFieldsConsumer.java:46)
at 
org.apache.lucene.index.DefaultIndexingChain.startStoredFields(DefaultIndexingChain.java:363)
at 
org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:399

[jira] [Updated] (LUCENE-8275) Unwrap directory to check for FSDirectory

2018-04-25 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-8275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-8275:

Attachment: LUCENE-8275.patch

> Unwrap directory to check for FSDirectory
> -
>
> Key: LUCENE-8275
> URL: https://issues.apache.org/jira/browse/LUCENE-8275
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: 7.4, master (8.0)
>    Reporter: Simon Willnauer
>Priority: Major
> Fix For: 7.4, master (8.0)
>
> Attachments: LUCENE-8275.patch
>
>
> IndexWriter checks in its constructor whether the incoming directory is an
>  FSDirectory. If that is the case, it ensures that the directory retries
>  deleting its pending deletes, and if there are pending deletes it will
>  fail to create the writer. Yet, this check didn't unwrap filter directories,
>  so in the case of MDW we never checked for pending deletes.
> There are also two places in FSDirectory that first removed the file
>  that was supposed to be created / renamed to from the pending-deletes set
>  and then tried to clean up pending deletes, which excluded that file. These
>  places now remove the file from the set after the pending deletes are 
> checked.
>  
> This lately caused some test failures, unfortunately very timing-dependent:
>  
> {noformat}
> FAILED:  
> junit.framework.TestSuite.org.apache.lucene.search.TestSearcherManager
> Error Message:
> Captured an uncaught exception in thread: Thread[id=1567, name=Thread-1363, 
> state=RUNNABLE, group=TGRP-TestSearcherManager]
> Stack Trace:
> com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an 
> uncaught exception in thread: Thread[id=1567, name=Thread-1363, 
> state=RUNNABLE, group=TGRP-TestSearcherManager]
> Caused by: java.lang.RuntimeException: 
> java.nio.file.FileAlreadyExistsException: 
> /home/jenkins/workspace/Lucene-Solr-master-Linux/lucene/build/core/test/J1/temp/lucene.search.TestSearcherManager_BA998C838D219DA9-001/tempDir-001/_0.fdt
> at __randomizedtesting.SeedInfo.seed([BA998C838D219DA9]:0)
> at 
> org.apache.lucene.search.TestSearcherManager$8.run(TestSearcherManager.java:590)
> Caused by: java.nio.file.FileAlreadyExistsException: 
> /home/jenkins/workspace/Lucene-Solr-master-Linux/lucene/build/core/test/J1/temp/lucene.search.TestSearcherManager_BA998C838D219DA9-001/tempDir-001/_0.fdt
> at 
> java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:94)
> at 
> java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
> at 
> java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116)
> at 
> java.base/sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:215)
> at 
> java.base/java.nio.file.spi.FileSystemProvider.newOutputStream(FileSystemProvider.java:434)
> at 
> org.apache.lucene.mockfile.FilterFileSystemProvider.newOutputStream(FilterFileSystemProvider.java:197)
> at 
> org.apache.lucene.mockfile.FilterFileSystemProvider.newOutputStream(FilterFileSystemProvider.java:197)
> at 
> org.apache.lucene.mockfile.HandleTrackingFS.newOutputStream(HandleTrackingFS.java:129)
> at 
> org.apache.lucene.mockfile.HandleTrackingFS.newOutputStream(HandleTrackingFS.java:129)
> at 
> org.apache.lucene.mockfile.HandleTrackingFS.newOutputStream(HandleTrackingFS.java:129)
> at 
> org.apache.lucene.mockfile.FilterFileSystemProvider.newOutputStream(FilterFileSystemProvider.java:197)
> at java.base/java.nio.file.Files.newOutputStream(Files.java:218)
> at 
> org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:413)
> at 
> org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:409)
> at 
> org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:253)
> at 
> org.apache.lucene.store.MockDirectoryWrapper.createOutput(MockDirectoryWrapper.java:665)
> at 
> org.apache.lucene.store.LockValidatingDirectoryWrapper.createOutput(LockValidatingDirectoryWrapper.java:44)
> at 
> org.apache.lucene.store.TrackingDirectoryWrapper.createOutput(TrackingDirectoryWrapper.java:43)
> at 
> org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.<init>(CompressingStoredFieldsWriter.java:116)
> at 
> org.apache.lucene.codecs.compressing.CompressingStoredFieldsFormat.fieldsWriter(CompressingStoredFieldsFormat.java:128)
> at 
> org.apache.lucene.codecs.lucene50.Lucene50StoredFieldsFormat.f
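
The fix described above makes the pending-deletes check unwrap filter directories before testing for FSDirectory. The following is a minimal, self-contained sketch of that unwrap pattern; the class names mirror Lucene's, but these are simplified stand-ins rather than the real Directory hierarchy (Lucene itself exposes this as FilterDirectory.unwrap):

```java
// Simplified model of unwrapping a chain of filter directories to reach the
// concrete implementation, as IndexWriter must do before its FSDirectory
// pending-deletes check. These are hypothetical stand-in classes, not the
// actual Lucene API.
abstract class Directory {}

class FSDirectory extends Directory {}

class FilterDirectory extends Directory {
    final Directory in;
    FilterDirectory(Directory in) { this.in = in; }

    // Walk through arbitrarily deep wrapper chains to the innermost directory.
    static Directory unwrap(Directory dir) {
        while (dir instanceof FilterDirectory) {
            dir = ((FilterDirectory) dir).in;
        }
        return dir;
    }
}

public class UnwrapSketch {
    public static void main(String[] args) {
        Directory fs = new FSDirectory();
        // Two layers of wrapping, like MockDirectoryWrapper in tests.
        Directory wrapped = new FilterDirectory(new FilterDirectory(fs));
        // Without unwrapping, the instanceof check misses the FSDirectory.
        System.out.println(wrapped instanceof FSDirectory);                         // prints false
        System.out.println(FilterDirectory.unwrap(wrapped) instanceof FSDirectory); // prints true
    }
}
```

This is why MockDirectoryWrapper slipped past the original check: the writer saw only the outermost wrapper, which is not itself an FSDirectory.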

Re: [JENKINS] Lucene-Solr-master-Linux (64bit/jdk-10) - Build # 21897 - Unstable!

2018-04-24 Thread Simon Willnauer
I am looking into this

On Tue, Apr 24, 2018 at 5:37 PM, Policeman Jenkins Server
 wrote:
> Build: https://jenkins.thetaphi.de/job/Lucene-Solr-master-Linux/21897/
> Java: 64bit/jdk-10 -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC
>
> 6 tests failed.
> FAILED:  
> junit.framework.TestSuite.org.apache.lucene.search.TestSearcherManager
>
> Error Message:
> Suite timeout exceeded (>= 720 msec).
>
> Stack Trace:
> java.lang.Exception: Suite timeout exceeded (>= 720 msec).
> at __randomizedtesting.SeedInfo.seed([BA998C838D219DA9]:0)
>
>
> FAILED:  
> junit.framework.TestSuite.org.apache.lucene.search.TestSearcherManager
>
> Error Message:
> Captured an uncaught exception in thread: Thread[id=17, name=Thread-1, 
> state=RUNNABLE, group=TGRP-TestSearcherManager]
>
> Stack Trace:
> com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an 
> uncaught exception in thread: Thread[id=17, name=Thread-1, state=RUNNABLE, 
> group=TGRP-TestSearcherManager]
> Caused by: java.lang.RuntimeException: 
> java.nio.file.FileAlreadyExistsException: 
> /home/jenkins/workspace/Lucene-Solr-master-Linux/lucene/build/core/test/J2/temp/lucene.search.TestSearcherManager_BA998C838D219DA9-001/tempDir-001/_0.fdt
> at __randomizedtesting.SeedInfo.seed([BA998C838D219DA9]:0)
> at 
> org.apache.lucene.search.TestSearcherManager$8.run(TestSearcherManager.java:590)
> Caused by: java.nio.file.FileAlreadyExistsException: 
> /home/jenkins/workspace/Lucene-Solr-master-Linux/lucene/build/core/test/J2/temp/lucene.search.TestSearcherManager_BA998C838D219DA9-001/tempDir-001/_0.fdt
> at 
> java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:94)
> at 
> java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
> at 
> java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116)
> at 
> java.base/sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:215)
> at 
> java.base/java.nio.file.spi.FileSystemProvider.newOutputStream(FileSystemProvider.java:434)
> at 
> org.apache.lucene.mockfile.FilterFileSystemProvider.newOutputStream(FilterFileSystemProvider.java:197)
> at 
> org.apache.lucene.mockfile.FilterFileSystemProvider.newOutputStream(FilterFileSystemProvider.java:197)
> at 
> org.apache.lucene.mockfile.HandleTrackingFS.newOutputStream(HandleTrackingFS.java:129)
> at 
> org.apache.lucene.mockfile.HandleTrackingFS.newOutputStream(HandleTrackingFS.java:129)
> at 
> org.apache.lucene.mockfile.HandleTrackingFS.newOutputStream(HandleTrackingFS.java:129)
> at 
> org.apache.lucene.mockfile.FilterFileSystemProvider.newOutputStream(FilterFileSystemProvider.java:197)
> at java.base/java.nio.file.Files.newOutputStream(Files.java:218)
> at 
> org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:413)
> at 
> org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:409)
> at 
> org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:253)
> at 
> org.apache.lucene.store.MockDirectoryWrapper.createOutput(MockDirectoryWrapper.java:665)
> at 
> org.apache.lucene.store.LockValidatingDirectoryWrapper.createOutput(LockValidatingDirectoryWrapper.java:44)
> at 
> org.apache.lucene.store.TrackingDirectoryWrapper.createOutput(TrackingDirectoryWrapper.java:43)
> at 
> org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.<init>(CompressingStoredFieldsWriter.java:116)
> at 
> org.apache.lucene.codecs.compressing.CompressingStoredFieldsFormat.fieldsWriter(CompressingStoredFieldsFormat.java:128)
> at 
> org.apache.lucene.codecs.lucene50.Lucene50StoredFieldsFormat.fieldsWriter(Lucene50StoredFieldsFormat.java:183)
> at 
> org.apache.lucene.codecs.asserting.AssertingStoredFieldsFormat.fieldsWriter(AssertingStoredFieldsFormat.java:48)
> at 
> org.apache.lucene.index.StoredFieldsConsumer.initStoredFieldsWriter(StoredFieldsConsumer.java:39)
> at 
> org.apache.lucene.index.StoredFieldsConsumer.startDocument(StoredFieldsConsumer.java:46)
> at 
> org.apache.lucene.index.DefaultIndexingChain.startStoredFields(DefaultIndexingChain.java:363)
> at 
> org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:399)
> at 
> org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:251)
> at 
> org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:490)
> at 
> org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1518)
> at 
> org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1210)
> at 
> org.apache.lucene.search.TestSearcherManager$8.run(TestSearcherManager.java:574)
>
>
> FAILED:  
> 

[jira] [Resolved] (LUCENE-8271) Remove IndexWriter from DWFlushQueue

2018-04-24 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-8271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer resolved LUCENE-8271.
-
Resolution: Fixed

>  Remove IndexWriter from DWFlushQueue
> -
>
> Key: LUCENE-8271
> URL: https://issues.apache.org/jira/browse/LUCENE-8271
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: 7.4, master (8.0)
>    Reporter: Simon Willnauer
>Priority: Major
> Fix For: 7.4, master (8.0)
>
> Attachments: LUCENE-8271.patch
>
>
> This simplifies DocumentsWriterFlushQueue by moving all IW-related
> code out of it. The DWFQ now only contains logic for taking tickets
> off the queue and applying them to a given consumer. The logic now
> entirely resides in IW and has private visibility. Locking
> is also more contained since IW knows exactly what is called and when.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8272) Share internal DV update code between binary and numeric

2018-04-24 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16449856#comment-16449856
 ] 

Simon Willnauer commented on LUCENE-8272:
-

[https://github.com/s1monw/lucene-solr/pull/15] /cc [~mikemccand]

> Share internal DV update code between binary and numeric
> 
>
> Key: LUCENE-8272
> URL: https://issues.apache.org/jira/browse/LUCENE-8272
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: 7.4, master (8.0)
>    Reporter: Simon Willnauer
>Priority: Major
> Fix For: 7.4, master (8.0)
>
> Attachments: LUCENE-8272.patch
>
>
> Today we duplicate a fair portion of the internal logic to
> apply updates of binary and numeric doc values. This change refactors
> this non-trivial code to share the same code path and only differ in
> whether we provide a binary or numeric instance. This also allows us to
> iterate over the updates only once rather than twice (once for numeric
> and once for binary fields).
> 
> This change also subclasses DocValuesIterator from
> DocValuesFieldUpdates.Iterator,
> which allows easier consumption down the road since it now shares most of
> its
> interface with DocIdSetIterator, which is the main interface for this in
> Lucene.






[jira] [Updated] (LUCENE-8272) Share internal DV update code between binary and numeric

2018-04-24 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-8272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-8272:

Attachment: LUCENE-8272.patch

> Share internal DV update code between binary and numeric
> 
>
> Key: LUCENE-8272
> URL: https://issues.apache.org/jira/browse/LUCENE-8272
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: 7.4, master (8.0)
>    Reporter: Simon Willnauer
>Priority: Major
> Fix For: 7.4, master (8.0)
>
> Attachments: LUCENE-8272.patch
>
>
> Today we duplicate a fair portion of the internal logic to
> apply updates of binary and numeric doc values. This change refactors
> this non-trivial code to share the same code path and only differ in
> whether we provide a binary or numeric instance. This also allows us to
> iterate over the updates only once rather than twice (once for numeric
> and once for binary fields).
> 
> This change also subclasses DocValuesIterator from
> DocValuesFieldUpdates.Iterator,
> which allows easier consumption down the road since it now shares most of
> its
> interface with DocIdSetIterator, which is the main interface for this in
> Lucene.






[jira] [Created] (LUCENE-8272) Share internal DV update code between binary and numeric

2018-04-24 Thread Simon Willnauer (JIRA)
Simon Willnauer created LUCENE-8272:
---

 Summary: Share internal DV update code between binary and numeric
 Key: LUCENE-8272
 URL: https://issues.apache.org/jira/browse/LUCENE-8272
 Project: Lucene - Core
  Issue Type: Improvement
Affects Versions: 7.4, master (8.0)
Reporter: Simon Willnauer
 Fix For: 7.4, master (8.0)
 Attachments: LUCENE-8272.patch

Today we duplicate a fair portion of the internal logic to
apply updates of binary and numeric doc values. This change refactors
this non-trivial code to share the same code path and only differ in
whether we provide a binary or numeric instance. This also allows us to
iterate over the updates only once rather than twice (once for numeric
and once for binary fields).

This change also subclasses DocValuesIterator from
DocValuesFieldUpdates.Iterator, which allows easier consumption down the
road since it now shares most of its interface with DocIdSetIterator,
which is the main interface for this in Lucene.
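
Sharing the DocIdSetIterator-style contract means one advance loop can consume updates regardless of whether they carry numeric or binary values. The following is a self-contained sketch of that consumption pattern under stated assumptions: the iterator class here is a hypothetical minimal stand-in, not Lucene's actual DocValuesFieldUpdates.Iterator, but the nextDoc()/NO_MORE_DOCS protocol matches the DocIdSetIterator contract:

```java
// Minimal stand-in for the DocIdSetIterator contract that
// DocValuesFieldUpdates.Iterator now shares: nextDoc() advances to the next
// docID in order and returns NO_MORE_DOCS when exhausted.
public class DocIdIteratorSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    static class IntArrayIterator {
        private final int[] docs;
        private int pos = -1;
        IntArrayIterator(int... docs) { this.docs = docs; }
        int nextDoc() {
            pos++;
            return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
        }
    }

    // One shared loop works for both "numeric" and "binary" update iterators
    // because they expose the same advance interface.
    static int countDocs(IntArrayIterator it) {
        int count = 0;
        for (int doc = it.nextDoc(); doc != NO_MORE_DOCS; doc = it.nextDoc()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countDocs(new IntArrayIterator(0, 3, 7, 42))); // prints 4
    }
}
```
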






[jira] [Commented] (LUCENE-8264) Allow an option to rewrite all segments

2018-04-24 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16449666#comment-16449666
 ] 

Simon Willnauer commented on LUCENE-8264:
-

To be absolutely honest, I was surprised by this as well. The reasons 
behind this change make sense to me, but the implications are big. I am not sure 
whether the strictness here comes only from the broken TermVectors offsets, 
but if so, can we discuss relaxing it a bit? This change hit a couple of 
committers by surprise (including myself), and I wonder if we can take a step 
back and reconsider this decision. While there are a bunch of other issues 
when you go from 3.x to 7.x, for instance your tokenization / analysis 
chain not being supported anymore, there are valid use cases for upgrading your 
index via background merges that rewrite the index format. Issues like 
unsupported analysis chains should be handled by higher-level apps like Solr 
or ES. For the many people that use Lucene as a retrieval engine 
doing very simple whitespace tokenization, a merge from 3.x to 7.x might be 
just fine. I think it would be good to have the conversation again, even though 
the changes were communicated very openly. [~jpountz] [~thetaphi] [~rcmuir] 
[~mikemccand] [~dweiss] WDYT?

> Allow an option to rewrite all segments
> ---
>
> Key: LUCENE-8264
> URL: https://issues.apache.org/jira/browse/LUCENE-8264
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>Priority: Major
>
> For the background, see SOLR-12259.
> There are several use-cases that would be much easier, especially during 
> upgrades, if we could specify that all segments get rewritten. 
> One example: Upgrading 5x->6x->7x. When segments are merged, they're 
> rewritten into the current format. However, there's no guarantee that a 
> particular segment _ever_ gets merged so the 6x-7x upgrade won't necessarily 
> be successful.
> How many merge policies support this is an open question. I propose to start 
> with TMP and raise other JIRAs as necessary for other merge policies.
> So far the usual response has been "re-index from scratch", but that's 
> increasingly difficult as systems get larger.






[jira] [Commented] (LUCENE-8271) Remove IndexWriter from DWFlushQueue

2018-04-24 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16449429#comment-16449429
 ] 

Simon Willnauer commented on LUCENE-8271:
-

/cc [~mikemccand] [~dweiss] https://github.com/s1monw/lucene-solr/pull/14

>  Remove IndexWriter from DWFlushQueue
> -
>
> Key: LUCENE-8271
> URL: https://issues.apache.org/jira/browse/LUCENE-8271
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: 7.4, master (8.0)
>    Reporter: Simon Willnauer
>Priority: Major
> Fix For: 7.4, master (8.0)
>
> Attachments: LUCENE-8271.patch
>
>
> This simplifies DocumentsWriterFlushQueue by moving all IW-related
> code out of it. The DWFQ now only contains logic for taking tickets
> off the queue and applying them to a given consumer. The logic now
> entirely resides in IW and has private visibility. Locking
> is also more contained since IW knows exactly what is called and when.






[jira] [Updated] (LUCENE-8271) Remove IndexWriter from DWFlushQueue

2018-04-24 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-8271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-8271:

Attachment: LUCENE-8271.patch

>  Remove IndexWriter from DWFlushQueue
> -
>
> Key: LUCENE-8271
> URL: https://issues.apache.org/jira/browse/LUCENE-8271
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: 7.4, master (8.0)
>    Reporter: Simon Willnauer
>Priority: Major
> Fix For: 7.4, master (8.0)
>
> Attachments: LUCENE-8271.patch
>
>
> This simplifies DocumentsWriterFlushQueue by moving all IW-related
> code out of it. The DWFQ now only contains logic for taking tickets
> off the queue and applying them to a given consumer. The logic now
> entirely resides in IW and has private visibility. Locking
> is also more contained since IW knows exactly what is called and when.






[jira] [Created] (LUCENE-8271) Remove IndexWriter from DWFlushQueue

2018-04-24 Thread Simon Willnauer (JIRA)
Simon Willnauer created LUCENE-8271:
---

 Summary:  Remove IndexWriter from DWFlushQueue
 Key: LUCENE-8271
 URL: https://issues.apache.org/jira/browse/LUCENE-8271
 Project: Lucene - Core
  Issue Type: Improvement
Affects Versions: 7.4, master (8.0)
Reporter: Simon Willnauer
 Fix For: 7.4, master (8.0)


This simplifies DocumentsWriterFlushQueue by moving all IW-related
code out of it. The DWFQ now only contains logic for taking tickets
off the queue and applying them to a given consumer. The logic now
entirely resides in IW and has private visibility. Locking
is also more contained since IW knows exactly what is called and when.






[jira] [Resolved] (LUCENE-8269) Detach downstream classes from IndexWriter

2018-04-23 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-8269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer resolved LUCENE-8269.
-
Resolution: Fixed

> Detach downstream classes from IndexWriter
> --
>
> Key: LUCENE-8269
> URL: https://issues.apache.org/jira/browse/LUCENE-8269
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: 7.4, master (8.0)
>    Reporter: Simon Willnauer
>Priority: Major
> Fix For: 7.4, master (8.0)
>
> Attachments: LUCENE-8269.patch
>
>
> IndexWriter today is shared with many classes like BufferedUpdatesStream,
> DocumentsWriter and DocumentsWriterPerThread. Some of them even acquire locks
> on the writer instance or assert that the current thread doesn't hold a lock.
> This makes it very difficult to have a manageable threading model.
> 
> This change separates out the IndexWriter from those classes and makes 
> them all
> independent of IW. IW now implements a new interface for DocumentsWriter 
> to communicate
> on failed or successful flushes and tragic events. This allows IW to make 
> its critical
> methods private and execute all lock-critical actions on its private 
> queue, which ensures
> that the IW lock is not held. Follow-up changes will try to detach more 
> code, like
> publishing flushed segments, to ensure we never call back into IW in an 
> uncontrolled way.
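
The private-queue idea described above can be sketched in a few lines: downstream code never calls back into the writer directly while internal locks are held; it enqueues an action, and the writer drains the queue at a safe point. This is a hypothetical minimal model of the pattern, not Lucene's actual implementation; the class and method names are illustrative only:

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Sketch of the event-queue pattern: instead of invoking lock-critical
// IndexWriter methods from arbitrary code paths, callers enqueue an action,
// and the writer runs all queued actions later, outside its critical sections.
public class EventQueueSketch {
    private final Queue<Runnable> events = new ArrayDeque<>();

    // Called from code paths that must not invoke IW methods directly
    // (e.g. a flush-failure notification).
    synchronized void enqueue(Runnable event) {
        events.add(event);
    }

    // Called by the writer itself at a safe point; each event runs without
    // the queue's monitor (standing in for the IW lock) being held.
    void processEvents() {
        Runnable event;
        while ((event = poll()) != null) {
            event.run();
        }
    }

    private synchronized Runnable poll() {
        return events.poll();
    }

    public static void main(String[] args) {
        EventQueueSketch queue = new EventQueueSketch();
        StringBuilder log = new StringBuilder();
        queue.enqueue(() -> log.append("flush-ok;"));
        queue.enqueue(() -> log.append("publish;"));
        queue.processEvents();
        System.out.println(log); // prints flush-ok;publish;
    }
}
```

The design choice is that enqueueing is cheap and lock-safe, while execution is deferred to a single well-known point, which is what makes the threading model manageable.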






[jira] [Commented] (LUCENE-8267) Remove memory codecs from the codebase

2018-04-23 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448287#comment-16448287
 ] 

Simon Willnauer commented on LUCENE-8267:
-

+1 to what [~rcmuir] said, there are so many more efficient options.

{quote}Do you mean to say I should have said all I said without voting first? 
Lets have a conversation! (we _are_ having a conversation){quote}

I perceive your veto as an aggressive step. To me it's a last resort after 
we can't find a solution that is good for all of us. The conversation already 
has a tone that is not appropriate and could have been prevented by formulating 
objections as questions, like: _I am using this postings format in X and it's 
serving me well, what are the alternatives?_ - I am sure you would have gotten an 
awesome answer.

{quote}I don't understand this point of view; can you please elaborate? Fear of 
what?{quote}

If you can't remove stuff without others jumping in and vetoing, the reaction 
will be to prevent additions in the same way, due to the _fear_ created by the 
veto. This is a terrible place to be in; we have seen this in the past and we 
should prevent it.

 

> Remove memory codecs from the codebase
> --
>
> Key: LUCENE-8267
> URL: https://issues.apache.org/jira/browse/LUCENE-8267
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Dawid Weiss
>Priority: Major
>
> Memory codecs (MemoryPostings*, MemoryDocValues*) are part of random 
> selection of codecs for tests and cause occasional OOMs when a test with huge 
> data is selected. We don't use those memory codecs anywhere outside of tests, 
> it has been suggested to just remove them to avoid maintenance costs and OOMs 
> in tests. [1]
> [1] https://apache.markmail.org/thread/mj53os2ekyldsoy3






[jira] [Comment Edited] (LUCENE-8267) Remove memory codecs from the codebase

2018-04-23 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448208#comment-16448208
 ] 

Simon Willnauer edited comment on LUCENE-8267 at 4/23/18 2:19 PM:
--

{quote}If we are going to make it harder to remove stuff, I have no problem 
being the one to make it equally harder to add stuff.
{quote}

I agree this is one of those issues that we have to face. If we put the bar 
very high for removing stuff that is not mainstream, then we will have a super 
hard time adding stuff. It creates fear-driven decisions. It sucks; I agree with 
[~rcmuir] 100% here.
  
{quote}
-1 sorry. I've used the MemoryPostingsFormat for a text-tagging use-case where 
there are intense lookups against the terms dictionary. It's highly beneficial 
to have the terms dictionary be entirely memory resident, albeit in a compact 
FST. The issue description mentions "We don't use those memory codecs anywhere 
outside of tests" – this should be no surprise as it's not the default codec. 
I'm sure it may be hard to gauge the level of use of something outside of 
core-Lucene. When we ponder removing something that Lucene doesn't even _need_, 
I propose we raise the issue more openly to the community. Perhaps the question 
could be proposed in CHANGES.txt and/or release announcements to solicit 
community input?
{quote}
 
Given that you know you are using your veto here, we are already in a 
terrible position to have any conversation. Can you quantify the "it's nice"? 
Since there are alternatives (the standard codec), can you go and provide some 
numbers? We should not use vetoes based on non-quantifiable arguments, IMO. We 
can go and ask the community, but I don't expect much useful outcome; most 
folks don't know what they are using here and there. Nevertheless, I am 
happy to send a mail to dev to get this information. 


was (Author: simonw):
{quote}
If we are going to make it harder to remove stuff, I have no problem being the 
one to make it equally harder to add stuff.
{quote}
 
I agree this is one of these issues that we have to face. if we put the bar 
very high to remove stuff that is not mainstream then we will have a super hard 
time adding stuff. It creates fear driven decisions. It sucks I agree with 
[~rcmuir] 100% here.
 
{quote}
-1 sorry. I've used the MemoryPostingsFormat for a text-tagging use-case where 
there are intense lookups against the terms dictionary. It's highly beneficial 
to have the terms dictionary be entirely memory resident, albeit in a compact 
FST. The issue description mentions "We don't use those memory codecs anywhere 
outside of tests" – this should be no surprise as it's not the default codec. 
I'm sure it may be hard to gauge the level of use of something outside of 
core-Lucene. When we ponder removing something that Lucene doesn't even _need_, 
I propose we raise the issue more openly to the community. Perhaps the question 
could be proposed in CHANGES.txt and/or release announcements to solicit 
community input?
{quote}
 
given that you know that you are using your veto here we are already in a 
terrible position to have any conversation. Can you quantify the "it's nice"? 
since there are alternatives that (standard codec) can you go and provide some 
numbers. We should not use vetos based on non-quantifiable arguments IMO. We 
can go and ask the community but I don't expect much useful outcome, most of 
the folks don't know what they are using here and there. Nevertheless, I am 
happy to send a mail to dev to get this information. 

> Remove memory codecs from the codebase
> --
>
> Key: LUCENE-8267
> URL: https://issues.apache.org/jira/browse/LUCENE-8267
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Dawid Weiss
>Priority: Major
>
> Memory codecs (MemoryPostings*, MemoryDocValues*) are part of random 
> selection of codecs for tests and cause occasional OOMs when a test with huge 
> data is selected. We don't use those memory codecs anywhere outside of tests, 
> it has been suggested to just remove them to avoid maintenance costs and OOMs 
> in tests. [1]
> [1] https://apache.markmail.org/thread/mj53os2ekyldsoy3






[jira] [Commented] (LUCENE-8267) Remove memory codecs from the codebase

2018-04-23 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448208#comment-16448208
 ] 

Simon Willnauer commented on LUCENE-8267:
-

{quote}
If we are going to make it harder to remove stuff, I have no problem being the 
one to make it equally harder to add stuff.
{quote}
 
I agree this is one of those issues that we have to face. If we put the bar 
very high for removing stuff that is not mainstream, then we will have a super 
hard time adding stuff. It creates fear-driven decisions. It sucks; I agree with 
[~rcmuir] 100% here.
 
{quote}
-1 sorry. I've used the MemoryPostingsFormat for a text-tagging use-case where 
there are intense lookups against the terms dictionary. It's highly beneficial 
to have the terms dictionary be entirely memory resident, albeit in a compact 
FST. The issue description mentions "We don't use those memory codecs anywhere 
outside of tests" – this should be no surprise as it's not the default codec. 
I'm sure it may be hard to gauge the level of use of something outside of 
core-Lucene. When we ponder removing something that Lucene doesn't even _need_, 
I propose we raise the issue more openly to the community. Perhaps the question 
could be proposed in CHANGES.txt and/or release announcements to solicit 
community input?
{quote}
 
Given that you know you are using your veto here, we are already in a 
terrible position to have any conversation. Can you quantify the "it's nice"? 
Since there are alternatives (the standard codec), can you go and provide some 
numbers? We should not use vetoes based on non-quantifiable arguments, IMO. We 
can go and ask the community, but I don't expect much useful outcome; most 
folks don't know what they are using here and there. Nevertheless, I am 
happy to send a mail to dev to get this information. 

> Remove memory codecs from the codebase
> --
>
> Key: LUCENE-8267
> URL: https://issues.apache.org/jira/browse/LUCENE-8267
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Dawid Weiss
>Priority: Major
>
> Memory codecs (MemoryPostings*, MemoryDocValues*) are part of random 
> selection of codecs for tests and cause occasional OOMs when a test with huge 
> data is selected. We don't use those memory codecs anywhere outside of tests, 
> it has been suggested to just remove them to avoid maintenance costs and OOMs 
> in tests. [1]
> [1] https://apache.markmail.org/thread/mj53os2ekyldsoy3






[jira] [Commented] (LUCENE-8269) Detach downstream classes from IndexWriter

2018-04-23 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448026#comment-16448026
 ] 

Simon Willnauer commented on LUCENE-8269:
-

[https://github.com/s1monw/lucene-solr/pull/13/] /cc [~mikemccand]

> Detach downstream classes from IndexWriter
> --
>
> Key: LUCENE-8269
> URL: https://issues.apache.org/jira/browse/LUCENE-8269
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: 7.4, master (8.0)
>    Reporter: Simon Willnauer
>Priority: Major
> Fix For: 7.4, master (8.0)
>
> Attachments: LUCENE-8269.patch
>
>
> IndexWriter today is shared with many classes like BufferedUpdateStream,
> DocumentsWriter and DocumentsWriterPerThread. Some of them even acquire locks
> on the writer instance or assert that the current thread doesn't hold a lock.
> This makes it very difficult to have a manageable threading model.
> 
> This change separates the IndexWriter from those classes and makes them all
> independent of IW. IW now implements a new interface for DocumentsWriter to
> communicate failed or successful flushes and tragic events. This allows IW to
> make its critical methods private and execute all lock-critical actions on its
> private queue, which ensures that the IW lock is not held. Follow-up changes
> will try to detach more code, like publishing flushed segments, to ensure we
> never call back into IW in an uncontrolled way.
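The listener-style decoupling described in this issue can be sketched as a small interface plus a deferred event queue. All names here (FlushNotifications, Writer, eventQueue) are illustrative placeholders, not Lucene's actual API:

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical notification interface the writer implements so that
// downstream classes never call back into it directly.
interface FlushNotifications {
  void afterSegmentsFlushed();           // a flush completed successfully
  void onTragicEvent(Throwable tragedy); // an unrecoverable failure occurred
}

final class Writer implements FlushNotifications {
  // Critical actions are queued and executed later, while the writer's
  // monitor is NOT held, instead of running inside the caller's locks.
  private final Queue<Runnable> eventQueue = new ArrayDeque<>();
  int checkpoints = 0;

  @Override public void afterSegmentsFlushed() {
    eventQueue.add(() -> checkpoints++); // defer the checkpoint
  }

  @Override public void onTragicEvent(Throwable tragedy) {
    eventQueue.add(() -> { throw new IllegalStateException(tragedy); });
  }

  // Drained by the writer itself, outside any synchronized block.
  void processEvents() {
    Runnable r;
    while ((r = eventQueue.poll()) != null) {
      r.run();
    }
  }
}
```

The point of the queue is that DocumentsWriter can report a flush while holding its own locks, and the expensive follow-up work runs only when the writer explicitly drains the queue.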






[jira] [Updated] (LUCENE-8269) Detach downstream classes from IndexWriter

2018-04-23 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-8269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-8269:

Attachment: LUCENE-8269.patch

> Detach downstream classes from IndexWriter
> --
>
> Key: LUCENE-8269
> URL: https://issues.apache.org/jira/browse/LUCENE-8269
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: 7.4, master (8.0)
>    Reporter: Simon Willnauer
>Priority: Major
> Fix For: 7.4, master (8.0)
>
> Attachments: LUCENE-8269.patch
>
>
> IndexWriter today is shared with many classes like BufferedUpdateStream,
> DocumentsWriter and DocumentsWriterPerThread. Some of them even acquire locks
> on the writer instance or assert that the current thread doesn't hold a lock.
> This makes it very difficult to have a manageable threading model.
> 
> This change separates the IndexWriter from those classes and makes them all
> independent of IW. IW now implements a new interface for DocumentsWriter to
> communicate failed or successful flushes and tragic events. This allows IW to
> make its critical methods private and execute all lock-critical actions on its
> private queue, which ensures that the IW lock is not held. Follow-up changes
> will try to detach more code, like publishing flushed segments, to ensure we
> never call back into IW in an uncontrolled way.






[jira] [Created] (LUCENE-8269) Detach downstream classes from IndexWriter

2018-04-23 Thread Simon Willnauer (JIRA)
Simon Willnauer created LUCENE-8269:
---

 Summary: Detach downstream classes from IndexWriter
 Key: LUCENE-8269
 URL: https://issues.apache.org/jira/browse/LUCENE-8269
 Project: Lucene - Core
  Issue Type: Improvement
Affects Versions: 7.4, master (8.0)
Reporter: Simon Willnauer
 Fix For: 7.4, master (8.0)


IndexWriter today is shared with many classes like BufferedUpdateStream,
DocumentsWriter and DocumentsWriterPerThread. Some of them even acquire locks
on the writer instance or assert that the current thread doesn't hold a lock.
This makes it very difficult to have a manageable threading model.

This change separates the IndexWriter from those classes and makes them all
independent of IW. IW now implements a new interface for DocumentsWriter to
communicate failed or successful flushes and tragic events. This allows IW to
make its critical methods private and execute all lock-critical actions on its
private queue, which ensures that the IW lock is not held. Follow-up changes
will try to detach more code, like publishing flushed segments, to ensure we
never call back into IW in an uncontrolled way.







[jira] [Commented] (LUCENE-8268) MatchesIterator.term() should return an array

2018-04-23 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16447948#comment-16447948
 ] 

Simon Willnauer commented on LUCENE-8268:
-

{quote}
So at the moment there isn't anything that actually uses this. My reason for 
adding it was to make it possible to identify the leaf query that returned each 
position, but maybe it would be a better idea to remove terms() entirely, and 
add a getLeafQuery() method instead?
{quote}

Hard to tell, since I don't know the API well enough. But if this is the 
purpose, I agree.

> MatchesIterator.term() should return an array
> -
>
> Key: LUCENE-8268
> URL: https://issues.apache.org/jira/browse/LUCENE-8268
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Attachments: LUCENE-8268.patch
>
>
> At the moment, we return a single BytesRef from MatchesIterator.term(), which 
> works well for the queries that currently implement this.  This won't be 
> enough for queries that operate on more than one term, however, such as 
> phrase or Span queries.
> In preparation for LUCENE-8249, this issue will change the method to return 
> an array of BytesRef.
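A minimal sketch of the proposed signature change, using plain String arrays as a stand-in for BytesRef so the example is self-contained (all names below are illustrative, not the real Lucene classes):

```java
import java.util.List;

// Before the change, the iterator exposed a single term per position;
// the proposal is to expose all matching terms, e.g. for phrase queries.
interface MatchesIterator {
  // before: BytesRef term();  after (proposed):
  String[] terms();
}

// A multi-term query, such as a phrase, surfaces every matched term.
final class PhraseMatches implements MatchesIterator {
  private final List<String> matched;

  PhraseMatches(List<String> matched) {
    this.matched = matched;
  }

  @Override public String[] terms() {
    return matched.toArray(new String[0]);
  }
}
```

A single-term query would simply return a one-element array, so existing implementors remain expressible under the new signature.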






[jira] [Commented] (LUCENE-8267) Remove memory codecs from the codebase

2018-04-23 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16447787#comment-16447787
 ] 

Simon Willnauer commented on LUCENE-8267:
-

+1

> Remove memory codecs from the codebase
> --
>
> Key: LUCENE-8267
> URL: https://issues.apache.org/jira/browse/LUCENE-8267
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Dawid Weiss
>Priority: Major
>
> Memory codecs (MemoryPostings*, MemoryDocValues*) are part of the random 
> selection of codecs for tests and cause occasional OOMs when a test with huge 
> data is selected. We don't use those memory codecs anywhere outside of tests; 
> it has been suggested to just remove them to avoid maintenance costs and OOMs 
> in tests. [1]
> [1] https://apache.markmail.org/thread/mj53os2ekyldsoy3






[jira] [Commented] (LUCENE-8268) MatchesIterator.term() should return an array

2018-04-23 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16447779#comment-16447779
 ] 

Simon Willnauer commented on LUCENE-8268:
-

A couple of questions:

* In _compareBytesRefArrays_, how can you tell that comparing each individual 
term is correct? 
* Is _BytesRefIterator_ an option as a return value, and would it make sense? 
It's hard to tell without a single user of this. 
* In the current context there is no gain in changing this interface. Can we 
add a user of multiple terms?
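For context on the first question: one plausible semantics for a compareBytesRefArrays-style helper is element-wise lexicographic comparison, breaking ties by length. This is an assumption about the patch's intent, shown on plain byte[] arrays for self-containment:

```java
import java.util.Arrays;

final class ByteArrayArrays {
  // Compare two arrays of byte arrays: element by element, and if one is a
  // prefix of the other, the shorter array sorts first.
  static int compare(byte[][] a, byte[][] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      // Unsigned lexicographic comparison of each element, mirroring how
      // individual terms are usually ordered.
      int cmp = Arrays.compareUnsigned(a[i], b[i]);
      if (cmp != 0) {
        return cmp;
      }
    }
    return Integer.compare(a.length, b.length);
  }
}
```

Whether per-element comparison is "correct" then hinges on whether term order within the array is meaningful for the query, which is exactly what the question above is probing.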

> MatchesIterator.term() should return an array
> -
>
> Key: LUCENE-8268
> URL: https://issues.apache.org/jira/browse/LUCENE-8268
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Attachments: LUCENE-8268.patch
>
>
> At the moment, we return a single BytesRef from MatchesIterator.term(), which 
> works well for the queries that currently implement this.  This won't be 
> enough for queries that operate on more than one term, however, such as 
> phrase or Span queries.
> In preparation for LUCENE-8249, this issue will change the method to return 
> an array of BytesRef.






[jira] [Commented] (LUCENE-8264) Allow an option to rewrite all segments

2018-04-23 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16447767#comment-16447767
 ] 

Simon Willnauer commented on LUCENE-8264:
-

> It worked at least until 7.x. As I said, you can remove offsets if needed. 
> And of course a FilterLeafReader together with SlowCodecReaderWrapper is 
> definitely needed.

I am not so sure about this; at least 
[this|https://github.com/apache/lucene-solr/blob/branch_7x/lucene/core/src/java/org/apache/lucene/index/IndexWriter.java#L2756]
 will fail, and it has been there since 7.0.
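For illustration, the kind of created-version check being referenced might look roughly like this simplified sketch (the class and method names are hypothetical, not the actual linked IndexWriter code):

```java
// Hypothetical distillation of a merge-reader validation: a reader created by
// an older major version cannot be merged into a newer index.
final class MergeReaderValidator {
  static void validate(int indexCreatedMajor, int readerCreatedMajor) {
    if (readerCreatedMajor < indexCreatedMajor) {
      throw new IllegalArgumentException(
          "cannot merge a reader created with major version "
              + readerCreatedMajor
              + " into an index created with major version "
              + indexCreatedMajor);
    }
  }
}
```

Because the created version travels with the segments, no amount of rewriting the segment data alone gets past a check of this shape.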



> Allow an option to rewrite all segments
> ---
>
> Key: LUCENE-8264
> URL: https://issues.apache.org/jira/browse/LUCENE-8264
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>Priority: Major
>
> For the background, see SOLR-12259.
> There are several use-cases that would be much easier, especially during 
> upgrades, if we could specify that all segments get rewritten. 
> One example: Upgrading 5x->6x->7x. When segments are merged, they're 
> rewritten into the current format. However, there's no guarantee that a 
> particular segment _ever_ gets merged so the 6x-7x upgrade won't necessarily 
> be successful.
> How many merge policies support this is an open question. I propose to start 
> with TMP and raise other JIRAs as necessary for other merge policies.
> So far the usual response has been "re-index from scratch", but that's 
> increasingly difficult as systems get larger.






[jira] [Resolved] (LUCENE-8260) Extract ReaderPool from IndexWriter

2018-04-23 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-8260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer resolved LUCENE-8260.
-
Resolution: Fixed

thanks everyone!

>  Extract ReaderPool from IndexWriter
> 
>
> Key: LUCENE-8260
> URL: https://issues.apache.org/jira/browse/LUCENE-8260
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: 7.4, master (8.0)
>    Reporter: Simon Willnauer
>Priority: Major
> Fix For: 7.4, master (8.0)
>
> Attachments: LUCENE-8260.diff
>
>
> ReaderPool plays a central role in the IndexWriter, pooling NRT readers and 
> making sure we write buffered deletes and updates to disk. This class used to 
> be a non-static inner class accessing many aspects, including locks, of the 
> IndexWriter itself. This change moves the class outside of IW and defines 
> its responsibility in a clear way with respect to locks etc. Now IndexWriter 
> doesn't need to share ReaderPool anymore and reacts to writes done inside the 
> pool by checkpointing internally. This also removes acquiring the IW lock 
> inside the reader pool, which made reasoning about concurrency difficult.
> This change also adds javadocs and dedicated tests for the ReaderPool class.
> /cc [~mikemccand] [~dawidweiss]






[jira] [Commented] (LUCENE-8264) Allow an option to rewrite all segments

2018-04-23 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16447758#comment-16447758
 ] 

Simon Willnauer commented on LUCENE-8264:
-

[~thetaphi] I don't think this is going to work here. 
IndexWriter#validateMergeReader will prevent you from doing this unless you add 
some evil hacks.

> Allow an option to rewrite all segments
> ---
>
> Key: LUCENE-8264
> URL: https://issues.apache.org/jira/browse/LUCENE-8264
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>Priority: Major
>
> For the background, see SOLR-12259.
> There are several use-cases that would be much easier, especially during 
> upgrades, if we could specify that all segments get rewritten. 
> One example: Upgrading 5x->6x->7x. When segments are merged, they're 
> rewritten into the current format. However, there's no guarantee that a 
> particular segment _ever_ gets merged so the 6x-7x upgrade won't necessarily 
> be successful.
> How many merge policies support this is an open question. I propose to start 
> with TMP and raise other JIRAs as necessary for other merge policies.
> So far the usual response has been "re-index from scratch", but that's 
> increasingly difficult as systems get larger.






[jira] [Commented] (LUCENE-8264) Allow an option to rewrite all segments

2018-04-23 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16447707#comment-16447707
 ] 

Simon Willnauer commented on LUCENE-8264:
-

[~dweiss] I think you are not aware of the fact that an index created with 
version N-2 won't be supported by N even if you rewrite all segments. The 
created version is baked into the segments file, and Lucene will not open it 
even if all segments are on N or N-1. There are several reasons for this, for 
instance rejecting broken offsets in term vectors in Lucene 7. We can never 
enforce limits like this if we keep upgrading stuff behind the scenes that 
didn't have these protections. 

> Allow an option to rewrite all segments
> ---
>
> Key: LUCENE-8264
> URL: https://issues.apache.org/jira/browse/LUCENE-8264
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>Priority: Major
>
> For the background, see SOLR-12259.
> There are several use-cases that would be much easier, especially during 
> upgrades, if we could specify that all segments get rewritten. 
> One example: Upgrading 5x->6x->7x. When segments are merged, they're 
> rewritten into the current format. However, there's no guarantee that a 
> particular segment _ever_ gets merged so the 6x-7x upgrade won't necessarily 
> be successful.
> How many merge policies support this is an open question. I propose to start 
> with TMP and raise other JIRAs as necessary for other merge policies.
> So far the usual response has been "re-index from scratch", but that's 
> increasingly difficult as systems get larger.






[jira] [Commented] (LUCENE-8260) Extract ReaderPool from IndexWriter

2018-04-19 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16444082#comment-16444082
 ] 

Simon Willnauer commented on LUCENE-8260:
-

here is also a review PR https://github.com/s1monw/lucene-solr/pull/12/

>  Extract ReaderPool from IndexWriter
> 
>
> Key: LUCENE-8260
> URL: https://issues.apache.org/jira/browse/LUCENE-8260
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: 7.4, master (8.0)
>    Reporter: Simon Willnauer
>Priority: Major
> Fix For: 7.4, master (8.0)
>
> Attachments: LUCENE-8260.diff
>
>
> ReaderPool plays a central role in the IndexWriter, pooling NRT readers and 
> making sure we write buffered deletes and updates to disk. This class used to 
> be a non-static inner class accessing many aspects, including locks, of the 
> IndexWriter itself. This change moves the class outside of IW and defines 
> its responsibility in a clear way with respect to locks etc. Now IndexWriter 
> doesn't need to share ReaderPool anymore and reacts to writes done inside the 
> pool by checkpointing internally. This also removes acquiring the IW lock 
> inside the reader pool, which made reasoning about concurrency difficult.
> This change also adds javadocs and dedicated tests for the ReaderPool class.
> /cc [~mikemccand] [~dawidweiss]






[jira] [Updated] (LUCENE-8260) Extract ReaderPool from IndexWriter

2018-04-19 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-8260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-8260:

Attachment: LUCENE-8260.diff

>  Extract ReaderPool from IndexWriter
> 
>
> Key: LUCENE-8260
> URL: https://issues.apache.org/jira/browse/LUCENE-8260
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: 7.4, master (8.0)
>    Reporter: Simon Willnauer
>Priority: Major
> Fix For: 7.4, master (8.0)
>
> Attachments: LUCENE-8260.diff
>
>
> ReaderPool plays a central role in the IndexWriter, pooling NRT readers and 
> making sure we write buffered deletes and updates to disk. This class used to 
> be a non-static inner class accessing many aspects, including locks, of the 
> IndexWriter itself. This change moves the class outside of IW and defines 
> its responsibility in a clear way with respect to locks etc. Now IndexWriter 
> doesn't need to share ReaderPool anymore and reacts to writes done inside the 
> pool by checkpointing internally. This also removes acquiring the IW lock 
> inside the reader pool, which made reasoning about concurrency difficult.
> This change also adds javadocs and dedicated tests for the ReaderPool class.
> /cc [~mikemccand] [~dawidweiss]






[jira] [Created] (LUCENE-8259) Extract ReaderPool from IndexWriter

2018-04-19 Thread Simon Willnauer (JIRA)
Simon Willnauer created LUCENE-8259:
---

 Summary:  Extract ReaderPool from IndexWriter
 Key: LUCENE-8259
 URL: https://issues.apache.org/jira/browse/LUCENE-8259
 Project: Lucene - Core
  Issue Type: Improvement
Affects Versions: 7.4, master (8.0)
Reporter: Simon Willnauer
 Fix For: 7.4, master (8.0)
 Attachments: extract_reader_pool.diff

ReaderPool plays a central role in the IndexWriter, pooling NRT readers and 
making sure we write buffered deletes and updates to disk. This class used to 
be a non-static inner class accessing many aspects, including locks, of the 
IndexWriter itself. This change moves the class outside of IW and defines its 
responsibility in a clear way with respect to locks etc. Now IndexWriter doesn't 
need to share ReaderPool anymore and reacts to writes done inside the pool by 
checkpointing internally. This also removes acquiring the IW lock inside the 
reader pool, which made reasoning about concurrency difficult.

This change also adds javadocs and dedicated tests for the ReaderPool class.

/cc [~mikemccand] [~dawidweiss]
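A toy sketch of a pooled, ref-counted reader cache in the spirit of the extracted class; names and behavior here are simplified and illustrative only, not the real ReaderPool:

```java
import java.util.HashMap;
import java.util.Map;

// A minimal reader pool: readers are cached per segment and ref-counted so the
// pool can tell when a reader is no longer used by any caller. Crucially, all
// synchronization is on the pool itself, never on a writer instance.
final class ReaderPool {
  static final class PooledReader {
    final String segment;
    int refCount = 1; // the pool's own reference

    PooledReader(String segment) {
      this.segment = segment;
    }
  }

  private final Map<String, PooledReader> readers = new HashMap<>();

  // Borrow a reader for a segment, creating it on first access.
  synchronized PooledReader get(String segment) {
    PooledReader r = readers.computeIfAbsent(segment, PooledReader::new);
    r.refCount++;
    return r;
  }

  // Return a reader; when only the pool's reference remains, drop it and
  // report true so the caller can react (e.g. by checkpointing).
  synchronized boolean release(PooledReader r) {
    if (--r.refCount == 1) {
      readers.remove(r.segment);
      return true;
    }
    return false;
  }
}
```

Keeping the lock scope confined to the pool is the concurrency simplification the issue describes: the writer observes the pool's return values instead of being locked from inside it.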






[jira] [Created] (LUCENE-8260) Extract ReaderPool from IndexWriter

2018-04-19 Thread Simon Willnauer (JIRA)
Simon Willnauer created LUCENE-8260:
---

 Summary:  Extract ReaderPool from IndexWriter
 Key: LUCENE-8260
 URL: https://issues.apache.org/jira/browse/LUCENE-8260
 Project: Lucene - Core
  Issue Type: Improvement
Affects Versions: 7.4, master (8.0)
Reporter: Simon Willnauer
 Fix For: 7.4, master (8.0)
 Attachments: LUCENE-8260.diff

ReaderPool plays a central role in the IndexWriter, pooling NRT readers and 
making sure we write buffered deletes and updates to disk. This class used to 
be a non-static inner class accessing many aspects, including locks, of the 
IndexWriter itself. This change moves the class outside of IW and defines its 
responsibility in a clear way with respect to locks etc. Now IndexWriter doesn't 
need to share ReaderPool anymore and reacts to writes done inside the pool by 
checkpointing internally. This also removes acquiring the IW lock inside the 
reader pool, which made reasoning about concurrency difficult.

This change also adds javadocs and dedicated tests for the ReaderPool class.

/cc [~mikemccand] [~dawidweiss]






Re: [JENKINS] Lucene-Solr-7.x-Linux (32bit/jdk1.8.0_162) - Build # 1744 - Failure!

2018-04-18 Thread Simon Willnauer
pushed a fix, test bug - sorry for the noise

On Wed, Apr 18, 2018 at 12:47 PM, Policeman Jenkins Server
 wrote:
> Build: https://jenkins.thetaphi.de/job/Lucene-Solr-7.x-Linux/1744/
> Java: 32bit/jdk1.8.0_162 -server -XX:+UseG1GC
>
> 1 tests failed.
> FAILED:  
> org.apache.lucene.index.TestPendingSoftDeletes.testUpdateAppliedOnlyOnce
>
> Error Message:
> expected:<1> but was:<2>
>
> Stack Trace:
> java.lang.AssertionError: expected:<1> but was:<2>
> at 
> __randomizedtesting.SeedInfo.seed([534121D77EFDCB0A:7E638ADEA75CBAAB]:0)
> at org.junit.Assert.fail(Assert.java:93)
> at org.junit.Assert.failNotEquals(Assert.java:647)
> at org.junit.Assert.assertEquals(Assert.java:128)
> at org.junit.Assert.assertEquals(Assert.java:472)
> at org.junit.Assert.assertEquals(Assert.java:456)
> at 
> org.apache.lucene.index.TestPendingSoftDeletes.testUpdateAppliedOnlyOnce(TestPendingSoftDeletes.java:170)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1737)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:934)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:970)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:984)
> at 
> org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
> at 
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
> at 
> org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
> at 
> org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
> at 
> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
> at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
> at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
> at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:943)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:829)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:879)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:890)
> at 
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
> at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at 
> org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
> at 
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
> at 
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
> at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at 
> org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
> at 
> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
> at 
> org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
> at 
> org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
> at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
> at java.lang.Thread.run(Thread.java:748)
>
>
>
>
> Build Log:
> [...truncated 480 lines...]
>[junit4] Suite: org.apache.lucene.index.TestPendingSoftDeletes
>[junit4]   2> NOTE: reproduce with: ant test  
