[jira] [Commented] (LUCENE-7302) IndexWriter should tell you the order of indexing operations

2016-07-06 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15364317#comment-15364317
 ] 

ASF subversion and git services commented on LUCENE-7302:
-

Commit 4ff882e4aa9cb7fc585213bca9344fa05d1bec5f in lucene-solr's branch 
refs/heads/branch_6x from Mike McCandless
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=4ff882e ]

LUCENE-7302: IW.getMaxCompletedSequenceNumber was returning the wrong value 
after IW.deleteAll


> IndexWriter should tell you the order of indexing operations
> 
>
> Key: LUCENE-7302
> URL: https://issues.apache.org/jira/browse/LUCENE-7302
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Fix For: master (7.0), 6.2
>
> Attachments: LUCENE-7032.patch, LUCENE-7132.patch
>
>
> Today, when you use multiple threads to concurrently index, Lucene
> knows the effective order in which those operations were applied to
> the index, but doesn't return that information to you.
> This is important to know if you want to build a reliable search
> API on top of Lucene.  Combined with the recently added NRT
> replication (LUCENE-5438) it can be a strong basis for an efficient
> distributed search API.
> I think we should return this information, since we already have it,
> and since it could simplify servers (ES/Solr) on top of Lucene:
>   - They would not require locking preventing the same id from being
> indexed concurrently since they could instead check the returned
> sequence number to know which update "won", for features like
> "realtime get".  (Locking is probably still needed for features
> like optimistic concurrency).
>   - When re-applying operations from a prior commit point, e.g. on
> recovering after a crash from a transaction log, they can know
> exactly which operations made it into the commit and which did
> not, and replay only the truly missing operations.
> Not returning this just hurts people who try to build servers on top
> with clear semantics on crashing/recovering ... I also struggled with
> this when building a simple "server wrapper" on top of Lucene
> (LUCENE-5376).
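
The crash-recovery point in the description can be sketched as follows. This is a minimal illustration, not Lucene code: the class ReplaySketch, the LogEntry type and the missingOps helper are all hypothetical names. It assumes the application itself records, next to each transaction-log entry, the sequence number returned for that operation, and that it also knows the sequence number covered by the last commit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Lucene code: each logged operation carries the
// sequence number that the writer returned for it.
final class ReplaySketch {

    /** One transaction-log entry: the operation text plus its sequence number. */
    static final class LogEntry {
        final long seqNo;
        final String op;

        LogEntry(long seqNo, String op) {
            this.seqNo = seqNo;
            this.op = op;
        }
    }

    /**
     * Returns the entries not covered by the commit, assuming the commit
     * includes every operation with seqNo <= commitSeqNo.
     */
    static List<LogEntry> missingOps(List<LogEntry> log, long commitSeqNo) {
        List<LogEntry> missing = new ArrayList<>();
        for (LogEntry e : log) {
            if (e.seqNo > commitSeqNo) {
                missing.add(e);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<LogEntry> log = new ArrayList<>();
        log.add(new LogEntry(1, "add id=a"));
        log.add(new LogEntry(2, "update id=a"));
        log.add(new LogEntry(3, "delete id=b"));
        // Suppose the last successful commit covered sequence numbers up to 2.
        for (LogEntry e : missingOps(log, 2)) {
            System.out.println("replay " + e.seqNo + ": " + e.op);
        }
    }
}
```

With the sample log above, only the entry with sequence number 3 is replayed; the first two are already in the commit.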



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7302) IndexWriter should tell you the order of indexing operations

2016-07-06 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15364308#comment-15364308
 ] 

ASF subversion and git services commented on LUCENE-7302:
-

Commit 503da1fcb9fa96c2ba62e9164ee38011b2e23669 in lucene-solr's branch 
refs/heads/master from Mike McCandless
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=503da1f ]

LUCENE-7302: IW.getMaxCompletedSequenceNumber was returning the wrong value 
after IW.deleteAll





[jira] [Commented] (LUCENE-7302) IndexWriter should tell you the order of indexing operations

2016-06-16 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15333644#comment-15333644
 ] 

ASF subversion and git services commented on LUCENE-7302:
-

Commit 5a0321680fe5e57a17470b824024d5b56a4cbaa4 in lucene-solr's branch 
refs/heads/apiv2 from Mike McCandless
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=5a03216 ]

LUCENE-7302: ensure IW.getMaxCompletedSequenceNumber only reflects a change 
after NRT reader refresh would also see it





[jira] [Commented] (LUCENE-7302) IndexWriter should tell you the order of indexing operations

2016-06-14 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15329079#comment-15329079
 ] 

ASF subversion and git services commented on LUCENE-7302:
-

Commit 8ed16fd1f9a03c66d4ac81ddaa7ab70359410b95 in lucene-solr's branch 
refs/heads/branch_6x from Mike McCandless
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=8ed16fd ]

LUCENE-7302: ensure IW.getMaxCompletedSequenceNumber only reflects a change 
after NRT reader refresh would also see it





[jira] [Commented] (LUCENE-7302) IndexWriter should tell you the order of indexing operations

2016-06-14 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15329071#comment-15329071
 ] 

ASF subversion and git services commented on LUCENE-7302:
-

Commit 5a0321680fe5e57a17470b824024d5b56a4cbaa4 in lucene-solr's branch 
refs/heads/master from Mike McCandless
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=5a03216 ]

LUCENE-7302: ensure IW.getMaxCompletedSequenceNumber only reflects a change 
after NRT reader refresh would also see it





[jira] [Commented] (LUCENE-7302) IndexWriter should tell you the order of indexing operations

2016-06-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15326028#comment-15326028
 ] 

ASF subversion and git services commented on LUCENE-7302:
-

Commit 32c8dfaad5c6d8f79b7d0d7d917db0605f27a9ea in lucene-solr's branch 
refs/heads/master from Mike McCandless
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=32c8dfa ]

LUCENE-7302: move CHANGES entry to the right place





[jira] [Commented] (LUCENE-7302) IndexWriter should tell you the order of indexing operations

2016-06-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15326027#comment-15326027
 ] 

ASF subversion and git services commented on LUCENE-7302:
-

Commit 00584579b70041addbd47859012e25e67e079e10 in lucene-solr's branch 
refs/heads/branch_6x from Mike McCandless
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=0058457 ]

LUCENE-7302: move CHANGES entry to the right section





[jira] [Commented] (LUCENE-7302) IndexWriter should tell you the order of indexing operations

2016-06-05 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15315833#comment-15315833
 ] 

Michael McCandless commented on LUCENE-7302:


I'll backport this after 6.1 branch is cut (for 6.2).




[jira] [Commented] (LUCENE-7302) IndexWriter should tell you the order of indexing operations

2016-06-05 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15315827#comment-15315827
 ] 

ASF subversion and git services commented on LUCENE-7302:
-

Commit b1fb142af003386f985b4c4ad1a583d009d49e41 in lucene-solr's branch 
refs/heads/master from Mike McCandless
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=b1fb142 ]

LUCENE-7302: Merge branch 'sequence_numbers'


> IndexWriter should tell you the order of indexing operations
> 
>
> Key: LUCENE-7302
> URL: https://issues.apache.org/jira/browse/LUCENE-7302
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Fix For: 6.1, master (7.0)
>
> Attachments: LUCENE-7032.patch, LUCENE-7132.patch



[jira] [Commented] (LUCENE-7302) IndexWriter should tell you the order of indexing operations

2016-05-26 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15302757#comment-15302757
 ] 

Michael McCandless commented on LUCENE-7302:


I've been pushing changes to this branch:

  https://github.com/mikemccand/lucene-solr/tree/sequence_numbers

I think it's close ... I've resolved all nocommits, and created some
fun tests with threads updating the same doc at once, doing concurrent
commits, and verifying that what the sequence numbers claim turns out
to be true.

The changes are relatively minor: IW already "knows" the order in which
operations were applied, but its indexing methods return {{void}} today;
this changes them to return {{long}} instead.  Callers who don't
care can just ignore the returned long.

It also lets us remove the wrapper class {{TrackingIndexWriter}} which
was doing basically the same thing (returning a long for each op) but
with weaker guarantees.

These sequence numbers are fleeting, not saved into commit points,
etc., and only useful within one IW instance (they reset back to 1 on
the next IW instance).

I'll build an applyable patch and post here ...
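
To illustrate the idea of returning a long per operation, here is a small self-contained sketch. It does not use the IndexWriter API: ToyWriter, updateDocument, get and raceAndPickWinner are hypothetical stand-ins, and the toy simply serializes updates with a lock so that the returned numbers match the order in which updates were applied. Two threads update the same id, and the caller works out which update "won" by comparing the returned sequence numbers.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a writer; this is not the IndexWriter API.
final class SeqNoSketch {

    /** Toy writer: every update returns a long, counting up from 1 per instance. */
    static final class ToyWriter {
        private long lastSeqNo = 0;
        private final Map<String, String> docs = new HashMap<>();

        // synchronized so the returned number reflects the order of application
        synchronized long updateDocument(String id, String value) {
            docs.put(id, value);
            return ++lastSeqNo;
        }

        synchronized String get(String id) {
            return docs.get(id);
        }
    }

    /** Runs two racing updates to the same id; returns the value with the larger seqNo. */
    static String raceAndPickWinner(ToyWriter writer, String id) {
        long[] seqNos = new long[2];
        String[] values = {"from-thread-0", "from-thread-1"};
        Thread[] threads = new Thread[2];
        for (int i = 0; i < 2; i++) {
            final int slot = i;
            threads[i] = new Thread(() -> seqNos[slot] = writer.updateDocument(id, values[slot]));
            threads[i].start();
        }
        try {
            for (Thread t : threads) {
                t.join(); // join makes each thread's write to seqNos visible here
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for updates", e);
        }
        return seqNos[0] > seqNos[1] ? values[0] : values[1];
    }

    public static void main(String[] args) {
        ToyWriter writer = new ToyWriter();
        String winner = raceAndPickWinner(writer, "doc-1");
        System.out.println("winner by seqNo: " + winner);
        System.out.println("stored value:    " + writer.get("doc-1"));
    }
}
```

In this toy the two printed values always agree, because the update with the larger sequence number is the one applied last. A fresh ToyWriter starts counting at 1 again, mirroring the note above that the numbers are only meaningful within one writer instance.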

> IndexWriter should tell you the order of indexing operations
> 
>
> Key: LUCENE-7302
> URL: https://issues.apache.org/jira/browse/LUCENE-7302
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Fix For: 6.1, master (7.0)