[jira] Commented: (LUCENE-2555) Remove shared doc stores

2010-07-22 Thread Michael Busch (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891414#action_12891414
 ] 

Michael Busch commented on LUCENE-2555:
---

What shall we do about index backward-compatibility?

I guess 4.0 has to be able to read shared doc stores?  So a lot of that code we 
can't remove? :(

> Remove shared doc stores
> 
>
> Key: LUCENE-2555
> URL: https://issues.apache.org/jira/browse/LUCENE-2555
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Reporter: Michael Busch
> Assignee: Michael Busch
> Priority: Minor
> Fix For: Realtime Branch
>
>
> With per-thread DocumentsWriters, sharing doc stores across segments doesn't 
> make much sense anymore.
> See also LUCENE-2324.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2555) Remove shared doc stores

2010-07-22 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891422#action_12891422
 ] 

Jason Rutherglen commented on LUCENE-2555:
--

Maybe we should break backwards-compatibility for the RT branch?  Or just ship 
an RT specific JAR to keep things simple?




[jira] Commented: (LUCENE-2555) Remove shared doc stores

2010-07-22 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891433#action_12891433
 ] 

Michael McCandless commented on LUCENE-2555:


The reading side of shared doc stores is quite trivial; I think we should keep 
it (keep back compat).




[jira] Commented: (LUCENE-2555) Remove shared doc stores

2010-07-23 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891558#action_12891558
 ] 

Shai Erera commented on LUCENE-2555:


What are the performance implications of removing shared doc stores? From what 
I understand, if several segments share the same doc store, then when they are 
merged, the doc stores aren't merged, which is a great benefit, especially if 
you intend to store large fields.

I understand (mostly from the discussion on the PTDW) that with the move to a 
per-thread approach, the doc stores cannot be shared between segments created 
by different threads, but what about segments created by the same thread? Are 
we losing that functionality?




[jira] Commented: (LUCENE-2555) Remove shared doc stores

2010-07-23 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891654#action_12891654
 ] 

Jason Rutherglen commented on LUCENE-2555:
--

Shai, I think Mike has outlined the pros and cons in other
postings; see:

https://issues.apache.org/jira/browse/LUCENE-2324?focusedCommentId=12891256&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acommenttabpanel#action_12891256
 

Basically, when going to DWPTs we're losing shared doc stores
completely, and sharing between threads probably doesn't make
sense. However, we can keep the reading of shared doc stores for
back compat. I think the confusing part in the code is the
writing of shared doc stores, and I'm glad that's going away. In
addition, the DWPT code completely streamlines some of the most
confusing parts of the IndexWriter class tree (the wait/notify
and per-thread logic in particular). Overall this will help
future folks when they're trying to customize IndexWriter, and
it removes a layer of complexity just as we add yet another
layer of complexity with the RT code.





[jira] Commented: (LUCENE-2555) Remove shared doc stores

2010-07-23 Thread Michael Busch (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891655#action_12891655
 ] 

Michael Busch commented on LUCENE-2555:
---

{quote}
I understand (mostly from the discussion on the PTDW) that with the move to a 
per-thread approach, the doc stores cannot be shared between segments created 
by different threads, but what about segments created by the same thread? Are 
we losing that functionality?
{quote}

We discussed that in LUCENE-2324 (close to the bottom).  The problem is that 
doc stores only help you if you merge segments that all share the same store.  
With DWPT that's extremely unlikely.  
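
A minimal sketch of that condition, for illustration only. The Segment class
and canSkipDocStoreMerge method are hypothetical names, not Lucene classes, and
the sketch assumes each segment simply records the name of the doc store it
writes to:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model, not Lucene code: shows why a shared doc store only
// helps when every segment in a merge points at the same store.
final class SharedDocStoreCheck {

    // Assumed stand-in for per-segment metadata.
    static final class Segment {
        final String name;
        final String docStore; // name of the doc store this segment writes to

        Segment(String name, String docStore) {
            this.name = name;
            this.docStore = docStore;
        }
    }

    // True only if all segments to be merged share one doc store, in which
    // case the stored fields and term vectors would not need to be copied.
    static boolean canSkipDocStoreMerge(List<Segment> toMerge) {
        if (toMerge.isEmpty()) {
            return false;
        }
        String first = toMerge.get(0).docStore;
        for (Segment segment : toMerge) {
            if (!first.equals(segment.docStore)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // One writer, one shared store: the doc store merge can be skipped.
        List<Segment> sameStore = Arrays.asList(
                new Segment("_0", "_0"), new Segment("_1", "_0"));
        // Segments flushed by different per-thread writers: separate stores.
        List<Segment> mixedStores = Arrays.asList(
                new Segment("_0", "_0"), new Segment("_1", "_1"));

        System.out.println(canSkipDocStoreMerge(sameStore));
        System.out.println(canSkipDocStoreMerge(mixedStores));
    }
}
```

With per-thread writers, a merge typically picks up segments flushed by
different threads, so the second case would be the common one.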


{quote}
What are the performance implications of removing shared doc stores? 
{quote}

I agree we have to test this when this patch is complete.  My hope is that we 
save in other places (removing the interleaving step of the per-thread 
postings, no wait queue that serializes writing to doc stores) so that overall 
we won't be slower.




[jira] Commented: (LUCENE-2555) Remove shared doc stores

2010-07-23 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891659#action_12891659
 ] 

Michael McCandless commented on LUCENE-2555:


{quote}

bq. What are the performance implications of removing shared doc stores?

I agree we have to test this when this patch is complete. My hope is that we 
save in other places (removing the interleaving step of the per-thread 
postings, no wait queue that serializes writing to doc stores) so that overall 
we won't be slower.
{quote}

Also, remember that shared doc stores are not as good an optimization as they 
used to be, because we are now able to bulk-copy both stored fields and term 
vectors during merging.

However, bulk merging only happens if the field name -> number mapping is 
congruent between the merged segment and the segment being merged.

Unfortunately, you can easily and unexpectedly break this (see LUCENE-1737) by, 
e.g., adding different fields to your docs, or adding the same fields just in a 
different order.
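
A rough sketch of what such a congruence check could look like. This is not
the actual merger code: the class and method names are hypothetical, field
numbers are modeled as list positions, and "congruent" is assumed to mean that
every field number used by the segment names the same field in the merged
segment:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; these are not Lucene classes.
final class FieldNumberCongruence {

    // Field numbers are modeled as list positions: fields.get(n) is the
    // name of the field with number n.
    static boolean isCongruent(List<String> mergedFields, List<String> segmentFields) {
        // The segment cannot use a field number the merged segment lacks.
        if (segmentFields.size() > mergedFields.size()) {
            return false;
        }
        // Each number must map to the same field name on both sides.
        for (int number = 0; number < segmentFields.size(); number++) {
            if (!segmentFields.get(number).equals(mergedFields.get(number))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> merged = Arrays.asList("id", "title", "body");

        // Same fields added in the same order: numbers line up.
        System.out.println(isCongruent(merged, Arrays.asList("id", "title", "body")));
        // Same fields added in a different order: numbers no longer line up.
        System.out.println(isCongruent(merged, Arrays.asList("title", "id", "body")));
    }
}
```

Under these assumptions, only the first case would qualify for a bulk copy;
the second would fall back to copying document by document.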




[jira] Commented: (LUCENE-2555) Remove shared doc stores

2010-07-23 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891668#action_12891668
 ] 

Shai Erera commented on LUCENE-2555:


Thanks for the explanation. Let's remember though that not all apps are 
multi-threaded, but I think most are, so designing for the majority is better 
than making the remaining few perform better. I'm generally ok with that, just 
wanted to better understand the reasons.




[jira] Commented: (LUCENE-2555) Remove shared doc stores

2010-07-23 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891709#action_12891709
 ] 

Jason Rutherglen commented on LUCENE-2555:
--

Michael, nice!  A lot is cleaned up.  

> Remove shared doc stores
> 
>
> Key: LUCENE-2555
> URL: https://issues.apache.org/jira/browse/LUCENE-2555
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Reporter: Michael Busch
> Assignee: Michael Busch
> Priority: Minor
> Fix For: Realtime Branch
>
> Attachments: lucene-2555.patch
>
>
> With per-thread DocumentsWriters, sharing doc stores across segments doesn't 
> make much sense anymore.
> See also LUCENE-2324.
