[jira] Updated: (LUCENE-2814) stop writing shared doc stores across segments

2010-12-18 Earwin Burrfoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Earwin Burrfoot updated LUCENE-2814:


Attachment: LUCENE-2814.patch

Synced to trunk.

bq. Also, on the nocommit on exc in DW.addDocument, yes I think that 
(IFD.deleteNewFiles, not checkpoint) is still needed because DW can orphan the 
store files on abort?
Orphaned files are deleted directly in StoredFieldsWriter.abort() and 
TermVectorsTermsWriter.abort(). As I said, all the open-files tracking is now 
gone.
Turns out checkpoint() is also no longer needed.
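The pattern described here, where each doc-store writer deletes its own files on abort instead of reporting them to a central open-files list, can be sketched roughly as follows. This is a minimal illustration only: `ToyStoredFieldsWriter` is a hypothetical class, not Lucene's StoredFieldsWriter, and it uses java.nio files in place of Lucene's Directory API.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical stand-in for a per-segment stored-fields writer. It remembers
 * the files it created so that abort() can delete them itself, with no
 * open-file bookkeeping in a central DocumentsWriter.
 */
class ToyStoredFieldsWriter {
  private final Path dir;
  private final String segment;
  private final List<Path> created = new ArrayList<>();

  ToyStoredFieldsWriter(Path dir, String segment) {
    this.dir = dir;
    this.segment = segment;
  }

  /** Creates the (empty) data and index files for this segment. */
  void open() throws IOException {
    for (String ext : new String[] {"fdt", "fdx"}) {
      Path p = dir.resolve(segment + "." + ext);
      Files.createFile(p);
      created.add(p);
    }
  }

  /** Best-effort cleanup: delete every file this writer created. */
  void abort() {
    for (Path p : created) {
      try {
        Files.deleteIfExists(p);
      } catch (IOException ignored) {
        // Suppressed: abort is already handling an earlier failure.
      }
    }
    created.clear();
  }
}
```

Keeping the list of created files inside the writer means nothing outside it has to know which files are in flight.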

I have no other lingering cleanup urges; this is ready to be committed, I think.

 stop writing shared doc stores across segments
 --

 Key: LUCENE-2814
 URL: https://issues.apache.org/jira/browse/LUCENE-2814
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: 3.1, 4.0
Reporter: Michael McCandless
Assignee: Michael McCandless
 Attachments: LUCENE-2814.patch, LUCENE-2814.patch, LUCENE-2814.patch, 
 LUCENE-2814.patch, LUCENE-2814.patch


 Shared doc stores enable the files for stored fields and term vectors to be 
 shared across multiple segments.  We've had this optimization since 2.1, I 
 think.
 It works best against a new index, where you open an IW, add lots of docs, 
 and then close it.  In that case all of the written segments will reference 
 slices of a single shared doc store segment.
 This was a good optimization because it means we never need to merge these 
 files.  But when you open another IW on that index, it writes a new set of 
 doc stores, and then whenever merges take place across doc stores, they must 
 now be merged.
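The slice arrangement can be pictured with a toy model in which each segment records which doc store it reads from and at what document offset. The names `SegmentRef`, `docStore`, `docStoreOffset` and `canReuseDocStore` are illustrative assumptions, not Lucene's API, and the rule below is a simplified stand-in for the decision IndexWriter has to make before a merge: can the merged segment keep pointing into the existing store, or must its stored fields be copied?

```java
import java.util.List;

/** Hypothetical: a segment whose stored fields are a slice of some doc store. */
final class SegmentRef {
  final String name;
  final int docCount;
  final String docStore;     // name of the (possibly shared) doc store segment
  final int docStoreOffset;  // first doc of this segment within that store

  SegmentRef(String name, int docCount, String docStore, int docStoreOffset) {
    this.name = name;
    this.docCount = docCount;
    this.docStore = docStore;
    this.docStoreOffset = docStoreOffset;
  }
}

final class SharedDocStoreDemo {
  /**
   * Simplified rule: the merged segment can keep pointing into the existing
   * store only if all sources use the same store and their slices are
   * adjacent and in order. Otherwise stored fields must be copied.
   */
  static boolean canReuseDocStore(List<SegmentRef> sources) {
    if (sources.isEmpty()) return false;
    String store = sources.get(0).docStore;
    int expectedOffset = sources.get(0).docStoreOffset;
    for (SegmentRef s : sources) {
      if (!s.docStore.equals(store) || s.docStoreOffset != expectedOffset) {
        return false;
      }
      expectedOffset += s.docCount;
    }
    return true;
  }
}
```

Segments written by one IW session share a store and pass this check; once a second IW writes its own store, any merge spanning both fails it and the stored fields must be merged.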
 However, since we switched to shared doc stores, there have been two 
 optimizations for merging the stores.  First, we now bulk-copy the bytes in 
 these files if the field name/number assignment is congruent.  Second, we 
 now force congruent field name/number mapping in IndexWriter.  This means 
 the shared doc store optimization is much less potent than it used to be.
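Why congruent field numbering enables bulk copying can be shown with a small sketch. Here a field numbering is modeled as a list whose index is the field number, and `FieldCongruence` is a hypothetical helper, not Lucene's merge code. If every field number used by a source segment maps to the same name in the merged segment, the numbers already encoded in the source's stored-field bytes stay valid, so those bytes can be copied verbatim instead of decoding and re-encoding each document.

```java
import java.util.List;

/** Hypothetical: chooses between raw byte copy and per-document re-encoding. */
final class FieldCongruence {
  /**
   * True if every field number in the source segment maps to the same field
   * name in the merged segment (list index == field number).
   */
  static boolean isCongruent(List<String> sourceFields, List<String> mergedFields) {
    if (sourceFields.size() > mergedFields.size()) return false;
    for (int num = 0; num < sourceFields.size(); num++) {
      if (!sourceFields.get(num).equals(mergedFields.get(num))) return false;
    }
    return true;
  }

  static String mergeStrategy(List<String> sourceFields, List<String> mergedFields) {
    return isCongruent(sourceFields, mergedFields) ? "bulk-copy" : "re-encode";
  }
}
```

Forcing the same name/number mapping across segments in IndexWriter makes the congruent case the normal one, which is why merging separate doc stores became cheap.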
 Furthermore, the optimization adds *a lot* of hair to 
 IndexWriter/DocumentsWriter; this has been the source of sneaky bugs over 
 time, and causes odd behavior like a merge possibly forcing a flush when it 
 starts.  Finally, with DWPT (LUCENE-2324), which gets us truly concurrent 
 flushing, we can no longer share doc stores.
 So, I think we should turn off the write-side of shared doc stores to pave 
 the path for DWPT to land on trunk and simplify IW/DW.  We still must support 
 reading them (until 5.0), but the read side is far less hairy.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2814) stop writing shared doc stores across segments

2010-12-17 Earwin Burrfoot (JIRA)

Earwin Burrfoot updated LUCENE-2814:


Attachment: LUCENE-2814.patch

New patch. Now with even more lines removed!

DocStore-related index chain components used to track open/closed files through 
DocumentsWriter.
The closed-files list was unused, and is silently gone.
The open-files list was used to:
* prevent not-yet-flushed shared docstores from being deleted by 
IndexFileDeleter.
** with no shared docstores there is no need for this, and IFD no longer requires a reference to DW.
* delete already-opened docstore files when aborting.
** the index chain now handles this on its own, with cleaner error-handling code.
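The first bullet can be illustrated with a heavily simplified reference-counting deleter. `ToyFileDeleter` is hypothetical and is not IndexFileDeleter's actual code; it only assumes that a file is deleted once nothing references it any more. With shared doc stores, a flushed segment could reference a store that DW was still writing, so merging that segment away could drop the store's count to zero mid-write; that is why IFD had to consult DW's open-files list. With per-segment stores, a store is only referenced after it has been closed, so the plain refcount is enough.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical, simplified reference-counting file deleter. */
final class ToyFileDeleter {
  private final Map<String, Integer> refCounts = new HashMap<>();
  final Set<String> deleted = new HashSet<>();

  /** A new checkpoint/commit references these files. */
  void incRef(Collection<String> files) {
    for (String f : files) refCounts.merge(f, 1, Integer::sum);
  }

  /** A checkpoint/commit no longer references these files. */
  void decRef(Collection<String> files) {
    for (String f : files) {
      Integer rc = refCounts.get(f);
      if (rc == null) throw new IllegalStateException("unreferenced file: " + f);
      if (rc == 1) {
        refCounts.remove(f);
        deleted.add(f); // a real deleter would remove the file from the Directory here
      } else {
        refCounts.put(f, rc - 1);
      }
    }
  }
}
```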


[jira] Updated: (LUCENE-2814) stop writing shared doc stores across segments

2010-12-16 Earwin Burrfoot (JIRA)

Earwin Burrfoot updated LUCENE-2814:


Attachment: LUCENE-2814.patch

First iteration.

Passes all tests except TestNRTThreads. Something to do with numDocsInStore and 
numDocsInRam merged together?
Lots of non-critical nocommits (just markers for places I'd like to recheck).
DW.docStoreEnabled and *.closeDocStore() have to go before committing.


[jira] Updated: (LUCENE-2814) stop writing shared doc stores across segments

2010-12-16 Earwin Burrfoot (JIRA)

Earwin Burrfoot updated LUCENE-2814:


Attachment: LUCENE-2814.patch

Patch updated to trunk, no nocommits, no *.closeDocStore(), tests pass.

SegmentWriteState vs. DocumentsWriter bothers me.
We track flushed files in both, and we inconsistently get the current segment 
from one or the other.

