[jira] [Updated] (LUCENE-3216) Store DocValues per segment instead of per field
[ https://issues.apache.org/jira/browse/LUCENE-3216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-3216:
    Attachment: LUCENE-3216.patch

Here is a new patch that moves the DocValues configuration to setters. I also added a randomizeCodec(Codec) to LuceneTestCase that sets the CFS flag at random.

Store DocValues per segment instead of per field

Key: LUCENE-3216
URL: https://issues.apache.org/jira/browse/LUCENE-3216
Project: Lucene - Java
Issue Type: Improvement
Components: core/index
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
Fix For: 4.0
Attachments: LUCENE-3216.patch, LUCENE-3216.patch, LUCENE-3216.patch, LUCENE-3216.patch, LUCENE-3216.patch, LUCENE-3216.patch, LUCENE-3216.patch, LUCENE-3216_floats.patch

Currently we store DocValues per field, which results in at least one file per field that uses DocValues (and at most two per field per segment, depending on the implementation). Instead, we should by default pack DocValues into a single file where possible. To enable this, we need to hold all DocValues in memory during indexing and write them to disk once we flush a segment.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3216) Store DocValues per segment instead of per field
Simon Willnauer updated LUCENE-3216:
    Attachment: LUCENE-3216.patch

We are getting closer to the overall target here. This patch enables each codec to decide whether to use CFS for DocValues or to write individual files. To configure this and more per codec, I introduced a CodecConfig (just like IWC) that holds configuration for core codecs and is passed to each codec on creation. I added test cases for the config and for nested CFS, so that DocValues can still safely open its CFS when IW or SegmentMerger decides to use CFS too. For test coverage I added a static newCodecConfig() to LuceneTestCase that randomly configures a codec per file to use CFS or individual files for DocValues, along with other settings I figured make sense in the CodecConfig. All tests pass and there is no nocommit left. I think it's close. Review is appreciated.

Attachments: LUCENE-3216.patch, LUCENE-3216.patch, LUCENE-3216.patch, LUCENE-3216.patch, LUCENE-3216.patch, LUCENE-3216_floats.patch
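As a rough illustration of the per-codec configuration described in this message, here is a minimal sketch of a config object with a chained setter and a randomized test helper. The class CodecConfigSketch, its methods, and the default value are assumptions made for illustration; they are not taken from the patch and may not match the actual CodecConfig.

```java
import java.util.Random;

// Hypothetical stand-in for the CodecConfig mentioned in the message; not Lucene's class.
final class CodecConfigSketch {
    // Assumed default: pack DocValues into a compound file.
    private boolean useCompoundFile = true;

    // Returns this so that calls can be chained, in the style of IndexWriterConfig.
    CodecConfigSketch setUseCompoundFile(boolean useCompoundFile) {
        this.useCompoundFile = useCompoundFile;
        return this;
    }

    boolean getUseCompoundFile() {
        return useCompoundFile;
    }

    // Test helper in the spirit of newCodecConfig(): choose the flag at random
    // so that both storage layouts are exercised across test runs.
    static CodecConfigSketch newRandomConfig(Random random) {
        return new CodecConfigSketch().setUseCompoundFile(random.nextBoolean());
    }

    public static void main(String[] args) {
        // A fixed seed keeps a failing randomized run reproducible.
        Random random = new Random(42L);
        CodecConfigSketch config = newRandomConfig(random);
        System.out.println("DocValues use CFS: " + config.getUseCompoundFile());
    }
}
```

Passing the Random in, rather than creating one inside the helper, lets a test framework control the seed.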
[jira] [Updated] (LUCENE-3216) Store DocValues per segment instead of per field
Simon Willnauer updated LUCENE-3216:
    Attachment: LUCENE-3239.patch

Since the vote has passed, here is a patch to cut over the build and references to 1.6.

Attachments: LUCENE-3216.patch, LUCENE-3216.patch, LUCENE-3216.patch, LUCENE-3216.patch, LUCENE-3216.patch, LUCENE-3216_floats.patch, LUCENE-3239.patch
[jira] [Updated] (LUCENE-3216) Store DocValues per segment instead of per field
Simon Willnauer updated LUCENE-3216:
    Comment: was deleted

(was: since the vote has passed here is a patch to cut over the build and references to 1.6)

Attachments: LUCENE-3216.patch, LUCENE-3216.patch, LUCENE-3216.patch, LUCENE-3216.patch, LUCENE-3216.patch, LUCENE-3216_floats.patch
[jira] [Updated] (LUCENE-3216) Store DocValues per segment instead of per field
Simon Willnauer updated LUCENE-3216:
    Attachment: (was: LUCENE-3239.patch)

Attachments: LUCENE-3216.patch, LUCENE-3216.patch, LUCENE-3216.patch, LUCENE-3216.patch, LUCENE-3216.patch, LUCENE-3216_floats.patch
[jira] [Updated] (LUCENE-3216) Store DocValues per segment instead of per field
Simon Willnauer updated LUCENE-3216:
    Attachment: LUCENE-3216.patch

One more iteration, adding a NestedCompoundDirectory that uses the parent's openInputSlice method for efficiency.

Attachments: LUCENE-3216.patch, LUCENE-3216.patch, LUCENE-3216.patch, LUCENE-3216.patch, LUCENE-3216.patch, LUCENE-3216.patch, LUCENE-3216_floats.patch
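One plausible reading of why slicing through the parent is efficient: a compound file nested inside another compound file can be exposed as a slice of a slice, so reads only add offsets and never copy the inner file out. The sketch below models that with a byte array. SliceSketch and its methods are hypothetical and do not reproduce the signature of openInputSlice or the patch's NestedCompoundDirectory.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical model of a sliced input; names are illustrative, not Lucene's API.
final class SliceSketch {
    private final byte[] data;  // backing storage shared by every slice
    private final int start;    // absolute offset of this slice within data
    private final int length;   // number of bytes visible through this slice

    SliceSketch(byte[] data) {
        this(data, 0, data.length);
    }

    private SliceSketch(byte[] data, int start, int length) {
        this.data = data;
        this.start = start;
        this.length = length;
    }

    // Returns a view relative to this slice. Nesting only adds offsets; no bytes are copied.
    SliceSketch slice(int offset, int sliceLength) {
        if (offset < 0 || sliceLength < 0 || offset + sliceLength > length) {
            throw new IllegalArgumentException("slice out of bounds");
        }
        return new SliceSketch(data, start + offset, sliceLength);
    }

    byte readByte(int pos) {
        if (pos < 0 || pos >= length) {
            throw new IndexOutOfBoundsException("pos=" + pos);
        }
        return data[start + pos];
    }

    String readAll() {
        return new String(data, start, length, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // Layout: 4 bytes of other data, an 8-byte outer compound file, 4 more bytes.
        byte[] file = "AAAAbbXXXXbbCCCC".getBytes(StandardCharsets.US_ASCII);
        SliceSketch outer = new SliceSketch(file).slice(4, 8); // outer compound file
        SliceSketch inner = outer.slice(2, 4);                 // nested compound file
        System.out.println(outer.readAll()); // prints "bbXXXXbb"
        System.out.println(inner.readAll()); // prints "XXXX"
    }
}
```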
[jira] [Updated] (LUCENE-3216) Store DocValues per segment instead of per field
Simon Willnauer updated LUCENE-3216:
    Attachment: LUCENE-3216.patch

This patch converts all DocValues types to index into memory. The majority now also merge directly to disk, except for the PackedInts, sorted, and deref byte variants.

Attachments: LUCENE-3216.patch, LUCENE-3216.patch, LUCENE-3216.patch, LUCENE-3216_floats.patch
[jira] [Updated] (LUCENE-3216) Store DocValues per segment instead of per field
Simon Willnauer updated LUCENE-3216:
    Attachment: LUCENE-3216.patch

I committed the latest patch. This patch is a first sketch that uses the CFS separately in DocValuesConsumer / Producer to reduce the number of files created by DocValues. This currently results in two files per codec in a segment (.cfs and .cfe), which is not too bad, but we could go even further and have a global CFS for all DocValues that could be pulled on demand. The patch still has some nocommits, but all tests pass.

Attachments: LUCENE-3216.patch, LUCENE-3216.patch, LUCENE-3216.patch, LUCENE-3216.patch, LUCENE-3216_floats.patch
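To make the two-file layout concrete, here is a minimal in-memory sketch of a compound file: one data blob holding the concatenated sub-files (standing in for .cfs) and one entry table mapping each name to an offset and length (standing in for .cfe). The class CompoundFileSketch and the sub-file names are hypothetical, and the real on-disk format is not reproduced here.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory model of a compound file; not Lucene's implementation.
final class CompoundFileSketch {
    // Concatenated contents of all sub-files (the role played by .cfs).
    private final ByteArrayOutputStream data = new ByteArrayOutputStream();
    // name -> {offset, length} (the role played by .cfe).
    private final Map<String, int[]> entries = new LinkedHashMap<>();

    void addFile(String name, byte[] content) {
        if (entries.containsKey(name)) {
            throw new IllegalArgumentException("duplicate file: " + name);
        }
        entries.put(name, new int[] { data.size(), content.length });
        data.write(content, 0, content.length);
    }

    byte[] readFile(String name) {
        int[] entry = entries.get(name);
        if (entry == null) {
            throw new IllegalArgumentException("no such file: " + name);
        }
        byte[] all = data.toByteArray();
        return Arrays.copyOfRange(all, entry[0], entry[0] + entry[1]);
    }

    int dataLength() {
        return data.size();
    }

    public static void main(String[] args) {
        CompoundFileSketch cfs = new CompoundFileSketch();
        // Hypothetical per-field DocValues files packed into one compound file.
        cfs.addFile("field_a.dat", new byte[] { 1, 2, 3 });
        cfs.addFile("field_b.dat", new byte[] { 9, 8 });
        System.out.println(Arrays.toString(cfs.readFile("field_b.dat"))); // prints "[9, 8]"
        System.out.println(cfs.dataLength());                              // prints "5"
    }
}
```

However many fields carry DocValues, the file count in this model stays at two, which is the property the message is after.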
[jira] [Updated] (LUCENE-3216) Store DocValues per segment instead of per field
Simon Willnauer updated LUCENE-3216:
    Attachment: LUCENE-3216.patch

Next iteration. This patch also includes FixedStraightBytes converted to use an in-memory ByteBlockPool for indexing and straight disk access for merging. However, I tend to leave out the VarStraightBytes variant and open a follow-up issue that converts the VarStraight case to use a skip list. A review would be cool; otherwise I will commit in a day or two if nobody objects.

Attachments: LUCENE-3216.patch, LUCENE-3216.patch, LUCENE-3216_floats.patch
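For readers unfamiliar with the idea of buffering fixed-length values in a block pool, here is a minimal sketch. It assumes each document's value lives at address docID * valueSize in a growable list of fixed-size blocks, with unset documents reading as zeros. FixedBytesPoolSketch is hypothetical and does not use Lucene's ByteBlockPool or reflect how the patch lays out its data.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical block-pool buffer for fixed-length values; not Lucene's ByteBlockPool.
final class FixedBytesPoolSketch {
    private static final int BLOCK_SIZE = 16; // deliberately tiny so values straddle blocks

    private final List<byte[]> blocks = new ArrayList<>();
    private final int valueSize;
    private int maxDoc = 0; // one past the highest docID seen so far

    FixedBytesPoolSketch(int valueSize) {
        if (valueSize <= 0) {
            throw new IllegalArgumentException("valueSize must be positive");
        }
        this.valueSize = valueSize;
    }

    void add(int docID, byte[] value) {
        if (docID < 0 || value.length != valueSize) {
            throw new IllegalArgumentException("bad docID or value length");
        }
        long address = (long) docID * valueSize;
        for (int i = 0; i < valueSize; i++) {
            long pos = address + i;
            int block = (int) (pos / BLOCK_SIZE);
            // Grow the pool block by block; new blocks are zero-filled.
            while (blocks.size() <= block) {
                blocks.add(new byte[BLOCK_SIZE]);
            }
            blocks.get(block)[(int) (pos % BLOCK_SIZE)] = value[i];
        }
        maxDoc = Math.max(maxDoc, docID + 1);
    }

    // Documents that never received a value read back as all zeros.
    byte[] get(int docID) {
        byte[] out = new byte[valueSize];
        if (docID < 0 || docID >= maxDoc) {
            return out;
        }
        long address = (long) docID * valueSize;
        for (int i = 0; i < valueSize; i++) {
            long pos = address + i;
            out[i] = blocks.get((int) (pos / BLOCK_SIZE))[(int) (pos % BLOCK_SIZE)];
        }
        return out;
    }

    public static void main(String[] args) {
        FixedBytesPoolSketch pool = new FixedBytesPoolSketch(6);
        pool.add(2, new byte[] { 1, 2, 3, 4, 5, 6 }); // bytes 12..17 span two blocks
        System.out.println(java.util.Arrays.toString(pool.get(2))); // prints "[1, 2, 3, 4, 5, 6]"
    }
}
```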
[jira] [Updated] (LUCENE-3216) Store DocValues per segment instead of per field
Simon Willnauer updated LUCENE-3216:
    Attachment: LUCENE-3216.patch

Next iteration, this time fixing most of the Byte variants to only write / open one file at a time. Straight variants are still missing.

Attachments: LUCENE-3216.patch, LUCENE-3216_floats.patch
[jira] [Updated] (LUCENE-3216) Store DocValues per segment instead of per field
Simon Willnauer updated LUCENE-3216:
    Attachment: LUCENE-3216_floats.patch

Here is a first patch that converts the floats implementation to buffer values in RAM during indexing but write values directly during merge. All tests pass, and I plan to commit this soon too. I would rather go in small iterations here than with one large patch.

Attachments: LUCENE-3216_floats.patch
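The buffer-then-flush pattern for float values can be sketched as follows. This is a simplified model under stated assumptions: values are held in a growable float array indexed by docID and serialized in one pass at flush time. FloatBufferSketch and its byte layout (a count followed by the raw floats) are invented for illustration and are not the format written by the patch.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical RAM buffer for per-document float values; not the patch's implementation.
final class FloatBufferSketch {
    private float[] values = new float[4];
    private int maxDoc = 0; // one past the highest docID seen so far

    // Indexing path: keep the value in memory, growing the array as needed.
    void add(int docID, float value) {
        if (docID < 0) {
            throw new IllegalArgumentException("docID must be non-negative");
        }
        if (docID >= values.length) {
            values = Arrays.copyOf(values, Math.max(docID + 1, values.length * 2));
        }
        values[docID] = value;
        maxDoc = Math.max(maxDoc, docID + 1);
    }

    // Flush path: serialize everything at once. The layout (int count, then floats)
    // is an assumption of this sketch. Documents without a value are written as 0.0f.
    byte[] flush() {
        ByteBuffer out = ByteBuffer.allocate(4 + 4 * maxDoc);
        out.putInt(maxDoc);
        for (int i = 0; i < maxDoc; i++) {
            out.putFloat(values[i]);
        }
        return out.array();
    }

    public static void main(String[] args) {
        FloatBufferSketch buffer = new FloatBufferSketch();
        buffer.add(0, 1.5f);
        buffer.add(3, 2.5f);
        System.out.println(buffer.flush().length); // prints "20"
    }
}
```

During a merge the values already exist on disk in the source segments, so there is nothing to gain from buffering them again, which fits the message's choice to write directly in that case.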