[jira] [Commented] (CASSANDRA-674) New SSTable Format

2011-08-08 Thread Stu Hood (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081105#comment-13081105
 ] 

Stu Hood commented on CASSANDRA-674:


bq. Is replicate-on-write=disabled why uncompressed went from the highest 
latency to the lowest?
Yes: replicate-on-write triggers a huge number of reads, which are much more 
expensive in trunk because it does not include CASSANDRA-2319.

> New SSTable Format
> --
>
> Key: CASSANDRA-674
> URL: https://issues.apache.org/jira/browse/CASSANDRA-674
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Stu Hood
> Fix For: 1.0
>
> Attachments: 674-v1.diff, 674-v2.tgz, 674-v3.tgz, 674-ycsb.log, 
> trunk-ycsb.log
>
>
> Various tickets exist due to limitations in the SSTable file format, 
> including #16, #47 and #328. Attached is a proposed design/implementation of 
> a new file format for SSTables that addresses a few of these limitations.
> This v2 implementation is not ready for serious use: see comments for 
> remaining issues. It is roughly the format described here: 
> http://wiki.apache.org/cassandra/FileFormatDesignDoc 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-674) New SSTable Format

2011-08-08 Thread Chris Burroughs (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13080929#comment-13080929
 ] 

Chris Burroughs commented on CASSANDRA-674:
---

Is replicate-on-write=disabled why uncompressed went from the highest latency 
to the lowest?





[jira] [Commented] (CASSANDRA-674) New SSTable Format

2011-08-07 Thread Stu Hood (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13080757#comment-13080757
 ] 

Stu Hood commented on CASSANDRA-674:


I reran the test mentioned in [#comment-13054228] with replicate-on-write 
disabled, which makes for a much fairer comparison (trunk and CASSANDRA-47 each 
require 2 seeks for a column miss, and 3 for a hit). This version of trunk also 
includes CASSANDRA-47 snappy compression.

|| build || disk volume (bytes) || bytes per column || runtime (s) || throughput (ops/s) || avg read ms || 99th % read ms ||
| trunk - uncompressed | 16,713,328,798 | 66.8 | 6154 | 40620 | 2.54 | 6 |
| trunk - gz 6 * | 2,747,319,000 | 10.98 | - | - | - | - |
| trunk - [snappy|https://issues.apache.org/jira/browse/CASSANDRA-47] | 4,356,461,652 | 17.4 | 7906 | 31618 | 4.64 | 15 |
| 674+2319 | 2,675,888,207 | 10.7 | 7703 | 32454 | 3.04 | 10 |
\* _trunk - gz 6_ is the size of compressing the data directory of the trunk 
result at GZIP level 6

In this workload, we're reading from the tail of the row, which means that 
CASSANDRA-47 needs to decode two blocks per read (one for the row index at the 
head of the row, and one for the columns at the tail).
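
As a rough aid for reading the table, the disk volumes can be turned into 
fractions of the uncompressed trunk size. This is only a sketch: the numbers are 
copied from the table, while the dictionary and function names are made up for 
illustration.

```python
# Disk volumes (bytes) copied from the table above.
DISK_VOLUME_BYTES = {
    "trunk - uncompressed": 16_713_328_798,
    "trunk - gz 6": 2_747_319_000,
    "trunk - snappy": 4_356_461_652,
    "674+2319": 2_675_888_207,
}


def size_ratios(volumes, baseline="trunk - uncompressed"):
    """Return each build's on-disk size as a fraction of the baseline build."""
    base = volumes[baseline]
    return {build: size / base for build, size in volumes.items()}


if __name__ == "__main__":
    for build, ratio in size_ratios(DISK_VOLUME_BYTES).items():
        print(f"{build:22s} {ratio:6.1%} of uncompressed")
```

By this measure 674+2319 lands at roughly 16% of the uncompressed size, close to 
the gz 6 figure, while snappy lands at roughly 26%.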





[jira] [Commented] (CASSANDRA-674) New SSTable Format

2011-07-09 Thread Stu Hood (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062662#comment-13062662
 ] 

Stu Hood commented on CASSANDRA-674:


I've posted the slightly-divergent branch of YCSB I used for this workload at 
https://github.com/stuhood/YCSB/tree/monotonic-timeseries





[jira] [Commented] (CASSANDRA-674) New SSTable Format

2011-06-24 Thread Stu Hood (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054287#comment-13054287
 ] 

Stu Hood commented on CASSANDRA-674:


To clarify, I included the "trunk gz 6" result since it is essentially a lower 
bound for block-based compression. On the other hand, there is some low-hanging 
fruit that could decrease the size of the 674+2319 result by another 1 to 1.5 
bytes per column.





[jira] Commented: (CASSANDRA-674) New SSTable Format

2011-02-01 Thread Stu Hood (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12989112#comment-12989112
 ] 

Stu Hood commented on CASSANDRA-674:


One of the key blockers is implementing rebuilding of SSTables post-streaming. 
Based on an IRC conversation yesterday, the smoothest way to support streaming 
of older SSTable versions would be to turn what is now the SSTableWriter.Builder 
object into an abstract base class and subclass it: I'll probably try to do 
this in a separate ticket.
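
The "abstract base class plus subclasses" idea might look roughly like the 
following. This is a hypothetical Python sketch, not the Cassandra code: the 
class names, the version tags and the strings returned by build() are all 
placeholders chosen for illustration.

```python
from abc import ABC, abstractmethod


class SSTableBuilder(ABC):
    """Hypothetical base class for rebuilding a streamed SSTable."""

    def __init__(self, data_path):
        self.data_path = data_path

    @abstractmethod
    def build(self):
        """Rebuild whatever this format version needs; return a description."""


class OldFormatBuilder(SSTableBuilder):
    def build(self):
        return f"rebuilt {self.data_path} using the old-format rules"


class NewFormatBuilder(SSTableBuilder):
    def build(self):
        return f"rebuilt {self.data_path} using the new-format rules"


# Hypothetical version tags mapped to the builder that understands them.
BUILDERS = {"old": OldFormatBuilder, "new": NewFormatBuilder}


def builder_for(version, data_path):
    """Pick the builder subclass matching the streamed file's version."""
    try:
        return BUILDERS[version](data_path)
    except KeyError:
        raise ValueError(f"no builder for SSTable version {version!r}") from None
```

The receiving node would then only need the version of the streamed file to 
choose how to rebuild it, and a new format becomes one more subclass.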

> New SSTable Format
> --
>
> Key: CASSANDRA-674
> URL: https://issues.apache.org/jira/browse/CASSANDRA-674
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Stu Hood
> Fix For: 0.8
>
> Attachments: 674-v1.diff, 674-v2.tgz, perf-674-v1.txt, 
> perf-trunk-2f3d2c0e4845faf62e33c191d152cb1b3fa62806.txt
>
>
> Various tickets exist due to limitations in the SSTable file format, 
> including #16, #47 and #328. Attached is a proposed design/implementation of 
> a new file format for SSTables that addresses a few of these limitations.
> This v2 implementation is not ready for serious use: see comments for 
> remaining issues. It is roughly the format described here: 
> http://wiki.apache.org/cassandra/FileFormatDesignDoc 





[jira] Commented: (CASSANDRA-674) New SSTable Format

2011-01-12 Thread Holden Robbins (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12981109#action_12981109
 ] 

Holden Robbins commented on CASSANDRA-674:
--

Feel free to tell me I'm off-base here, but what about doing something super 
simple like storing the segment compressed and uncompressing it when it's 
accessed on disk? The compaction process could possibly clean up uncompressed 
segments? I'm thinking this would solve my particular use case well (log data), 
since our requirements are to store a large amount of data but the majority of 
the reads will only be on a small subset of recently inserted data.

If it sounds like a decent approach I'll be happy to put together a patch.
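
A toy version of that idea, assuming an in-memory store and zlib in place of 
real on-disk segments: the class and method names are hypothetical, and 
cleanup() only stands in for whatever compaction would do.

```python
import zlib


class LazySegmentStore:
    """Hypothetical store: segments stay compressed until first read."""

    def __init__(self):
        self._compressed = {}    # segment id -> compressed bytes
        self._uncompressed = {}  # segment id -> bytes, filled on access

    def write(self, segment_id, data):
        self._compressed[segment_id] = zlib.compress(data)
        self._uncompressed.pop(segment_id, None)  # drop any stale copy

    def read(self, segment_id):
        # Uncompress on first access and keep the copy for later reads.
        if segment_id not in self._uncompressed:
            raw = zlib.decompress(self._compressed[segment_id])
            self._uncompressed[segment_id] = raw
        return self._uncompressed[segment_id]

    def cleanup(self):
        """Stand-in for the compaction pass: discard uncompressed copies."""
        dropped = len(self._uncompressed)
        self._uncompressed.clear()
        return dropped
```

Only the recently read segments pay the space cost of an uncompressed copy, 
which matches a workload where most reads hit a small, recent subset.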

> New SSTable Format
> --
>
> Key: CASSANDRA-674
> URL: https://issues.apache.org/jira/browse/CASSANDRA-674
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Stu Hood
> Fix For: 0.8
>
> Attachments: 674-v1.diff, perf-674-v1.txt, 
> perf-trunk-2f3d2c0e4845faf62e33c191d152cb1b3fa62806.txt
>
>
> Various tickets exist due to limitations in the SSTable file format, 
> including #16, #47 and #328. Attached is a proposed design/implementation of 
> a new file format for SSTables that addresses a few of these limitations. The 
> implementation has a bunch of issues/fixmes, which I'll describe in the 
> comments.
> The file format is described in the javadoc for the o.a.c.io.SSTableWriter 
> class, but briefly:
>  * Blocks are opaque (except for their header) so that they can be 
> compressed. The index file contains an entry for the first key in every 
> Block. Blocks contain Slices.
>  * Slices are series of columns with the same parents and (deletion) 
> metadata. They can be used to represent ColumnFamilies or SuperColumns (or a 
> slice of columns at any other depth). A single CF can be split across 
> multiple Slices, which can be split across multiple blocks.
>  * Neither Slices nor Blocks have a fixed size or maximum length, but they 
> each have target lengths which can be stretched and broken by very large 
> columns.
> The most interesting concepts from this patch are:
>  * Block compression is possible (currently using GZIP, which has one bug 
> mentioned in the comments),
>  * Compaction involves merging intersecting Slices from input SSTables. Since 
> large rows will be broken down into multiple slices, only the portions of 
> rows that intersect between tables need to be 
> deserialized/merged/held-in-memory,
>  * Indexes for individual rows are gone, since the global index allows random 
> access to the middle of column families that span Blocks, and Slices allow 
> batches of columns to be skipped within a Block.
>  * Bloom filters for individual rows are gone, and the global filter contains 
> ColumnKeys instead, meaning that a query for a column that doesn't exist in a 
> row that does will often not need to seek to the row.
>  * Metadata (deletion/gc time) and ColumnKeys (key, colname1, colname2...) 
> for columns are defined recursively, so deeply nested slices are possible,
>  * Slices representing a single parent (CF, SC, etc) can have different 
> Metadata, meaning that a tombstone Slice from d-f could sit between Slices 
> containing columns a-c and g-h. This allows for eventually consistent range 
> deletes of columns.
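
The last point, a tombstone Slice sitting between data Slices, can be sketched 
as below. This is a simplified illustration only: the Slice fields, the 
inclusive name ranges and the rule that a delete wins a timestamp tie are 
assumptions, not the format in the patch.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Slice:
    """Hypothetical slice: a column-name range sharing one deletion time."""
    start: str
    end: str  # inclusive
    deleted_at: Optional[int] = None  # None: no tombstone covers the range
    columns: dict = field(default_factory=dict)  # name -> (value, timestamp)


def read_column(slices, name):
    """Return the live value for `name`, or None if absent or deleted."""
    value, written_at, deleted_at = None, None, None
    for s in slices:
        if not (s.start <= name <= s.end):
            continue  # this slice does not cover the column
        if s.deleted_at is not None and (
                deleted_at is None or s.deleted_at > deleted_at):
            deleted_at = s.deleted_at
        if name in s.columns:
            v, ts = s.columns[name]
            if written_at is None or ts > written_at:
                value, written_at = v, ts
    if written_at is None:
        return None
    if deleted_at is not None and deleted_at >= written_at:
        return None  # assumption: the delete wins on a timestamp tie
    return value
```

For example, with slices a-c and g-h holding data, a tombstone slice d-f at 
time 10, and another table contributing d (written at 5) and e (written at 20), 
a read of d returns nothing while a read of e returns the newer value.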

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (CASSANDRA-674) New SSTable Format

2011-01-08 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979172#action_12979172
 ] 

Jonathan Ellis commented on CASSANDRA-674:
--

Here is an interesting paper on a way to get both good inter-record and 
intra-record data locality: 
http://scholar.google.com/scholar?q=A+Storage+Model+to+Bridge+the+Processor/Memory+Speed+Gap.

Not sure how to apply that to an arbitrarily-large-rows model like ours tho.




[jira] Commented: (CASSANDRA-674) New SSTable Format

2011-01-05 Thread Stu Hood (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12978183#action_12978183
 ] 

Stu Hood commented on CASSANDRA-674:


> If we assume we keep the data model as is, how can we simplify the 
> open-endedness of your design to make the approach fit our current data model?
To keep this from becoming a point of contention, I'll remove that goal from 
the design doc: the design so far has this feature as a side effect though.




[jira] Commented: (CASSANDRA-674) New SSTable Format

2011-01-05 Thread Stu Hood (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12978155#action_12978155
 ] 

Stu Hood commented on CASSANDRA-674:


>> Indexes for individual rows are gone, since the global index allows random 
>> access...
> ^ This wouldn't be useful to cache? in the situation you only want a small 
> range of columns?
That information is outdated: it's from the original implementation. But yes... 
we will want to keep the index in app memory or page cache.

> Roughly how large would the actual chunk be? This is the unit of 
> deserialization right?
The span is the unit of deserialization (made up of at most 1 chunk per level), 
and its size would be 100% configurable. The main question is how frequently to 
index the spans in the sstable index: does each span get an index entry, or 
only the first span of a row (our approach in the current implementation)?

> So if you are doing a range query on a very wide row how do you know when to 
> stop processing chunks?
By looking at the global index: if all spans get entries in the index, you know 
the last interesting span.
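
Assuming every span does get an index entry holding its first key, finding the 
first and last interesting spans is a pair of binary searches. The function 
below is a sketch with made-up names, using an in-memory sorted list in place 
of the real index.

```python
from bisect import bisect_right


def spans_for_range(index_keys, start, end):
    """Return positions of spans that may hold keys in [start, end].

    `index_keys` is a sorted list with the first key of every span.
    """
    if not index_keys or start > end:
        return []
    # Last span whose first key is <= start; it may contain `start`.
    first = max(bisect_right(index_keys, start) - 1, 0)
    # One past the last span whose first key is <= end.
    stop = bisect_right(index_keys, end)
    return list(range(first, stop))
```

With index keys a, f, m and t, a query for g through n touches only the spans 
starting at f and m, so the reader knows to stop before the span starting at t.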

> Let me know if this is wrong, but this design opens the Cassandra data model 
> to contain arbitrarily nested data.
> Given the complexity we already have surrounding the supercolumn concept, do 
> you think this is the right way forward? 
The supercolumn concept is only confusing _because_ we call them 
"supercolumns" rather than just calling them "compound column names". People 
use them, and the consensus I've heard is that they are useful.
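
One way to picture "compound column names" is as tuples of name parts, where a 
supercolumn is simply a shared prefix. The data and the helper below are 
invented for illustration and say nothing about the on-disk encoding.

```python
# Hypothetical column keys: (row key, name1, name2, ...) tuples of any depth.
columns = {
    ("user1", "address", "city"): "Austin",
    ("user1", "address", "zip"): "78701",
    ("user1", "email"): "u1@example.com",
    ("user2", "email"): "u2@example.com",
}


def children(columns, parent):
    """Return columns whose key starts with the `parent` prefix, in key order."""
    depth = len(parent)
    return [(key, value) for key, value in sorted(columns.items())
            if key[:depth] == parent]
```

Here children(columns, ("user1", "address")) plays the role of reading one 
supercolumn, and deeper nesting is just a longer tuple.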

> If we assume we keep the data model as is, how can we simplify the 
> open-endedness of your design to make the approach fit our current data model?
The only difference is what you call the structures, and whether you put 
arbitrary limits on the nesting: I'm open to suggestions.




[jira] Commented: (CASSANDRA-674) New SSTable Format

2011-01-05 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12977785#action_12977785
 ] 

T Jake Luciani commented on CASSANDRA-674:
--



Let me know if this is wrong, but this design opens the Cassandra data model to 
contain arbitrarily nested data.

Given the complexity we already have surrounding the supercolumn concept, do you 
think this is the right way forward?  
As much as my inner geek wants to build a tree or graph model, I don't think the 
C* community or committers want to take it this way.

If we assume we keep the data model as is, how can we simplify the 
open-endedness of your design to make the approach fit our current data model?




[jira] Commented: (CASSANDRA-674) New SSTable Format

2011-01-05 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1296#action_1296
 ] 

T Jake Luciani commented on CASSANDRA-674:
--

bq. the metadata is useless on its own. It only becomes useful when it is 
attached to data (a column or a range), so there is no reason to cache the 
metadata independently of the data.

But above you mention:
{code}
Indexes for individual rows are gone, since the global index allows random 
access to the middle of column families that span Blocks, and Slices allow 
batches of columns to be skipped within a Block.
{code}

^ This wouldn't be useful to cache? in the situation you only want a small 
range of columns? 

- More questions 
Roughly how large would the actual chunk be? This is the unit of 
deserialization right? Or can Avro deserialize only part of a structure?

So if you are doing a range query on a very wide row how do you know when to 
stop processing chunks? Do you keep going till you hit the sentinel value?








[jira] Commented: (CASSANDRA-674) New SSTable Format

2011-01-05 Thread Stu Hood (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12977697#action_12977697
 ] 

Stu Hood commented on CASSANDRA-674:


> How will ranges be stored? The parent ordering would mean the sorting of data 
> at that level is lost no?
Added some explanation of how I think ranges should work to the wiki. 
http://wiki.apache.org/cassandra/FileFormatDesignDoc?action=diff&rev1=15&rev2=16

> Are chunks broken up by size only?
Technically, "spans" are the largest unit, so they define the boundaries; I tried 
to clarify this part as well. There are a few possible thresholds, including a 
max number of rows, columns, range tombstones or total bytes in the span.
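
A rough sketch of how those thresholds could combine, purely for illustration. 
The class name, field names and limit values below are hypothetical and are not 
taken from the patch or the wiki; the only idea borrowed from the comment above 
is that a span closes once any one of the limits is reached.

```java
// Hypothetical sketch: none of these names or limits come from the patch.
final class SpanLimits {
    final int maxRows;
    final int maxColumns;
    final int maxRangeTombstones;
    final long maxBytes;

    SpanLimits(int maxRows, int maxColumns, int maxRangeTombstones, long maxBytes) {
        this.maxRows = maxRows;
        this.maxColumns = maxColumns;
        this.maxRangeTombstones = maxRangeTombstones;
        this.maxBytes = maxBytes;
    }

    // The span is closed as soon as any single threshold is reached.
    boolean shouldClose(int rows, int columns, int rangeTombstones, long bytes) {
        return rows >= maxRows
            || columns >= maxColumns
            || rangeTombstones >= maxRangeTombstones
            || bytes >= maxBytes;
    }

    public static void main(String[] args) {
        SpanLimits limits = new SpanLimits(128, 4096, 64, 65536L);
        System.out.println(limits.shouldClose(1, 10, 0, 1024L));
        System.out.println(limits.shouldClose(128, 10, 0, 1024L));
    }
}
```

A writer would presumably call something like shouldClose() after appending each 
column and start a new span when it returns true.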

One semi-undefined portion is what happens when a row is larger than can be 
stuffed in a span. Most likely we'll want to use the range metadata to indicate 
the portion of the row covered by the span (the approach I took in the original 
implementation attached here).

> Will the metadata be ripe for caching?
I don't think so: the metadata is useless on its own. It only becomes useful 
when it is attached to data (a column or a range), so there is no reason to 
cache the metadata independently of the data.

Thanks!

> New SSTable Format
> --
>
> Key: CASSANDRA-674
> URL: https://issues.apache.org/jira/browse/CASSANDRA-674
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Stu Hood
> Fix For: 0.8
>
> Attachments: 674-v1.diff, perf-674-v1.txt, 
> perf-trunk-2f3d2c0e4845faf62e33c191d152cb1b3fa62806.txt
>
>
> Various tickets exist due to limitations in the SSTable file format, 
> including #16, #47 and #328. Attached is a proposed design/implementation of 
> a new file format for SSTables that addresses a few of these limitations. The 
> implementation has a bunch of issues/fixmes, which I'll describe in the 
> comments.
> The file format is described in the javadoc for the o.a.c.io.SSTableWriter 
> class, but briefly:
>  * Blocks are opaque (except for their header) so that they can be 
> compressed. The index file contains an entry for the first key in every 
> Block. Blocks contain Slices.
>  * Slices are series of columns with the same parents and (deletion) 
> metadata. They can be used to represent ColumnFamilies or SuperColumns (or a 
> slice of columns at any other depth). A single CF can be split across 
> multiple Slices, which can be split across multiple blocks.
>  * Neither Slices nor Blocks have a fixed size or maximum length, but they 
> each have target lengths which can be stretched and broken by very large 
> columns.
> The most interesting concepts from this patch are:
>  * Block compression is possible (currently using GZIP, which has one bug 
> mentioned in the comments),
>  * Compaction involves merging intersecting Slices from input SSTables. Since 
> large rows will be broken down into multiple slices, only the portions of 
> rows that intersect between tables need to be 
> deserialized/merged/held-in-memory,
>  * Indexes for individual rows are gone, since the global index allows random 
> access to the middle of column families that span Blocks, and Slices allow 
> batches of columns to be skipped within a Block.
>  * Bloom filters for individual rows are gone, and the global filter contains 
> ColumnKeys instead, meaning that a query for a column that doesn't exist in a 
> row that does will often not need to seek to the row.
>  * Metadata (deletion/gc time) and ColumnKeys (key, colname1, colname2...) 
> for columns are defined recursively, so deeply nested slices are possible,
>  * Slices representing a single parent (CF, SC, etc) can have different 
> Metadata, meaning that a tombstone Slice from d-f could sit between Slices 
> containing columns a-c and g-h. This allows for eventually consistent range 
> deletes of columns.
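
The last bullet above can be illustrated with a minimal sketch: three Slices 
under one parent, where the middle one (d-f) is a tombstone. Everything here is 
an assumption made for illustration (the class and method names, the use of 
String column names, and the rule that a delete wins on a timestamp tie); it is 
not the representation used by the attached patch.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the Slices under one parent; names are illustrative only.
final class SliceSketch {
    static final long LIVE = Long.MIN_VALUE; // metadata value meaning "no tombstone"

    // A Slice covers an inclusive range of column names and carries deletion metadata.
    static final class Slice {
        final String first;
        final String last;
        final long markedForDeleteAt;

        Slice(String first, String last, long markedForDeleteAt) {
            this.first = first;
            this.last = last;
            this.markedForDeleteAt = markedForDeleteAt;
        }

        boolean covers(String column) {
            return first.compareTo(column) <= 0 && column.compareTo(last) <= 0;
        }
    }

    // True if a column written at `timestamp` falls inside a tombstone Slice
    // whose deletion time is at least as new (assumption: deletes win ties).
    static boolean isShadowed(List<Slice> slices, String column, long timestamp) {
        for (Slice s : slices) {
            if (s.markedForDeleteAt != LIVE && s.covers(column)
                    && s.markedForDeleteAt >= timestamp) {
                return true;
            }
        }
        return false;
    }

    // Live columns a-c, a tombstone over d-f written at time 100, live columns g-h.
    static List<Slice> example() {
        return Arrays.asList(
            new Slice("a", "c", LIVE),
            new Slice("d", "f", 100L),
            new Slice("g", "h", LIVE));
    }

    public static void main(String[] args) {
        List<Slice> slices = example();
        System.out.println(isShadowed(slices, "e", 50L));
        System.out.println(isShadowed(slices, "e", 150L));
        System.out.println(isShadowed(slices, "b", 50L));
    }
}
```

Because the tombstone is just another Slice with its own metadata, a later write 
to column e with a newer timestamp is not shadowed, which is roughly what 
makes the range delete eventually consistent.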




[jira] Commented: (CASSANDRA-674) New SSTable Format

2011-01-04 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12977609#action_12977609
 ] 

T Jake Luciani commented on CASSANDRA-674:
--

 As I try to wrap my head around this I'm listing questions that come to mind:

 - How will ranges be stored? The parent ordering would mean the sorting of 
data at that level is lost, no?
 - Are chunks broken up by size only?
 - Will the metadata be ripe for caching?
 




[jira] Commented: (CASSANDRA-674) New SSTable Format

2011-01-01 Thread Stu Hood (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12976469#action_12976469
 ] 

Stu Hood commented on CASSANDRA-674:


Thinking about this issue again. Dumped some thoughts I had on paper to the 
wiki: http://wiki.apache.org/cassandra/FileFormatDesignDoc .




[jira] Commented: (CASSANDRA-674) New SSTable Format

2010-07-07 Thread Ryan King (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886030#action_12886030
 ] 

Ryan King commented on CASSANDRA-674:
-

YES!




[jira] Commented: (CASSANDRA-674) New SSTable Format

2010-07-07 Thread Stu Hood (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886023#action_12886023
 ] 

Stu Hood commented on CASSANDRA-674:


After having played with Avro a bit more, I'm all for using its DataFile format 
in the SSTable. The variable-length integer encoding, built-in compression, 
schema migration and block recovery schemes are all wins.
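
For readers unfamiliar with the variable-length integer encoding mentioned 
above: the Avro specification writes int and long values with zig-zag coding 
followed by a base-128 varint. The sketch below is a standalone illustration of 
that scheme, not Avro's own implementation; the class and method names are 
hypothetical.

```java
import java.io.ByteArrayOutputStream;

// Standalone illustration of zig-zag + varint coding; not Avro library code.
final class ZigZagVarint {
    static byte[] encodeLong(long n) {
        // Zig-zag maps small magnitudes (positive or negative) to small unsigned values.
        long z = (n << 1) ^ (n >> 63);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Emit 7 bits per byte, low bits first; the high bit marks "more bytes follow".
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
        return out.toByteArray();
    }

    static long decodeLong(byte[] buf) {
        long z = 0;
        int shift = 0;
        for (byte b : buf) {
            z |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break; // last byte of this value
            }
            shift += 7;
        }
        // Undo the zig-zag mapping.
        return (z >>> 1) ^ -(z & 1);
    }

    public static void main(String[] args) {
        long[] samples = {0L, -1L, 1L, 64L, -64L, Long.MAX_VALUE, Long.MIN_VALUE};
        for (long v : samples) {
            byte[] enc = encodeLong(v);
            System.out.println(v + " -> " + enc.length + " byte(s), decodes to " + decodeLong(enc));
        }
    }
}
```

Small values such as timestamps deltas or lengths take one or two bytes instead 
of a fixed eight, which is the appeal for an on-disk format.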
