[jira] [Commented] (CASSANDRA-2319) Promote row index

2011-08-19 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13087925#comment-13087925
 ] 

Jonathan Ellis commented on CASSANDRA-2319:
---

Would this allow us to do CASSANDRA-2498 style optimizations on slice queries?

> Promote row index
> -
>
> Key: CASSANDRA-2319
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2319
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Stu Hood
>Assignee: Stu Hood
>  Labels: compression, index, timeseries
> Fix For: 1.0
>
> Attachments: 2319-v1.tgz, 2319-v2.tgz, promotion.pdf, version-f.txt, 
> version-g-lzf.txt, version-g.txt
>
>
> The row index contains entries for configurably sized blocks of a wide row. 
> For a row of appreciable size, the row index ends up directing the third seek 
> (1. index, 2. row index, 3. content) to nearby the first column of a scan.
> Since the row index is always used for wide rows, and since it contains 
> information that tells us whether or not the 3rd seek is necessary (the 
> column range or name we are trying to slice may not exist in a given 
> sstable), promoting the row index into the sstable index would allow us to 
> drop the maximum number of seeks for wide rows back to 2, and, more 
> importantly, would allow sstables to be eliminated using only the index.
> An example usecase that benefits greatly from this change is time series data 
> in wide rows, where data is appended to the beginning or end of the row. Our 
> existing compaction strategy gets lucky and clusters the oldest data in the 
> oldest sstables: for queries to recently appended data, we would be able to 
> eliminate wide rows using only the sstable index, rather than needing to seek 
> into the data file to determine that it isn't interesting. For narrow rows, 
> this change would have no effect, as they will not reach the threshold for 
> indexing anyway.
> A first cut design for this change would look very similar to the file format 
> design proposed on #674: 
> http://wiki.apache.org/cassandra/FileFormatDesignDoc: row keys clustered, 
> column names clustered, and offsets clustered and delta encoded.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2319) Promote row index

2011-08-22 Thread Ryan King (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088736#comment-13088736
 ] 

Ryan King commented on CASSANDRA-2319:
--

I haven't followed that ticket closely, but I think the answer is yes. For wide 
row use cases this patch lets you eliminate SStables with only the info in the 
index (because we know what range(s) of columns for a row are in that file).

> Promote row index
> -
>
> Key: CASSANDRA-2319
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2319
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Stu Hood
>Assignee: Stu Hood
>  Labels: compression, index, timeseries
> Fix For: 1.0
>
> Attachments: 2319-v1.tgz, 2319-v2.tgz, promotion.pdf, version-f.txt, 
> version-g-lzf.txt, version-g.txt
>
>
> The row index contains entries for configurably sized blocks of a wide row. 
> For a row of appreciable size, the row index ends up directing the third seek 
> (1. index, 2. row index, 3. content) to nearby the first column of a scan.
> Since the row index is always used for wide rows, and since it contains 
> information that tells us whether or not the 3rd seek is necessary (the 
> column range or name we are trying to slice may not exist in a given 
> sstable), promoting the row index into the sstable index would allow us to 
> drop the maximum number of seeks for wide rows back to 2, and, more 
> importantly, would allow sstables to be eliminated using only the index.
> An example usecase that benefits greatly from this change is time series data 
> in wide rows, where data is appended to the beginning or end of the row. Our 
> existing compaction strategy gets lucky and clusters the oldest data in the 
> oldest sstables: for queries to recently appended data, we would be able to 
> eliminate wide rows using only the sstable index, rather than needing to seek 
> into the data file to determine that it isn't interesting. For narrow rows, 
> this change would have no effect, as they will not reach the threshold for 
> indexing anyway.
> A first cut design for this change would look very similar to the file format 
> design proposed on #674: 
> http://wiki.apache.org/cassandra/FileFormatDesignDoc: row keys clustered, 
> column names clustered, and offsets clustered and delta encoded.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (CASSANDRA-2319) Promote row index

2011-03-14 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006608#comment-13006608
 ] 

Jonathan Ellis commented on CASSANDRA-2319:
---

I think this would also mean we can remove the back-seek from sstable writing. 
In which case I am a huge fan in principle.

> Promote row index
> -
>
> Key: CASSANDRA-2319
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2319
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Stu Hood
>Assignee: Stu Hood
>  Labels: index, timeseries
> Fix For: 0.8
>
>
> The row index contains entries for configurably sized blocks of a wide row. 
> For a row of appreciable size, the row index ends up directing the third seek 
> (1. index, 2. row index, 3. content) to nearby the first column of a scan.
> Since the row index is always used for wide rows, and since it contains 
> information that tells us whether or not the 3rd seek is necessary (the 
> column range or name we are trying to slice may not exist in a given 
> sstable), promoting the row index into the sstable index would allow us to 
> drop the maximum number of seeks for wide rows back to 2, and, more 
> importantly, would allow sstables to be eliminated using only the index.
> An example usecase that benefits greatly from this change is time series data 
> in wide rows, where data is appended to the beginning or end of the row. Our 
> existing compaction strategy gets lucky and clusters the oldest data in the 
> oldest sstables: for queries to recently appended data, we would be able to 
> eliminate wide rows using only the sstable index, rather than needing to seek 
> into the data file to determine that it isn't interesting. For narrow rows, 
> this change would have no effect, as they will not reach the threshold for 
> indexing anyway.
> A first cut design for this change would look very similar to the file format 
> design proposed on #674: 
> http://wiki.apache.org/cassandra/FileFormatDesignDoc: row keys clustered, 
> column names clustered, and offsets clustered and delta encoded.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Commented: (CASSANDRA-2319) Promote row index

2011-03-15 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007010#comment-13007010
 ] 

Sylvain Lebresne commented on CASSANDRA-2319:
-

I'm reasonably in favor of this too, given that it would also give us a good 
place to add checksumming (which should be a reasonably short term priority 
imho) and compression.

There is however a few things to considered:
  * This will remove as a consequence the row bloom filter (maybe it makes 
sense to promote it too in some form to the sstable level, but I'm not sure). 
Row bloom filter is not the most useful trick we have around (only useful for 
names query for starter). Still something to keep in mind.
  * There will be new trade-offs: either you index 'often' or the index becomes 
potentially much bigger (in which case you push the problem to the sparse 
in-memory index). I suspect it will be harder to have a 'one configuration fits 
all'.


> Promote row index
> -
>
> Key: CASSANDRA-2319
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2319
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Stu Hood
>Assignee: Stu Hood
>  Labels: index, timeseries
> Fix For: 0.8
>
>
> The row index contains entries for configurably sized blocks of a wide row. 
> For a row of appreciable size, the row index ends up directing the third seek 
> (1. index, 2. row index, 3. content) to nearby the first column of a scan.
> Since the row index is always used for wide rows, and since it contains 
> information that tells us whether or not the 3rd seek is necessary (the 
> column range or name we are trying to slice may not exist in a given 
> sstable), promoting the row index into the sstable index would allow us to 
> drop the maximum number of seeks for wide rows back to 2, and, more 
> importantly, would allow sstables to be eliminated using only the index.
> An example usecase that benefits greatly from this change is time series data 
> in wide rows, where data is appended to the beginning or end of the row. Our 
> existing compaction strategy gets lucky and clusters the oldest data in the 
> oldest sstables: for queries to recently appended data, we would be able to 
> eliminate wide rows using only the sstable index, rather than needing to seek 
> into the data file to determine that it isn't interesting. For narrow rows, 
> this change would have no effect, as they will not reach the threshold for 
> indexing anyway.
> A first cut design for this change would look very similar to the file format 
> design proposed on #674: 
> http://wiki.apache.org/cassandra/FileFormatDesignDoc: row keys clustered, 
> column names clustered, and offsets clustered and delta encoded.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Commented: (CASSANDRA-2319) Promote row index

2011-03-15 Thread Stu Hood (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007128#comment-13007128
 ] 

Stu Hood commented on CASSANDRA-2319:
-

> I think this would also mean we can remove the back-seek from sstable writing.
> In which case I am a huge fan in principle.
I suspect that this is more challenging: the back seek also writes the row 
length. We'd need to move to a blocked design like the one on 674 to replace 
the row length with a block length in the data file.

> This will remove as a consequence the row bloom filter
Not necessarily, but I think this ticket does highlight the fact that the 
column bloom filter is ill-positioned: they can prevent the 3rd seek, but only 
for names queries which (I suspect) are less likely on wide rows. Nonetheless, 
I can imagine wanting to do a point query for a secondary index to determine 
whether a particular row matches the index, so we should probably consider 
promoting it as well.

> There will be new trade-offs: either you index 'often' or the index becomes 
> potentially much bigger
The important thing to remember is that the distinction between columns and 
keys should be very fuzzy: columns are a suffix on keys, and treating them 
otherwise leads to complications. In this case, we shouldn't be holding every 
128th "key" in memory, but instead every 128th-512th _tuple_: that way wide 
rows are handled naturally.

This also normalizes our indexing: the size of your index depends on the 
_total_ number of columns, instead of on the width and number of your rows.

> Promote row index
> -
>
> Key: CASSANDRA-2319
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2319
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Stu Hood
>Assignee: Stu Hood
>  Labels: index, timeseries
> Fix For: 0.8
>
>
> The row index contains entries for configurably sized blocks of a wide row. 
> For a row of appreciable size, the row index ends up directing the third seek 
> (1. index, 2. row index, 3. content) to nearby the first column of a scan.
> Since the row index is always used for wide rows, and since it contains 
> information that tells us whether or not the 3rd seek is necessary (the 
> column range or name we are trying to slice may not exist in a given 
> sstable), promoting the row index into the sstable index would allow us to 
> drop the maximum number of seeks for wide rows back to 2, and, more 
> importantly, would allow sstables to be eliminated using only the index.
> An example usecase that benefits greatly from this change is time series data 
> in wide rows, where data is appended to the beginning or end of the row. Our 
> existing compaction strategy gets lucky and clusters the oldest data in the 
> oldest sstables: for queries to recently appended data, we would be able to 
> eliminate wide rows using only the sstable index, rather than needing to seek 
> into the data file to determine that it isn't interesting. For narrow rows, 
> this change would have no effect, as they will not reach the threshold for 
> indexing anyway.
> A first cut design for this change would look very similar to the file format 
> design proposed on #674: 
> http://wiki.apache.org/cassandra/FileFormatDesignDoc: row keys clustered, 
> column names clustered, and offsets clustered and delta encoded.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Commented: (CASSANDRA-2319) Promote row index

2011-03-17 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007965#comment-13007965
 ] 

T Jake Luciani commented on CASSANDRA-2319:
---

This would make the key cache much less effective no? also the index file would 
now contain an entry for every row rather than every N

> Promote row index
> -
>
> Key: CASSANDRA-2319
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2319
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Stu Hood
>Assignee: Stu Hood
>  Labels: index, timeseries
> Fix For: 0.8
>
>
> The row index contains entries for configurably sized blocks of a wide row. 
> For a row of appreciable size, the row index ends up directing the third seek 
> (1. index, 2. row index, 3. content) to nearby the first column of a scan.
> Since the row index is always used for wide rows, and since it contains 
> information that tells us whether or not the 3rd seek is necessary (the 
> column range or name we are trying to slice may not exist in a given 
> sstable), promoting the row index into the sstable index would allow us to 
> drop the maximum number of seeks for wide rows back to 2, and, more 
> importantly, would allow sstables to be eliminated using only the index.
> An example usecase that benefits greatly from this change is time series data 
> in wide rows, where data is appended to the beginning or end of the row. Our 
> existing compaction strategy gets lucky and clusters the oldest data in the 
> oldest sstables: for queries to recently appended data, we would be able to 
> eliminate wide rows using only the sstable index, rather than needing to seek 
> into the data file to determine that it isn't interesting. For narrow rows, 
> this change would have no effect, as they will not reach the threshold for 
> indexing anyway.
> A first cut design for this change would look very similar to the file format 
> design proposed on #674: 
> http://wiki.apache.org/cassandra/FileFormatDesignDoc: row keys clustered, 
> column names clustered, and offsets clustered and delta encoded.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Commented: (CASSANDRA-2319) Promote row index

2011-03-17 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007977#comment-13007977
 ] 

Sylvain Lebresne commented on CASSANDRA-2319:
-

bq. The important thing to remember is that the distinction between columns and 
keys should be very fuzzy: columns are a suffix on keys, and treating them 
otherwise leads to complications. In this case, we shouldn't be holding every 
128th "key" in memory, but instead every 128th-512th tuple: that way wide rows 
are handled naturally.

I'm not saying anything else. What I'm saying is that there is potentially 
order of magnitudes more 'tuple' than there are keys. So far I doubt many 
people have ever changed the index_interval value. I suppose this will change 
if we do this (I mean, we have advertised you can have 2 billions columns per 
rows after all :)), and we may even be willing to move index_interval 
configurable per-cf. In turns, this will be more things to consider for the 
user and a bigger chance for them to get OOM if they are not careful.  Probably 
nothing horrible, but let just make sure we do understand a maximum of the pros 
and cons.

Also, as tjake remarked, it is unclear how to update the key cache with that 
proposal. You could cache column position ('tuple') directly, but that will be 
much less useful. Maybe the key cache could be kept but limited to skinny rows. 
Something to consider anyway.


> Promote row index
> -
>
> Key: CASSANDRA-2319
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2319
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Stu Hood
>Assignee: Stu Hood
>  Labels: index, timeseries
> Fix For: 0.8
>
>
> The row index contains entries for configurably sized blocks of a wide row. 
> For a row of appreciable size, the row index ends up directing the third seek 
> (1. index, 2. row index, 3. content) to nearby the first column of a scan.
> Since the row index is always used for wide rows, and since it contains 
> information that tells us whether or not the 3rd seek is necessary (the 
> column range or name we are trying to slice may not exist in a given 
> sstable), promoting the row index into the sstable index would allow us to 
> drop the maximum number of seeks for wide rows back to 2, and, more 
> importantly, would allow sstables to be eliminated using only the index.
> An example usecase that benefits greatly from this change is time series data 
> in wide rows, where data is appended to the beginning or end of the row. Our 
> existing compaction strategy gets lucky and clusters the oldest data in the 
> oldest sstables: for queries to recently appended data, we would be able to 
> eliminate wide rows using only the sstable index, rather than needing to seek 
> into the data file to determine that it isn't interesting. For narrow rows, 
> this change would have no effect, as they will not reach the threshold for 
> indexing anyway.
> A first cut design for this change would look very similar to the file format 
> design proposed on #674: 
> http://wiki.apache.org/cassandra/FileFormatDesignDoc: row keys clustered, 
> column names clustered, and offsets clustered and delta encoded.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Commented: (CASSANDRA-2319) Promote row index

2011-03-17 Thread Stu Hood (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008150#comment-13008150
 ] 

Stu Hood commented on CASSANDRA-2319:
-

> I'm not saying anything else. What I'm saying is that there is potentially
> order of magnitudes more 'tuple' than there are keys.
Agreed... my point is simply that the number of columns-per-key and the number 
of keys are inversely proportional: if you have more columns-per-key, you have 
less keys, and vice-versa. The index will grow proportionally with the total 
number of columns, not with the number of keys.

> Also, as tjake remarked, it is unclear how to update the key cache with that 
> proposal.
Yea... the key cache as it exists does not necessarily need to change, but at 
some point we'll want to update it to include the improvements from this ticket.
> You could cache column position ('tuple') directly, but that will be much 
> less useful.
I'm not sure that this would be less useful: promoting the index takes us from 
3 to 2 seeks for wide rows: making this particular change to the key cache 
would take it from 2 to 1 seek for wide rows (get cached position, seek 
directly to column). It depends on the number of unique queries to the row, but 
I'm willing to be that the number of unique queries is relatively low.

> Promote row index
> -
>
> Key: CASSANDRA-2319
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2319
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Stu Hood
>Assignee: Stu Hood
>  Labels: index, timeseries
> Fix For: 0.8
>
>
> The row index contains entries for configurably sized blocks of a wide row. 
> For a row of appreciable size, the row index ends up directing the third seek 
> (1. index, 2. row index, 3. content) to nearby the first column of a scan.
> Since the row index is always used for wide rows, and since it contains 
> information that tells us whether or not the 3rd seek is necessary (the 
> column range or name we are trying to slice may not exist in a given 
> sstable), promoting the row index into the sstable index would allow us to 
> drop the maximum number of seeks for wide rows back to 2, and, more 
> importantly, would allow sstables to be eliminated using only the index.
> An example usecase that benefits greatly from this change is time series data 
> in wide rows, where data is appended to the beginning or end of the row. Our 
> existing compaction strategy gets lucky and clusters the oldest data in the 
> oldest sstables: for queries to recently appended data, we would be able to 
> eliminate wide rows using only the sstable index, rather than needing to seek 
> into the data file to determine that it isn't interesting. For narrow rows, 
> this change would have no effect, as they will not reach the threshold for 
> indexing anyway.
> A first cut design for this change would look very similar to the file format 
> design proposed on #674: 
> http://wiki.apache.org/cassandra/FileFormatDesignDoc: row keys clustered, 
> column names clustered, and offsets clustered and delta encoded.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Commented: (CASSANDRA-2319) Promote row index

2011-03-17 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008169#comment-13008169
 ] 

Edward Capriolo commented on CASSANDRA-2319:


Is the plan to let users chose type on a per CF basis? If so by version 0.8.0 
will a user chose between standard | compressed | wide when creating a CF. 

I think the use case and description makes sense however 
{quote}
The index will grow proportionally with the total number of columns, not with 
the number of keys.
{quote}
This worries me. We tend to do getSlice for entire keys. So having these large 
indexes, does not bring this use case much. 
{quote}
For narrow rows, this change would have no effect, as they will not reach the 
threshold for indexing anyway.
{quote}
What is the definition of a narrow row?

> Promote row index
> -
>
> Key: CASSANDRA-2319
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2319
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Stu Hood
>Assignee: Stu Hood
>  Labels: index, timeseries
> Fix For: 0.8
>
>
> The row index contains entries for configurably sized blocks of a wide row. 
> For a row of appreciable size, the row index ends up directing the third seek 
> (1. index, 2. row index, 3. content) to nearby the first column of a scan.
> Since the row index is always used for wide rows, and since it contains 
> information that tells us whether or not the 3rd seek is necessary (the 
> column range or name we are trying to slice may not exist in a given 
> sstable), promoting the row index into the sstable index would allow us to 
> drop the maximum number of seeks for wide rows back to 2, and, more 
> importantly, would allow sstables to be eliminated using only the index.
> An example usecase that benefits greatly from this change is time series data 
> in wide rows, where data is appended to the beginning or end of the row. Our 
> existing compaction strategy gets lucky and clusters the oldest data in the 
> oldest sstables: for queries to recently appended data, we would be able to 
> eliminate wide rows using only the sstable index, rather than needing to seek 
> into the data file to determine that it isn't interesting. For narrow rows, 
> this change would have no effect, as they will not reach the threshold for 
> indexing anyway.
> A first cut design for this change would look very similar to the file format 
> design proposed on #674: 
> http://wiki.apache.org/cassandra/FileFormatDesignDoc: row keys clustered, 
> column names clustered, and offsets clustered and delta encoded.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Commented: (CASSANDRA-2319) Promote row index

2011-03-17 Thread Stu Hood (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008180#comment-13008180
 ] 

Stu Hood commented on CASSANDRA-2319:
-

> What is the definition of a narrow row?
I expect that the definition will still be approximately "less than 
column_index_size_in_kb". For those users, the change would have no effect, 
since they didn't have a row index in the first place.

> Promote row index
> -
>
> Key: CASSANDRA-2319
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2319
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Stu Hood
>Assignee: Stu Hood
>  Labels: index, timeseries
> Fix For: 0.8
>
>
> The row index contains entries for configurably sized blocks of a wide row. 
> For a row of appreciable size, the row index ends up directing the third seek 
> (1. index, 2. row index, 3. content) to nearby the first column of a scan.
> Since the row index is always used for wide rows, and since it contains 
> information that tells us whether or not the 3rd seek is necessary (the 
> column range or name we are trying to slice may not exist in a given 
> sstable), promoting the row index into the sstable index would allow us to 
> drop the maximum number of seeks for wide rows back to 2, and, more 
> importantly, would allow sstables to be eliminated using only the index.
> An example usecase that benefits greatly from this change is time series data 
> in wide rows, where data is appended to the beginning or end of the row. Our 
> existing compaction strategy gets lucky and clusters the oldest data in the 
> oldest sstables: for queries to recently appended data, we would be able to 
> eliminate wide rows using only the sstable index, rather than needing to seek 
> into the data file to determine that it isn't interesting. For narrow rows, 
> this change would have no effect, as they will not reach the threshold for 
> indexing anyway.
> A first cut design for this change would look very similar to the file format 
> design proposed on #674: 
> http://wiki.apache.org/cassandra/FileFormatDesignDoc: row keys clustered, 
> column names clustered, and offsets clustered and delta encoded.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Commented: (CASSANDRA-2319) Promote row index

2011-03-18 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008392#comment-13008392
 ] 

Sylvain Lebresne commented on CASSANDRA-2319:
-

bq. Agreed... my point is simply that the number of columns-per-key and the 
number of keys are inversely proportional: if you have more columns-per-key, 
you have less keys, and vice-versa. The index will grow proportionally with the 
total number of columns, not with the number of keys.

I do not share your confidence that this is axiomatic. It is certainly not 
axiomatic to the data model. Anyway, that was just a remark, not a criticism of 
the approach.

bq. Yea... the key cache as it exists does not necessarily need to change, but 
at some point we'll want to update it to include the improvements from this 
ticket.

Maybe there is a misunderstanding here. I assumed that promoting the row index 
implied removing the row index (in the favor of a richer sstable index).  And 
even though a first iteration of this doesn't necessary imply this removal, 
I'll still assume it because I believe this would be weird to keep in the long 
run even if we keep it in the short run.

So if you don't have a row index, caching row key position as the actual key 
cache does will be counter-productive for any non-narrow row, since looking at 
the sstable index would give you closer to the column. So it would make the key 
cache as it exists only useful for narrow row (which makes it less useful 
though not useless).

bq. It depends on the number of unique queries to the row, but I'm willing to 
bet that the number of unique queries to a row is relatively low.

Take time series (which I doubt can be called a niche use case). If the start 
of your slice query depends on the current time, almost all the query will be 
unique. Or if you page on the time series and it have a reasonably high rate of 
inserts, then the pages will be always changing and thus will be your query.  
Given how long it took me to come up with those two examples (that I did 
personally used btw, it's not just my imagination running wild), I suspect 
there is a number of other similar cases.

Will those be a minority of all the queries on wide rows ? I don't know, 
probably for some people but maybe not for others. People come up with new ways 
to use the Cassandra data model all the time, let's not base our reflexion on 
unchecked assumptions of the kind of queries people do.

Is that a big deal considering that in promoting the row index we will be at 2 
seeks for those case but we're already at 2 seeks on a key cache hit ?  
Probably not (though for the pleasure of nitpicking I'll add that the 2 seeks 
in the current case of a key cache hit are closer on disk that how the 2 seeks 
with the promoted index would be).

Am I willing to not keep those case in mind because Stu is willing to bet it 
doesn't matter ? Certainly not (which doesn't mean I don't love you).

Anyways, I'm in favor of trying this (honestly more because I believe this 
could make checksumming and compression easier to add than anything else).

> Promote row index
> -
>
> Key: CASSANDRA-2319
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2319
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Stu Hood
>Assignee: Stu Hood
>  Labels: index, timeseries
> Fix For: 0.8
>
>
> The row index contains entries for configurably sized blocks of a wide row. 
> For a row of appreciable size, the row index ends up directing the third seek 
> (1. index, 2. row index, 3. content) to nearby the first column of a scan.
> Since the row index is always used for wide rows, and since it contains 
> information that tells us whether or not the 3rd seek is necessary (the 
> column range or name we are trying to slice may not exist in a given 
> sstable), promoting the row index into the sstable index would allow us to 
> drop the maximum number of seeks for wide rows back to 2, and, more 
> importantly, would allow sstables to be eliminated using only the index.
> An example usecase that benefits greatly from this change is time series data 
> in wide rows, where data is appended to the beginning or end of the row. Our 
> existing compaction strategy gets lucky and clusters the oldest data in the 
> oldest sstables: for queries to recently appended data, we would be able to 
> eliminate wide rows using only the sstable index, rather than needing to seek 
> into the data file to determine that it isn't interesting. For narrow rows, 
> this change would have no effect, as they will not reach the threshold for 
> indexing anyway.
> A first cut design for this change would look very similar to the file format 
> design proposed on #674: 
> http://wiki.apache.org/cassandra/FileFormatDesign

[jira] Commented: (CASSANDRA-2319) Promote row index

2011-03-19 Thread Stu Hood (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008867#comment-13008867
 ] 

Stu Hood commented on CASSANDRA-2319:
-

A solution for the key cache is to allow for fuzzy cache entry matches via a 
sorted cache structure (if it existed, something like 
ConcurrentLinkedSkipListMap would be ideal).

The key cache as it exists gives us exact matches for keys, but when the 
resolution of the cache increases to columns, the chance of hitting the same 
column twice (while reasonably high) is not high enough. Ideally we'd be able 
to fuzzily hit a nearby/next-highest cache entry that represent a range or 
block of columns.

An example: the cache contains an entry for columns in the range: 
{("user1","entry0100"), ("user1","entry0200")} if a query comes in for a slice 
starting from ("user1", "entry0150"), we would perform a fuzzy/floor lookup in 
the cache and hit our entry. A lookup that doesn't fall into the range covered 
by a cache entry would be a miss, and would result in reading from the index, 
and the smallest range bounding the lookup being added to the cache.

> Promote row index
> -
>
> Key: CASSANDRA-2319
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2319
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Stu Hood
>Assignee: Stu Hood
>  Labels: index, timeseries
> Fix For: 0.8
>
>
> The row index contains entries for configurably sized blocks of a wide row. 
> For a row of appreciable size, the row index ends up directing the third seek 
> (1. index, 2. row index, 3. content) to nearby the first column of a scan.
> Since the row index is always used for wide rows, and since it contains 
> information that tells us whether or not the 3rd seek is necessary (the 
> column range or name we are trying to slice may not exist in a given 
> sstable), promoting the row index into the sstable index would allow us to 
> drop the maximum number of seeks for wide rows back to 2, and, more 
> importantly, would allow sstables to be eliminated using only the index.
> An example usecase that benefits greatly from this change is time series data 
> in wide rows, where data is appended to the beginning or end of the row. Our 
> existing compaction strategy gets lucky and clusters the oldest data in the 
> oldest sstables: for queries to recently appended data, we would be able to 
> eliminate wide rows using only the sstable index, rather than needing to seek 
> into the data file to determine that it isn't interesting. For narrow rows, 
> this change would have no effect, as they will not reach the threshold for 
> indexing anyway.
> A first cut design for this change would look very similar to the file format 
> design proposed on #674: 
> http://wiki.apache.org/cassandra/FileFormatDesignDoc: row keys clustered, 
> column names clustered, and offsets clustered and delta encoded.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-2319) Promote row index

2011-04-09 Thread Stu Hood (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13017953#comment-13017953
 ] 

Stu Hood commented on CASSANDRA-2319:
-

Regarding the sorted cache structure: it looks like it could be accomplished by 
using a MapMaker LRU map, and an eviction listener to maintain a CSLM: fairly 
heavy, but...

> Promote row index
> -
>
> Key: CASSANDRA-2319
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2319
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Stu Hood
>Assignee: Stu Hood
>  Labels: index, timeseries
> Fix For: 1.0
>
>
> The row index contains entries for configurably sized blocks of a wide row. 
> For a row of appreciable size, the row index ends up directing the third seek 
> (1. index, 2. row index, 3. content) to nearby the first column of a scan.
> Since the row index is always used for wide rows, and since it contains 
> information that tells us whether or not the 3rd seek is necessary (the 
> column range or name we are trying to slice may not exist in a given 
> sstable), promoting the row index into the sstable index would allow us to 
> drop the maximum number of seeks for wide rows back to 2, and, more 
> importantly, would allow sstables to be eliminated using only the index.
> An example usecase that benefits greatly from this change is time series data 
> in wide rows, where data is appended to the beginning or end of the row. Our 
> existing compaction strategy gets lucky and clusters the oldest data in the 
> oldest sstables: for queries to recently appended data, we would be able to 
> eliminate wide rows using only the sstable index, rather than needing to seek 
> into the data file to determine that it isn't interesting. For narrow rows, 
> this change would have no effect, as they will not reach the threshold for 
> indexing anyway.
> A first cut design for this change would look very similar to the file format 
> design proposed on #674: 
> http://wiki.apache.org/cassandra/FileFormatDesignDoc: row keys clustered, 
> column names clustered, and offsets clustered and delta encoded.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-2319) Promote row index

2011-04-14 Thread Stu Hood (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020192#comment-13020192
 ] 

Stu Hood commented on CASSANDRA-2319:
-

The attached patch depends on 2336 and 2398, and implements a compressed, 
promoted index (called NestedIndex) containing enough information to eliminate 
wide rows without seeking into the data file. Only some of the features from 
the issue description are actually implemented: the primary goal of this first 
version was to begin eliminating sstables for wide-row (time series) usecases.

The nested index contains nearly as much information as our current row header: 
the only thing missing at this point is the bloom filter (which I think needs 
more thought to get right, and wasn't critical for our slice usecase).

Narrow rows (rows < column_index_size) are represented in the index by their 
key, offset and two bits to indicate that neither metadata nor column names 
have been stored in the index. Rows wider than column_index_size will have an 
entry containing their metadata and one or more column entries, with a bit per 
column entry to indicate their ownership.

The key cache stores RowHeaders, which have a base implementation that is 
essentially just a boxed long offset (matching our existing impl in memory 
usage). Wide rows use the NestedRowHeader subclass, which additionally contains 
row metadata and the min and max columns in the row, which is enough to 
eliminate a row for slices. The intention is that as our file format migrates 
toward being block based, RowHeader will evolve into a BlockHeader describing a 
seekable point in the data file (and possibly cached in a sorted structure as 
described above).

Data in the index mostly uses the design from 
[[FileFormatDesignDoc|http://wiki.apache.org/cassandra/FileFormatDesignDoc]], 
and is compressed using either type specific compression or LZF (via 
CASSANDRA-2398). The end result is that for a narrow-row usecase and simple 
keys, the index is half the size of the existing implementation: complex keys 
will likely see larger benefits.

> Promote row index
> -
>
> Key: CASSANDRA-2319
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2319
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Stu Hood
>Assignee: Stu Hood
>  Labels: index, timeseries
> Fix For: 1.0
>
> Attachments: 2319-v1.tgz, promotion.pdf, version-f.txt, 
> version-g-lzf.txt, version-g.txt
>
>
> The row index contains entries for configurably sized blocks of a wide row. 
> For a row of appreciable size, the row index ends up directing the third seek 
> (1. index, 2. row index, 3. content) to nearby the first column of a scan.
> Since the row index is always used for wide rows, and since it contains 
> information that tells us whether or not the 3rd seek is necessary (the 
> column range or name we are trying to slice may not exist in a given 
> sstable), promoting the row index into the sstable index would allow us to 
> drop the maximum number of seeks for wide rows back to 2, and, more 
> importantly, would allow sstables to be eliminated using only the index.
> An example usecase that benefits greatly from this change is time series data 
> in wide rows, where data is appended to the beginning or end of the row. Our 
> existing compaction strategy gets lucky and clusters the oldest data in the 
> oldest sstables: for queries to recently appended data, we would be able to 
> eliminate wide rows using only the sstable index, rather than needing to seek 
> into the data file to determine that it isn't interesting. For narrow rows, 
> this change would have no effect, as they will not reach the threshold for 
> indexing anyway.
> A first cut design for this change would look very similar to the file format 
> design proposed on #674: 
> http://wiki.apache.org/cassandra/FileFormatDesignDoc: row keys clustered, 
> column names clustered, and offsets clustered and delta encoded.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-2319) Promote row index

2011-04-14 Thread Stu Hood (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020195#comment-13020195
 ] 

Stu Hood commented on CASSANDRA-2319:
-

Output from some stress.java runs is attached as version-*.txt:
* version-f: trunk
* version-g: promoted index _without_ LZF, but with type specific compression
* version-g-lzf: promoted index with all the features of 2398

A slightly larger scale wide row test is described in promotion.pdf
* 90/10 writes/reads for 6 hours
** appending to 30,000 keys
** reading the tail of the row
* sstables-per-read histogram for a wide-row workload
* trunk (unpromoted) vs 2319 (promoted)
* ~12GB of data, ~3GB of page cache

Two takeaways:
# Promotion allowed one less sstable to be accessed on average
** At the end of both runs, the largest sstable was 6GB, which was also the 
only file large enough for promotion to kick in (for 30,000 keys and 
column_index_size=64KB, it takes an sstable larger than 1.92GB for the nested 
index to start promoting column names)
# the promoted index was able to accomplish 25% more reads during the time 
period covered by the test, likely due to hitting one less file

> Promote row index
> -
>
> Key: CASSANDRA-2319
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2319
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Stu Hood
>Assignee: Stu Hood
>  Labels: index, timeseries
> Fix For: 1.0
>
> Attachments: 2319-v1.tgz, promotion.pdf, version-f.txt, 
> version-g-lzf.txt, version-g.txt
>
>
> The row index contains entries for configurably sized blocks of a wide row. 
> For a row of appreciable size, the row index ends up directing the third seek 
> (1. index, 2. row index, 3. content) to nearby the first column of a scan.
> Since the row index is always used for wide rows, and since it contains 
> information that tells us whether or not the 3rd seek is necessary (the 
> column range or name we are trying to slice may not exist in a given 
> sstable), promoting the row index into the sstable index would allow us to 
> drop the maximum number of seeks for wide rows back to 2, and, more 
> importantly, would allow sstables to be eliminated using only the index.
> An example usecase that benefits greatly from this change is time series data 
> in wide rows, where data is appended to the beginning or end of the row. Our 
> existing compaction strategy gets lucky and clusters the oldest data in the 
> oldest sstables: for queries to recently appended data, we would be able to 
> eliminate wide rows using only the sstable index, rather than needing to seek 
> into the data file to determine that it isn't interesting. For narrow rows, 
> this change would have no effect, as they will not reach the threshold for 
> indexing anyway.
> A first cut design for this change would look very similar to the file format 
> design proposed on #674: 
> http://wiki.apache.org/cassandra/FileFormatDesignDoc: row keys clustered, 
> column names clustered, and offsets clustered and delta encoded.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-2319) Promote row index

2012-03-07 Thread Sylvain Lebresne (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13224147#comment-13224147
 ] 

Sylvain Lebresne commented on CASSANDRA-2319:
-

I've put a version of this issue at 
https://github.com/pcmanus/cassandra/commits/2319_index_promotion (against 
current trunk). Contrarily to the previously attached patches, this doesn't 
change the file format much. It pretty literally do what the issue title said: 
it promotes the columns index from the data file to the index file. Note that 
the patch is split in 3 commits that have some form of logical separation but 
the code only compile with all 3 commits.

So this remove the column index and bloom filter from the row header in the 
data file and move them in the index file along with the (key,position) pair. 
There is a number of choices/details worth mentioning:
* Only wide rows have a column index and bloom filter. So one difference with 
the current implementation is that skinny rows have no column bloom filter. I 
figure that it's probably not worth the space in the index file in that latter 
case (but I'm fine discussing that point)
* The key cache now keeps the whole information from the index file for a given 
row. This means that for wide rows, column index and bf are cached along with 
the position. Which is imo a good thing, but does mean the size of a key cache 
entry is not constant anymore (The estimation of the key cache memory size will 
have to be modified accordingly but the current patch don't do it).
* For wide rows, the index entry also ship with the row deletion times. This is 
necessary since we won't seek at the beginning of the row anymore.
* In the column indexes, offsets are relating to the beginning of the row in 
the data file rather than from the beginning of the index as is the case now.

Some other implementation points:
* EchoedRow is removed. It would be possible to echo rows following this patch 
but we would need to echo the column index too so that felt complicated enough 
that it could be left to a later ticket if we consider it worth it.
* I didn't found a non overly complicated/inefficient way to implement this 
patch without using seek() instead of just file marks. So in particular 
MappedFileDataInput gets a seek() method, even though that method throw an 
exception if we seek outside the segment (which should never happen).

I did a short (and honestly not very scientific) benchmark of a time series 
like workload with a number of thread inserting time series columns in a bunch 
of rows and other threads reading the tail of those rows (as expected, the 
performance degrades with more sstables added and improve with compaction). As 
soon as more than more than 1 sstable was present, the performance with this 
patch was around 30-40% better than without the patch.  I'll note that the test 
was very short and with everything on local host, so again the exact benefits 
may vary, but the ability to discard sstables based on index infos (saving a 
seek) seems to be a clear boost in that case.

I didn't saw any noticeable difference (neither good or bad) on a normal 
stress, as should be expected.

Note that this patch paves the way to removing the two phases compaction of 
LazilyCompactedRow, but that is left to a follow up ticket.

> Promote row index
> -
>
> Key: CASSANDRA-2319
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2319
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Stu Hood
>Assignee: Stu Hood
>  Labels: compression, index, timeseries
> Attachments: 2319-v1.tgz, 2319-v2.tgz, promotion.pdf, version-f.txt, 
> version-g-lzf.txt, version-g.txt
>
>
> The row index contains entries for configurably sized blocks of a wide row. 
> For a row of appreciable size, the row index ends up directing the third seek 
> (1. index, 2. row index, 3. content) to nearby the first column of a scan.
> Since the row index is always used for wide rows, and since it contains 
> information that tells us whether or not the 3rd seek is necessary (the 
> column range or name we are trying to slice may not exist in a given 
> sstable), promoting the row index into the sstable index would allow us to 
> drop the maximum number of seeks for wide rows back to 2, and, more 
> importantly, would allow sstables to be eliminated using only the index.
> An example usecase that benefits greatly from this change is time series data 
> in wide rows, where data is appended to the beginning or end of the row. Our 
> existing compaction strategy gets lucky and clusters the oldest data in the 
> oldest sstables: for queries to recently appended data, we would be able to 
> eliminate wide rows using only the sstable index, rather than needing to seek 
> in

[jira] [Commented] (CASSANDRA-2319) Promote row index

2012-03-07 Thread Jonathan Ellis (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13224513#comment-13224513
 ] 

Jonathan Ellis commented on CASSANDRA-2319:
---

bq. For wide rows, the index entry also ship with the row deletion times

Why just wide rows?

bq. I'm in favor of trying this (honestly more because I believe this could 
make checksumming and compression easier to add than anything else)

How did this work out, did it end up simplifying compression?

> Promote row index
> -
>
> Key: CASSANDRA-2319
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2319
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Stu Hood
>Assignee: Sylvain Lebresne
>  Labels: index, timeseries
> Fix For: 1.2
>
> Attachments: 2319-v1.tgz, 2319-v2.tgz, promotion.pdf, version-f.txt, 
> version-g-lzf.txt, version-g.txt
>
>
> The row index contains entries for configurably sized blocks of a wide row. 
> For a row of appreciable size, the row index ends up directing the third seek 
> (1. index, 2. row index, 3. content) to nearby the first column of a scan.
> Since the row index is always used for wide rows, and since it contains 
> information that tells us whether or not the 3rd seek is necessary (the 
> column range or name we are trying to slice may not exist in a given 
> sstable), promoting the row index into the sstable index would allow us to 
> drop the maximum number of seeks for wide rows back to 2, and, more 
> importantly, would allow sstables to be eliminated using only the index.
> An example usecase that benefits greatly from this change is time series data 
> in wide rows, where data is appended to the beginning or end of the row. Our 
> existing compaction strategy gets lucky and clusters the oldest data in the 
> oldest sstables: for queries to recently appended data, we would be able to 
> eliminate wide rows using only the sstable index, rather than needing to seek 
> into the data file to determine that it isn't interesting. For narrow rows, 
> this change would have no effect, as they will not reach the threshold for 
> indexing anyway.
> A first cut design for this change would look very similar to the file format 
> design proposed on #674: 
> http://wiki.apache.org/cassandra/FileFormatDesignDoc: row keys clustered, 
> column names clustered, and offsets clustered and delta encoded.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2319) Promote row index

2012-03-07 Thread Sylvain Lebresne (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13224529#comment-13224529
 ] 

Sylvain Lebresne commented on CASSANDRA-2319:
-

{quote}
bq. For wide rows, the index entry also ship with the row deletion times

Why just wide rows?
{quote}

Because for narrow ones, we'll end up seeking at the beginning of the row, 
where the deletion times are anyway.

bq. How did this work out, did it end up simplifying compression?

We ended up making compression at the SequentialWriter level, i.e. transparent 
from the data and index format, so this end up having no interaction whatsoever 
with compression.

> Promote row index
> -
>
> Key: CASSANDRA-2319
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2319
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Stu Hood
>Assignee: Sylvain Lebresne
>  Labels: index, timeseries
> Fix For: 1.2
>
> Attachments: 2319-v1.tgz, 2319-v2.tgz, promotion.pdf, version-f.txt, 
> version-g-lzf.txt, version-g.txt
>
>
> The row index contains entries for configurably sized blocks of a wide row. 
> For a row of appreciable size, the row index ends up directing the third seek 
> (1. index, 2. row index, 3. content) to nearby the first column of a scan.
> Since the row index is always used for wide rows, and since it contains 
> information that tells us whether or not the 3rd seek is necessary (the 
> column range or name we are trying to slice may not exist in a given 
> sstable), promoting the row index into the sstable index would allow us to 
> drop the maximum number of seeks for wide rows back to 2, and, more 
> importantly, would allow sstables to be eliminated using only the index.
> An example usecase that benefits greatly from this change is time series data 
> in wide rows, where data is appended to the beginning or end of the row. Our 
> existing compaction strategy gets lucky and clusters the oldest data in the 
> oldest sstables: for queries to recently appended data, we would be able to 
> eliminate wide rows using only the sstable index, rather than needing to seek 
> into the data file to determine that it isn't interesting. For narrow rows, 
> this change would have no effect, as they will not reach the threshold for 
> indexing anyway.
> A first cut design for this change would look very similar to the file format 
> design proposed on #674: 
> http://wiki.apache.org/cassandra/FileFormatDesignDoc: row keys clustered, 
> column names clustered, and offsets clustered and delta encoded.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2319) Promote row index

2012-03-07 Thread Jonathan Ellis (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13224531#comment-13224531
 ] 

Jonathan Ellis commented on CASSANDRA-2319:
---

bq. for narrow [rows], we'll end up seeking at the beginning of the row, where 
the deletion times are anyway

Would it simplify things to just say "row deletion times are always in the 
index" and avoid having to have two code paths for that?

> Promote row index
> -
>
> Key: CASSANDRA-2319
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2319
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Stu Hood
>Assignee: Sylvain Lebresne
>  Labels: index, timeseries
> Fix For: 1.2
>
> Attachments: 2319-v1.tgz, 2319-v2.tgz, promotion.pdf, version-f.txt, 
> version-g-lzf.txt, version-g.txt
>
>
> The row index contains entries for configurably sized blocks of a wide row. 
> For a row of appreciable size, the row index ends up directing the third seek 
> (1. index, 2. row index, 3. content) to nearby the first column of a scan.
> Since the row index is always used for wide rows, and since it contains 
> information that tells us whether or not the 3rd seek is necessary (the 
> column range or name we are trying to slice may not exist in a given 
> sstable), promoting the row index into the sstable index would allow us to 
> drop the maximum number of seeks for wide rows back to 2, and, more 
> importantly, would allow sstables to be eliminated using only the index.
> An example usecase that benefits greatly from this change is time series data 
> in wide rows, where data is appended to the beginning or end of the row. Our 
> existing compaction strategy gets lucky and clusters the oldest data in the 
> oldest sstables: for queries to recently appended data, we would be able to 
> eliminate wide rows using only the sstable index, rather than needing to seek 
> into the data file to determine that it isn't interesting. For narrow rows, 
> this change would have no effect, as they will not reach the threshold for 
> indexing anyway.
> A first cut design for this change would look very similar to the file format 
> design proposed on #674: 
> http://wiki.apache.org/cassandra/FileFormatDesignDoc: row keys clustered, 
> column names clustered, and offsets clustered and delta encoded.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2319) Promote row index

2012-03-07 Thread Sylvain Lebresne (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13224550#comment-13224550
 ] 

Sylvain Lebresne commented on CASSANDRA-2319:
-

bq. Would it simplify things to just say "row deletion times are always in the 
index" and avoid having to have two code paths for that?

Not really, at least not unless we also keep a column index entry too for the 
narrow case. Otherwise we have to seek to the beginning of the row anyway so we 
have two code paths anyway (and we'll end up deserializing the deletion times 
anyway). Besides, we also have to be backward compatible so in fact we wouldn't 
really gain much, since the new narrow case is almost the same as the old 
narrow.
Now it is true that after this patch it could be worth making the distinction 
of static CF versus dynamic CF rather than narrow versus wide as it would make 
sense to keep a column index entry in the dynamic case (even if the row is 
narrow for that sstable), but I think this warrant some experimentation first 
and I'd be in favor of leaving that to a follow up ticket.

> Promote row index
> -
>
> Key: CASSANDRA-2319
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2319
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Stu Hood
>Assignee: Sylvain Lebresne
>  Labels: index, timeseries
> Fix For: 1.2
>
> Attachments: 2319-v1.tgz, 2319-v2.tgz, promotion.pdf, version-f.txt, 
> version-g-lzf.txt, version-g.txt
>
>
> The row index contains entries for configurably sized blocks of a wide row. 
> For a row of appreciable size, the row index ends up directing the third seek 
> (1. index, 2. row index, 3. content) to nearby the first column of a scan.
> Since the row index is always used for wide rows, and since it contains 
> information that tells us whether or not the 3rd seek is necessary (the 
> column range or name we are trying to slice may not exist in a given 
> sstable), promoting the row index into the sstable index would allow us to 
> drop the maximum number of seeks for wide rows back to 2, and, more 
> importantly, would allow sstables to be eliminated using only the index.
> An example usecase that benefits greatly from this change is time series data 
> in wide rows, where data is appended to the beginning or end of the row. Our 
> existing compaction strategy gets lucky and clusters the oldest data in the 
> oldest sstables: for queries to recently appended data, we would be able to 
> eliminate wide rows using only the sstable index, rather than needing to seek 
> into the data file to determine that it isn't interesting. For narrow rows, 
> this change would have no effect, as they will not reach the threshold for 
> indexing anyway.
> A first cut design for this change would look very similar to the file format 
> design proposed on #674: 
> http://wiki.apache.org/cassandra/FileFormatDesignDoc: row keys clustered, 
> column names clustered, and offsets clustered and delta encoded.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2319) Promote row index

2012-03-10 Thread Stu Hood (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13226986#comment-13226986
 ] 

Stu Hood commented on CASSANDRA-2319:
-

* ColumnIndexer
** Consider renaming ColumnIndexer -> ColumnIndex, and moving all creation 
logic into factory functions... ColumnIndexer.Result is redundant
** Will creating a BloomFilter with estimated size 0 do the right thing here? 
Perhaps explicitly ask for the empty bloom filter?
* DeletionInfo
** Consider adding an EMPTY/LIVE singleton DeletionInfo
* RowIndexEntry
** the null checks in create() probably mean create() should be two overloaded 
methods, one of which checks the columnIndex size
** the {{if (dis instanceof FileDataInput) skipFully}} logic should be static 
somewhere, and should definitely use DataInput.skipBytes in a loop rather than 
allocating a buffer
** This class is probably in the wrong package... it's very sstable specific. 
Also, class javadoc.
* IndexedSliceReader 
** {code}file = originalInput == null ? 
sstable.getFileDataInput(positionToSeek) : originalInput;
file.seek(positionToSeek);{code} could probably be cleaner
* SimpleSliceReader
** {{sstable.decodeKey(ByteBufferUtil.readWithShortLength(file));}} -> 
{{ByteBufferUtil.skipShortLength}}
* General comments
** When we decide not to store metadata in the file, the datafile is no longer 
enough to recover the data in case of lost indexes, which is probably fine if 
streaming has headed in a direction where we always send the index as well. 
Once we've passed that point of no-return, we could also move metadata out for 
narrow rows and consider removing the key from the data file row header, 
leaving the datafile containing nothing but columns (minor compression bonus).


This looks good Sylvain... moving the bloom filter into the index as well looks 
like it ended up being a simple win.

> Promote row index
> -
>
> Key: CASSANDRA-2319
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2319
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Stu Hood
>Assignee: Sylvain Lebresne
>  Labels: index, timeseries
> Fix For: 1.2
>
> Attachments: 2319-v1.tgz, 2319-v2.tgz, promotion.pdf, version-f.txt, 
> version-g-lzf.txt, version-g.txt
>
>
> The row index contains entries for configurably sized blocks of a wide row. 
> For a row of appreciable size, the row index ends up directing the third seek 
> (1. index, 2. row index, 3. content) to nearby the first column of a scan.
> Since the row index is always used for wide rows, and since it contains 
> information that tells us whether or not the 3rd seek is necessary (the 
> column range or name we are trying to slice may not exist in a given 
> sstable), promoting the row index into the sstable index would allow us to 
> drop the maximum number of seeks for wide rows back to 2, and, more 
> importantly, would allow sstables to be eliminated using only the index.
> An example usecase that benefits greatly from this change is time series data 
> in wide rows, where data is appended to the beginning or end of the row. Our 
> existing compaction strategy gets lucky and clusters the oldest data in the 
> oldest sstables: for queries to recently appended data, we would be able to 
> eliminate wide rows using only the sstable index, rather than needing to seek 
> into the data file to determine that it isn't interesting. For narrow rows, 
> this change would have no effect, as they will not reach the threshold for 
> indexing anyway.
> A first cut design for this change would look very similar to the file format 
> design proposed on #674: 
> http://wiki.apache.org/cassandra/FileFormatDesignDoc: row keys clustered, 
> column names clustered, and offsets clustered and delta encoded.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2319) Promote row index

2012-03-12 Thread Sylvain Lebresne (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13227523#comment-13227523
 ] 

Sylvain Lebresne commented on CASSANDRA-2319:
-

Pushed a new on the github branch to address following remarks:

{quote}
* ColumnIndexer
** Consider renaming ColumnIndexer -> ColumnIndex, and moving all creation 
logic into factory functions... ColumnIndexer.Result is redundant
** Will creating a BloomFilter with estimated size 0 do the right thing here? 
Perhaps explicitly ask for the empty bloom filter?
{quote}

Agreed, that much cleaner, done.

bq. Consider adding an EMPTY/LIVE singleton DeletionInfo

Not really related to this patch but good idea, done.

bq. the null checks in create() probably mean create() should be two overloaded 
methods

We could be truth is create() is only ever called from one place where it takes 
all it's argument. In fact the null check should never be trigger in current 
code, but I figured it doesn't cost much to code defensively in case someone 
wants to use null instead of an empty index.

bq. the if (dis instanceof FileDataInput) skipFully logic should be static 
somewhere

Right, that was copy-pasted code from the old ColumnIndexer but in fact 
FileUtils.skipBytesFully() already does the right thing so I just remove the 
whole {{if}} using that.

bq. This class is probably in the wrong package

I've put it in db because the old ColumnIndexer (whose RowIndexEntry inherit 
some of the code) was there too. Overall the db package already has sstable 
related stuffs so I don't care too much and feel lazy to change it at the time 
I'm writing this. But I may move it during commit if I remember to do it.

{quote}
* IndexedSliceReader
{noformat}
file = originalInput == null ? sstable.getFileDataInput(positionToSeek) : 
originalInput;
file.seek(positionToSeek);
{noformat}
could probably be cleaner
{quote}
I'm open to suggestions :)

bq. sstable.decodeKey(ByteBufferUtil.readWithShortLength(file)); -> 
ByteBufferUtil.skipShortLength

Done

bq. When we decide not to store metadata in the file, the datafile is no longer 
enough to recover the data in case of lost indexes, which is probably fine if 
streaming has headed in a direction where we always send the index as well. 
Once we've passed that point of no-return, we could also move metadata out for 
narrow rows and consider removing the key from the data file row header, 
leaving the datafile containing nothing but columns (minor compression bonus)

The thing is, we don't yet send the index while streaming. Even with this 
patch, the data has still enough data to recompute the index file. And truth 
is, I think it's a nice property that we should probably maintain unless we 
have a good reason not too (it is true that with compression, the data file 
alone is not enough as you need to know what is the compression and the block 
size, but those are property of the CF even if we loose the compression 
component).


> Promote row index
> -
>
> Key: CASSANDRA-2319
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2319
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Stu Hood
>Assignee: Sylvain Lebresne
>  Labels: index, timeseries
> Fix For: 1.2
>
> Attachments: 2319-v1.tgz, 2319-v2.tgz, promotion.pdf, version-f.txt, 
> version-g-lzf.txt, version-g.txt
>
>
> The row index contains entries for configurably sized blocks of a wide row. 
> For a row of appreciable size, the row index ends up directing the third seek 
> (1. index, 2. row index, 3. content) to nearby the first column of a scan.
> Since the row index is always used for wide rows, and since it contains 
> information that tells us whether or not the 3rd seek is necessary (the 
> column range or name we are trying to slice may not exist in a given 
> sstable), promoting the row index into the sstable index would allow us to 
> drop the maximum number of seeks for wide rows back to 2, and, more 
> importantly, would allow sstables to be eliminated using only the index.
> An example usecase that benefits greatly from this change is time series data 
> in wide rows, where data is appended to the beginning or end of the row. Our 
> existing compaction strategy gets lucky and clusters the oldest data in the 
> oldest sstables: for queries to recently appended data, we would be able to 
> eliminate wide rows using only the sstable index, rather than needing to seek 
> into the data file to determine that it isn't interesting. For narrow rows, 
> this change would have no effect, as they will not reach the threshold for 
> indexing anyway.
> A first cut design for this change would look very similar to the file format 
> design proposed on #674: 
> http://wiki.apache.org/cassandra/Fi

[jira] [Commented] (CASSANDRA-2319) Promote row index

2012-03-12 Thread Stu Hood (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13227922#comment-13227922
 ] 

Stu Hood commented on CASSANDRA-2319:
-

bq. And truth is, I think it's a nice property that we should probably maintain 
unless we have a good reason not too
Totally agreed. My mind blipped here, and for some reason I thought that you 
had removed the metadata in addition to the BF/column-index.

+1 #shipit

> Promote row index
> -
>
> Key: CASSANDRA-2319
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2319
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Stu Hood
>Assignee: Sylvain Lebresne
>  Labels: index, timeseries
> Fix For: 1.2
>
> Attachments: 2319-v1.tgz, 2319-v2.tgz, promotion.pdf, version-f.txt, 
> version-g-lzf.txt, version-g.txt
>
>
> The row index contains entries for configurably sized blocks of a wide row. 
> For a row of appreciable size, the row index ends up directing the third seek 
> (1. index, 2. row index, 3. content) to nearby the first column of a scan.
> Since the row index is always used for wide rows, and since it contains 
> information that tells us whether or not the 3rd seek is necessary (the 
> column range or name we are trying to slice may not exist in a given 
> sstable), promoting the row index into the sstable index would allow us to 
> drop the maximum number of seeks for wide rows back to 2, and, more 
> importantly, would allow sstables to be eliminated using only the index.
> An example usecase that benefits greatly from this change is time series data 
> in wide rows, where data is appended to the beginning or end of the row. Our 
> existing compaction strategy gets lucky and clusters the oldest data in the 
> oldest sstables: for queries to recently appended data, we would be able to 
> eliminate wide rows using only the sstable index, rather than needing to seek 
> into the data file to determine that it isn't interesting. For narrow rows, 
> this change would have no effect, as they will not reach the threshold for 
> indexing anyway.
> A first cut design for this change would look very similar to the file format 
> design proposed on #674: 
> http://wiki.apache.org/cassandra/FileFormatDesignDoc: row keys clustered, 
> column names clustered, and offsets clustered and delta encoded.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2319) Promote row index

2012-03-13 Thread Dave Brosius (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228874#comment-13228874
 ] 

Dave Brosius commented on CASSANDRA-2319:
-

Perhaps i am reading this wrong, but this patch appears to make 
SSTableIdentityIterator constructor now expect sstable to not be null, which is 
not the case from this constructor

public SSTableIdentityIterator(CFMetaData metadata, DataInput file, 
DecoratedKey key, long dataStart, long dataSize, IColumnSerializer.Flag flag)
throws IOException
{
this(metadata, file, key, dataStart, dataSize, false, null, flag);
}

which is called from here IncomingStreamReader.streamIn

I would expect an NPE in this case.

> Promote row index
> -
>
> Key: CASSANDRA-2319
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2319
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Stu Hood
>Assignee: Sylvain Lebresne
>  Labels: index, timeseries
> Fix For: 1.2
>
> Attachments: 2319-v1.tgz, 2319-v2.tgz, promotion.pdf, version-f.txt, 
> version-g-lzf.txt, version-g.txt
>
>
> The row index contains entries for configurably sized blocks of a wide row. 
> For a row of appreciable size, the row index ends up directing the third seek 
> (1. index, 2. row index, 3. content) to nearby the first column of a scan.
> Since the row index is always used for wide rows, and since it contains 
> information that tells us whether or not the 3rd seek is necessary (the 
> column range or name we are trying to slice may not exist in a given 
> sstable), promoting the row index into the sstable index would allow us to 
> drop the maximum number of seeks for wide rows back to 2, and, more 
> importantly, would allow sstables to be eliminated using only the index.
> An example usecase that benefits greatly from this change is time series data 
> in wide rows, where data is appended to the beginning or end of the row. Our 
> existing compaction strategy gets lucky and clusters the oldest data in the 
> oldest sstables: for queries to recently appended data, we would be able to 
> eliminate wide rows using only the sstable index, rather than needing to seek 
> into the data file to determine that it isn't interesting. For narrow rows, 
> this change would have no effect, as they will not reach the threshold for 
> indexing anyway.
> A first cut design for this change would look very similar to the file format 
> design proposed on #674: 
> http://wiki.apache.org/cassandra/FileFormatDesignDoc: row keys clustered, 
> column names clustered, and offsets clustered and delta encoded.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2319) Promote row index

2012-03-15 Thread Sylvain Lebresne (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13230193#comment-13230193
 ] 

Sylvain Lebresne commented on CASSANDRA-2319:
-

You're right, I'm pushed the trivial fix with commit 65c33fa.

> Promote row index
> -
>
> Key: CASSANDRA-2319
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2319
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Stu Hood
>Assignee: Sylvain Lebresne
>  Labels: index, timeseries
> Fix For: 1.2
>
> Attachments: 2319-v1.tgz, 2319-v2.tgz, promotion.pdf, version-f.txt, 
> version-g-lzf.txt, version-g.txt
>
>
> The row index contains entries for configurably sized blocks of a wide row. 
> For a row of appreciable size, the row index ends up directing the third seek 
> (1. index, 2. row index, 3. content) to nearby the first column of a scan.
> Since the row index is always used for wide rows, and since it contains 
> information that tells us whether or not the 3rd seek is necessary (the 
> column range or name we are trying to slice may not exist in a given 
> sstable), promoting the row index into the sstable index would allow us to 
> drop the maximum number of seeks for wide rows back to 2, and, more 
> importantly, would allow sstables to be eliminated using only the index.
> An example usecase that benefits greatly from this change is time series data 
> in wide rows, where data is appended to the beginning or end of the row. Our 
> existing compaction strategy gets lucky and clusters the oldest data in the 
> oldest sstables: for queries to recently appended data, we would be able to 
> eliminate wide rows using only the sstable index, rather than needing to seek 
> into the data file to determine that it isn't interesting. For narrow rows, 
> this change would have no effect, as they will not reach the threshold for 
> indexing anyway.
> A first cut design for this change would look very similar to the file format 
> design proposed on #674: 
> http://wiki.apache.org/cassandra/FileFormatDesignDoc: row keys clustered, 
> column names clustered, and offsets clustered and delta encoded.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2319) Promote row index

2012-03-22 Thread Jonathan Ellis (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13236333#comment-13236333
 ] 

Jonathan Ellis commented on CASSANDRA-2319:
---

I see getIndexedReadBufferSizeInKB is still around, is that dead code now?

> Promote row index
> -
>
> Key: CASSANDRA-2319
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2319
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Stu Hood
>Assignee: Sylvain Lebresne
>  Labels: index, timeseries
> Fix For: 1.2
>
> Attachments: 2319-v1.tgz, 2319-v2.tgz, promotion.pdf, version-f.txt, 
> version-g-lzf.txt, version-g.txt
>
>
> The row index contains entries for configurably sized blocks of a wide row. 
> For a row of appreciable size, the row index ends up directing the third seek 
> (1. index, 2. row index, 3. content) to nearby the first column of a scan.
> Since the row index is always used for wide rows, and since it contains 
> information that tells us whether or not the 3rd seek is necessary (the 
> column range or name we are trying to slice may not exist in a given 
> sstable), promoting the row index into the sstable index would allow us to 
> drop the maximum number of seeks for wide rows back to 2, and, more 
> importantly, would allow sstables to be eliminated using only the index.
> An example usecase that benefits greatly from this change is time series data 
> in wide rows, where data is appended to the beginning or end of the row. Our 
> existing compaction strategy gets lucky and clusters the oldest data in the 
> oldest sstables: for queries to recently appended data, we would be able to 
> eliminate wide rows using only the sstable index, rather than needing to seek 
> into the data file to determine that it isn't interesting. For narrow rows, 
> this change would have no effect, as they will not reach the threshold for 
> indexing anyway.
> A first cut design for this change would look very similar to the file format 
> design proposed on #674: 
> http://wiki.apache.org/cassandra/FileFormatDesignDoc: row keys clustered, 
> column names clustered, and offsets clustered and delta encoded.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2319) Promote row index

2012-03-23 Thread Sylvain Lebresne (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13236451#comment-13236451
 ] 

Sylvain Lebresne commented on CASSANDRA-2319:
-

Yes, that method is dead code but that's actually not new from this patch. I've 
created CASSANDRA-4076.

> Promote row index
> -
>
> Key: CASSANDRA-2319
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2319
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Stu Hood
>Assignee: Sylvain Lebresne
>  Labels: index, timeseries
> Fix For: 1.2
>
> Attachments: 2319-v1.tgz, 2319-v2.tgz, promotion.pdf, version-f.txt, 
> version-g-lzf.txt, version-g.txt
>
>
> The row index contains entries for configurably sized blocks of a wide row. 
> For a row of appreciable size, the row index ends up directing the third seek 
> (1. index, 2. row index, 3. content) to nearby the first column of a scan.
> Since the row index is always used for wide rows, and since it contains 
> information that tells us whether or not the 3rd seek is necessary (the 
> column range or name we are trying to slice may not exist in a given 
> sstable), promoting the row index into the sstable index would allow us to 
> drop the maximum number of seeks for wide rows back to 2, and, more 
> importantly, would allow sstables to be eliminated using only the index.
> An example usecase that benefits greatly from this change is time series data 
> in wide rows, where data is appended to the beginning or end of the row. Our 
> existing compaction strategy gets lucky and clusters the oldest data in the 
> oldest sstables: for queries to recently appended data, we would be able to 
> eliminate wide rows using only the sstable index, rather than needing to seek 
> into the data file to determine that it isn't interesting. For narrow rows, 
> this change would have no effect, as they will not reach the threshold for 
> indexing anyway.
> A first cut design for this change would look very similar to the file format 
> design proposed on #674: 
> http://wiki.apache.org/cassandra/FileFormatDesignDoc: row keys clustered, 
> column names clustered, and offsets clustered and delta encoded.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2319) Promote row index

2012-03-23 Thread Jonathan Ellis (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13236595#comment-13236595
 ] 

Jonathan Ellis commented on CASSANDRA-2319:
---

Ah, what I really meant was getColumnIndexSize.  It seems like the next step 
should be to treat rows and columns more uniformly -- we have the row 
index_interval measured in rows, but for columns we have 
column_index_size_in_kb.

> Promote row index
> -
>
> Key: CASSANDRA-2319
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2319
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Stu Hood
>Assignee: Sylvain Lebresne
>  Labels: index, timeseries
> Fix For: 1.2
>
> Attachments: 2319-v1.tgz, 2319-v2.tgz, promotion.pdf, version-f.txt, 
> version-g-lzf.txt, version-g.txt
>
>
> The row index contains entries for configurably sized blocks of a wide row. 
> For a row of appreciable size, the row index ends up directing the third seek 
> (1. index, 2. row index, 3. content) to nearby the first column of a scan.
> Since the row index is always used for wide rows, and since it contains 
> information that tells us whether or not the 3rd seek is necessary (the 
> column range or name we are trying to slice may not exist in a given 
> sstable), promoting the row index into the sstable index would allow us to 
> drop the maximum number of seeks for wide rows back to 2, and, more 
> importantly, would allow sstables to be eliminated using only the index.
> An example usecase that benefits greatly from this change is time series data 
> in wide rows, where data is appended to the beginning or end of the row. Our 
> existing compaction strategy gets lucky and clusters the oldest data in the 
> oldest sstables: for queries to recently appended data, we would be able to 
> eliminate wide rows using only the sstable index, rather than needing to seek 
> into the data file to determine that it isn't interesting. For narrow rows, 
> this change would have no effect, as they will not reach the threshold for 
> indexing anyway.
> A first cut design for this change would look very similar to the file format 
> design proposed on #674: 
> http://wiki.apache.org/cassandra/FileFormatDesignDoc: row keys clustered, 
> column names clustered, and offsets clustered and delta encoded.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2319) Promote row index

2012-03-24 Thread Stu Hood (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237483#comment-13237483
 ] 

Stu Hood commented on CASSANDRA-2319:
-

bq. We ended up making compression at the SequentialWriter level, i.e. 
transparent from the data and index format, so this end up having no 
interaction whatsoever with compression.
So, this isn't _quite_ true: due to the column/tuple index being unaware of 
blocks, and vice-versa, the current scheme has to deal with alignment 
mismatches: tuples which fall onto block boundaries involve reading two blocks 
worth of data in order to decompress one tuple. Not a very frequent occurrence 
when the tuple size is small enough... but anyway.

> Promote row index
> -
>
> Key: CASSANDRA-2319
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2319
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Stu Hood
>Assignee: Sylvain Lebresne
>  Labels: index, timeseries
> Fix For: 1.2
>
> Attachments: 2319-v1.tgz, 2319-v2.tgz, promotion.pdf, version-f.txt, 
> version-g-lzf.txt, version-g.txt
>
>
> The row index contains entries for configurably sized blocks of a wide row. 
> For a row of appreciable size, the row index ends up directing the third seek 
> (1. index, 2. row index, 3. content) to nearby the first column of a scan.
> Since the row index is always used for wide rows, and since it contains 
> information that tells us whether or not the 3rd seek is necessary (the 
> column range or name we are trying to slice may not exist in a given 
> sstable), promoting the row index into the sstable index would allow us to 
> drop the maximum number of seeks for wide rows back to 2, and, more 
> importantly, would allow sstables to be eliminated using only the index.
> An example usecase that benefits greatly from this change is time series data 
> in wide rows, where data is appended to the beginning or end of the row. Our 
> existing compaction strategy gets lucky and clusters the oldest data in the 
> oldest sstables: for queries to recently appended data, we would be able to 
> eliminate wide rows using only the sstable index, rather than needing to seek 
> into the data file to determine that it isn't interesting. For narrow rows, 
> this change would have no effect, as they will not reach the threshold for 
> indexing anyway.
> A first cut design for this change would look very similar to the file format 
> design proposed on #674: 
> http://wiki.apache.org/cassandra/FileFormatDesignDoc: row keys clustered, 
> column names clustered, and offsets clustered and delta encoded.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2319) Promote row index

2012-03-27 Thread Sylvain Lebresne (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239538#comment-13239538
 ] 

Sylvain Lebresne commented on CASSANDRA-2319:
-

bq. we have the row index_interval measured in rows, but for columns we have 
column_index_size_in_kb

Yes, I suppose we could switch the index_interval to the size_in_kb type, which 
would better limit the amount of data we may end up reading in the index now 
that row index entry may be of more widely different sizes. However, I don't 
see an easy way to merge those 2 settings into 1 if that was what you were 
hinting to.

bq. So, this isn't quite true

I really just meant that the committed code had no interaction with the 
compression code. You are right though that having compression block not 
aligned with data boundaries could mean that sometimes we'll have to decompress 
one more block that would be strictly necessary, but that's more a drawback of 
having compression blocks not being aligned to any data boundaries than 
anything related to this actual patch. Don't know how much that matters in 
practice though.

> Promote row index
> -
>
> Key: CASSANDRA-2319
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2319
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Stu Hood
>Assignee: Sylvain Lebresne
>  Labels: index, timeseries
> Fix For: 1.2
>
> Attachments: 2319-v1.tgz, 2319-v2.tgz, promotion.pdf, version-f.txt, 
> version-g-lzf.txt, version-g.txt
>
>
> The row index contains entries for configurably sized blocks of a wide row. 
> For a row of appreciable size, the row index ends up directing the third seek 
> (1. index, 2. row index, 3. content) to nearby the first column of a scan.
> Since the row index is always used for wide rows, and since it contains 
> information that tells us whether or not the 3rd seek is necessary (the 
> column range or name we are trying to slice may not exist in a given 
> sstable), promoting the row index into the sstable index would allow us to 
> drop the maximum number of seeks for wide rows back to 2, and, more 
> importantly, would allow sstables to be eliminated using only the index.
> An example usecase that benefits greatly from this change is time series data 
> in wide rows, where data is appended to the beginning or end of the row. Our 
> existing compaction strategy gets lucky and clusters the oldest data in the 
> oldest sstables: for queries to recently appended data, we would be able to 
> eliminate wide rows using only the sstable index, rather than needing to seek 
> into the data file to determine that it isn't interesting. For narrow rows, 
> this change would have no effect, as they will not reach the threshold for 
> indexing anyway.
> A first cut design for this change would look very similar to the file format 
> design proposed on #674: 
> http://wiki.apache.org/cassandra/FileFormatDesignDoc: row keys clustered, 
> column names clustered, and offsets clustered and delta encoded.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2319) Promote row index

2012-04-11 Thread Jonathan Ellis (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13252058#comment-13252058
 ] 

Jonathan Ellis commented on CASSANDRA-2319:
---

bq. I don't see an easy way to merge those 2 settings into 1 if that was what 
you were hinting to.

Yes, that's where I was going.

What if we dropped the "main" index and just kept the "sample" index of every 
1/128 columns?  Seems like we'd trade a little more seq i/o to do less random 
i/o, and being able to get rid of the index sampling phase on startup...

> Promote row index
> -
>
> Key: CASSANDRA-2319
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2319
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Stu Hood
>Assignee: Sylvain Lebresne
>  Labels: index, timeseries
> Fix For: 1.2
>
> Attachments: 2319-v1.tgz, 2319-v2.tgz, promotion.pdf, version-f.txt, 
> version-g-lzf.txt, version-g.txt
>
>
> The row index contains entries for configurably sized blocks of a wide row. 
> For a row of appreciable size, the row index ends up directing the third seek 
> (1. index, 2. row index, 3. content) to nearby the first column of a scan.
> Since the row index is always used for wide rows, and since it contains 
> information that tells us whether or not the 3rd seek is necessary (the 
> column range or name we are trying to slice may not exist in a given 
> sstable), promoting the row index into the sstable index would allow us to 
> drop the maximum number of seeks for wide rows back to 2, and, more 
> importantly, would allow sstables to be eliminated using only the index.
> An example usecase that benefits greatly from this change is time series data 
> in wide rows, where data is appended to the beginning or end of the row. Our 
> existing compaction strategy gets lucky and clusters the oldest data in the 
> oldest sstables: for queries to recently appended data, we would be able to 
> eliminate wide rows using only the sstable index, rather than needing to seek 
> into the data file to determine that it isn't interesting. For narrow rows, 
> this change would have no effect, as they will not reach the threshold for 
> indexing anyway.
> A first cut design for this change would look very similar to the file format 
> design proposed on #674: 
> http://wiki.apache.org/cassandra/FileFormatDesignDoc: row keys clustered, 
> column names clustered, and offsets clustered and delta encoded.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2319) Promote row index

2012-04-11 Thread Sylvain Lebresne (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13252243#comment-13252243
 ] 

Sylvain Lebresne commented on CASSANDRA-2319:
-

bq. What if we dropped the "main" index and just kept the "sample" index of 
every 1/128 columns? Seems like we'd trade a little more seq i/o to do less 
random i/o, and being able to get rid of the index sampling phase on startup...

I am not sure I follow. 

> Promote row index
> -
>
> Key: CASSANDRA-2319
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2319
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Stu Hood
>Assignee: Sylvain Lebresne
>  Labels: index, timeseries
> Fix For: 1.2
>
> Attachments: 2319-v1.tgz, 2319-v2.tgz, promotion.pdf, version-f.txt, 
> version-g-lzf.txt, version-g.txt
>
>
> The row index contains entries for configurably sized blocks of a wide row. 
> For a row of appreciable size, the row index ends up directing the third seek 
> (1. index, 2. row index, 3. content) to nearby the first column of a scan.
> Since the row index is always used for wide rows, and since it contains 
> information that tells us whether or not the 3rd seek is necessary (the 
> column range or name we are trying to slice may not exist in a given 
> sstable), promoting the row index into the sstable index would allow us to 
> drop the maximum number of seeks for wide rows back to 2, and, more 
> importantly, would allow sstables to be eliminated using only the index.
> An example usecase that benefits greatly from this change is time series data 
> in wide rows, where data is appended to the beginning or end of the row. Our 
> existing compaction strategy gets lucky and clusters the oldest data in the 
> oldest sstables: for queries to recently appended data, we would be able to 
> eliminate wide rows using only the sstable index, rather than needing to seek 
> into the data file to determine that it isn't interesting. For narrow rows, 
> this change would have no effect, as they will not reach the threshold for 
> indexing anyway.
> A first cut design for this change would look very similar to the file format 
> design proposed on #674: 
> http://wiki.apache.org/cassandra/FileFormatDesignDoc: row keys clustered, 
> column names clustered, and offsets clustered and delta encoded.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2319) Promote row index

2012-04-15 Thread Stu Hood (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13254376#comment-13254376
 ] 

Stu Hood commented on CASSANDRA-2319:
-

bq. What if we dropped the "main" index and just kept the "sample" index of 
every 1/128 columns?
This is what the original 674-v1 patch did, and it worked out fairly well. In 
the long run, it would be a win in terms of config and code complexity, but it 
will probably be about the same performance-wise.

>From 674-v1 (where ColumnKey is a full path to a column: (key, name1, name2, 
>name3, ...)
{code}
+/**
+ * An entry in the SSTable index file which points to a block in the data file.
+ * Each entry contains the full path of the column at the beginning of the 
block
+ * in the SSTable, and two file positions: the offset of the serialized version
+ * of this object in the index, and the offset of the block in the data file.
+ *
+ * To find a key in the data file, we first look at IndexEntries in memory, 
and find
+ * the last entry less than the key we want. We then seek to the position of 
that
+ * entry in the index file, read forward until we find the last entry less 
than the
+ * key, and then seek to the position of that entry's block in the data file.
+ */
+public class IndexEntry extends ColumnKey
{code}

> Promote row index
> -
>
> Key: CASSANDRA-2319
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2319
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Stu Hood
>Assignee: Sylvain Lebresne
>  Labels: index, timeseries
> Fix For: 1.2
>
> Attachments: 2319-v1.tgz, 2319-v2.tgz, promotion.pdf, version-f.txt, 
> version-g-lzf.txt, version-g.txt
>
>
> The row index contains entries for configurably sized blocks of a wide row. 
> For a row of appreciable size, the row index ends up directing the third seek 
> (1. index, 2. row index, 3. content) to nearby the first column of a scan.
> Since the row index is always used for wide rows, and since it contains 
> information that tells us whether or not the 3rd seek is necessary (the 
> column range or name we are trying to slice may not exist in a given 
> sstable), promoting the row index into the sstable index would allow us to 
> drop the maximum number of seeks for wide rows back to 2, and, more 
> importantly, would allow sstables to be eliminated using only the index.
> An example usecase that benefits greatly from this change is time series data 
> in wide rows, where data is appended to the beginning or end of the row. Our 
> existing compaction strategy gets lucky and clusters the oldest data in the 
> oldest sstables: for queries to recently appended data, we would be able to 
> eliminate wide rows using only the sstable index, rather than needing to seek 
> into the data file to determine that it isn't interesting. For narrow rows, 
> this change would have no effect, as they will not reach the threshold for 
> indexing anyway.
> A first cut design for this change would look very similar to the file format 
> design proposed on #674: 
> http://wiki.apache.org/cassandra/FileFormatDesignDoc: row keys clustered, 
> column names clustered, and offsets clustered and delta encoded.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2319) Promote row index

2012-04-16 Thread Sylvain Lebresne (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13254707#comment-13254707
 ] 

Sylvain Lebresne commented on CASSANDRA-2319:
-

I still don't see how that would make us remove one of the 2 config option 
we're talking about. Even if you basically switch to an on-disk format where 
there is no notion of Row, you still have to (1) say how big are the block you 
index in the index file, and (2) how much of those index entries you load in 
memory (unless you decide to load the full index in memory, but I don't think 
that's what we're talking about). I.e. the ??raison d'ĂȘtre?? of both 
index_interval and column_index_size_in_kb is *not* because we have the notion 
of rows in the on-disk format.

Don't get me wrong, I'm not saying that having a file format where the row key 
is not special anymore is a bad idea at all (though I'll admit that I'm less 
convinced that moving C* to such a file format should be a priority), but it 
seems to me that it's a completely different debate.


> Promote row index
> -
>
> Key: CASSANDRA-2319
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2319
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Stu Hood
>Assignee: Sylvain Lebresne
>  Labels: index, timeseries
> Fix For: 1.2
>
> Attachments: 2319-v1.tgz, 2319-v2.tgz, promotion.pdf, version-f.txt, 
> version-g-lzf.txt, version-g.txt
>
>
> The row index contains entries for configurably sized blocks of a wide row. 
> For a row of appreciable size, the row index ends up directing the third seek 
> (1. index, 2. row index, 3. content) to nearby the first column of a scan.
> Since the row index is always used for wide rows, and since it contains 
> information that tells us whether or not the 3rd seek is necessary (the 
> column range or name we are trying to slice may not exist in a given 
> sstable), promoting the row index into the sstable index would allow us to 
> drop the maximum number of seeks for wide rows back to 2, and, more 
> importantly, would allow sstables to be eliminated using only the index.
> An example usecase that benefits greatly from this change is time series data 
> in wide rows, where data is appended to the beginning or end of the row. Our 
> existing compaction strategy gets lucky and clusters the oldest data in the 
> oldest sstables: for queries to recently appended data, we would be able to 
> eliminate wide rows using only the sstable index, rather than needing to seek 
> into the data file to determine that it isn't interesting. For narrow rows, 
> this change would have no effect, as they will not reach the threshold for 
> indexing anyway.
> A first cut design for this change would look very similar to the file format 
> design proposed on #674: 
> http://wiki.apache.org/cassandra/FileFormatDesignDoc: row keys clustered, 
> column names clustered, and offsets clustered and delta encoded.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2319) Promote row index

2012-04-16 Thread Stu Hood (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13255246#comment-13255246
 ] 

Stu Hood commented on CASSANDRA-2319:
-

bq. I.e. the raison d'ĂȘtre of both index_interval and column_index_size_in_kb 
is not because we have the notion of rows in the on-disk format.
If I'm understanding what Ellis is suggesting, it is that the entire sstable 
index could become sparse: that would mean that column_index_size_in_kb could 
be renamed to index_size_in_kb. index_interval would not change.

> Promote row index
> -
>
> Key: CASSANDRA-2319
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2319
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Stu Hood
>Assignee: Sylvain Lebresne
>  Labels: index, timeseries
> Fix For: 1.2
>
> Attachments: 2319-v1.tgz, 2319-v2.tgz, promotion.pdf, version-f.txt, 
> version-g-lzf.txt, version-g.txt
>
>
> The row index contains entries for configurably sized blocks of a wide row. 
> For a row of appreciable size, the row index ends up directing the third seek 
> (1. index, 2. row index, 3. content) to nearby the first column of a scan.
> Since the row index is always used for wide rows, and since it contains 
> information that tells us whether or not the 3rd seek is necessary (the 
> column range or name we are trying to slice may not exist in a given 
> sstable), promoting the row index into the sstable index would allow us to 
> drop the maximum number of seeks for wide rows back to 2, and, more 
> importantly, would allow sstables to be eliminated using only the index.
> An example usecase that benefits greatly from this change is time series data 
> in wide rows, where data is appended to the beginning or end of the row. Our 
> existing compaction strategy gets lucky and clusters the oldest data in the 
> oldest sstables: for queries to recently appended data, we would be able to 
> eliminate wide rows using only the sstable index, rather than needing to seek 
> into the data file to determine that it isn't interesting. For narrow rows, 
> this change would have no effect, as they will not reach the threshold for 
> indexing anyway.
> A first cut design for this change would look very similar to the file format 
> design proposed on #674: 
> http://wiki.apache.org/cassandra/FileFormatDesignDoc: row keys clustered, 
> column names clustered, and offsets clustered and delta encoded.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2319) Promote row index

2012-04-16 Thread Sylvain Lebresne (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13255359#comment-13255359
 ] 

Sylvain Lebresne commented on CASSANDRA-2319:
-

Sure I understand. What I'm saying is that the initial discussion was about 
removing a setting. But we can't. One of the setting is about how sparse is the 
main sstable index (it's already sparse in fact, we're not indexing every 
column, but every min(1 row, column_index_size_in_kb)), the other is how sparse 
is the in-memory summary of the on-disk sparse index.

> Promote row index
> -
>
> Key: CASSANDRA-2319
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2319
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Stu Hood
>Assignee: Sylvain Lebresne
>  Labels: index, timeseries
> Fix For: 1.2
>
> Attachments: 2319-v1.tgz, 2319-v2.tgz, promotion.pdf, version-f.txt, 
> version-g-lzf.txt, version-g.txt
>
>
> The row index contains entries for configurably sized blocks of a wide row. 
> For a row of appreciable size, the row index ends up directing the third seek 
> (1. index, 2. row index, 3. content) to nearby the first column of a scan.
> Since the row index is always used for wide rows, and since it contains 
> information that tells us whether or not the 3rd seek is necessary (the 
> column range or name we are trying to slice may not exist in a given 
> sstable), promoting the row index into the sstable index would allow us to 
> drop the maximum number of seeks for wide rows back to 2, and, more 
> importantly, would allow sstables to be eliminated using only the index.
> An example usecase that benefits greatly from this change is time series data 
> in wide rows, where data is appended to the beginning or end of the row. Our 
> existing compaction strategy gets lucky and clusters the oldest data in the 
> oldest sstables: for queries to recently appended data, we would be able to 
> eliminate wide rows using only the sstable index, rather than needing to seek 
> into the data file to determine that it isn't interesting. For narrow rows, 
> this change would have no effect, as they will not reach the threshold for 
> indexing anyway.
> A first cut design for this change would look very similar to the file format 
> design proposed on #674: 
> http://wiki.apache.org/cassandra/FileFormatDesignDoc: row keys clustered, 
> column names clustered, and offsets clustered and delta encoded.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2319) Promote row index

2012-04-17 Thread Jonathan Ellis (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13255777#comment-13255777
 ] 

Jonathan Ellis commented on CASSANDRA-2319:
---

bq. One of the setting is about how sparse is the main sstable index ... the 
other is how sparse is the in-memory summary of the on-disk sparse index.

The problem is that one only applies to row keys, and the other only to column 
names, which makes tuning complicated.

Or did that change, leaving the setting names obsolete?

> Promote row index
> -
>
> Key: CASSANDRA-2319
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2319
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Stu Hood
>Assignee: Sylvain Lebresne
>  Labels: index, timeseries
> Fix For: 1.2
>
> Attachments: 2319-v1.tgz, 2319-v2.tgz, promotion.pdf, version-f.txt, 
> version-g-lzf.txt, version-g.txt
>
>
> The row index contains entries for configurably sized blocks of a wide row. 
> For a row of appreciable size, the row index ends up directing the third seek 
> (1. index, 2. row index, 3. content) to nearby the first column of a scan.
> Since the row index is always used for wide rows, and since it contains 
> information that tells us whether or not the 3rd seek is necessary (the 
> column range or name we are trying to slice may not exist in a given 
> sstable), promoting the row index into the sstable index would allow us to 
> drop the maximum number of seeks for wide rows back to 2, and, more 
> importantly, would allow sstables to be eliminated using only the index.
> An example usecase that benefits greatly from this change is time series data 
> in wide rows, where data is appended to the beginning or end of the row. Our 
> existing compaction strategy gets lucky and clusters the oldest data in the 
> oldest sstables: for queries to recently appended data, we would be able to 
> eliminate wide rows using only the sstable index, rather than needing to seek 
> into the data file to determine that it isn't interesting. For narrow rows, 
> this change would have no effect, as they will not reach the threshold for 
> indexing anyway.
> A first cut design for this change would look very similar to the file format 
> design proposed on #674: 
> http://wiki.apache.org/cassandra/FileFormatDesignDoc: row keys clustered, 
> column names clustered, and offsets clustered and delta encoded.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2319) Promote row index

2012-04-17 Thread Sylvain Lebresne (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13255825#comment-13255825
 ] 

Sylvain Lebresne commented on CASSANDRA-2319:
-

bq. which makes tuning complicated

No, you're right, that part didn't change. However a very easy fix would be to 
turn as I said above index_interval into index_interval_in_kb. Without deeper 
changes, the index interval would have to include at least one key (i.e. the 
index of one key), but aside from that it would give pretty much what we want 
in term of making tuning work as well for narrow and wide row. And that's a 
very simple change. Which doesn't preclude doing deeper changes later if we so 
wish, but in the meantime would give use 91.32% (my best guess) of the 
benefits. We could even rename the settings to say disk_index_interval_in_kb 
and memory_index_interval_in_kb (and internally both would really mean that the 
interval is min({disk,memory}_index_interval_in_kb, 1 row)).

> Promote row index
> -
>
> Key: CASSANDRA-2319
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2319
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Stu Hood
>Assignee: Sylvain Lebresne
>  Labels: index, timeseries
> Fix For: 1.2
>
> Attachments: 2319-v1.tgz, 2319-v2.tgz, promotion.pdf, version-f.txt, 
> version-g-lzf.txt, version-g.txt
>
>
> The row index contains entries for configurably sized blocks of a wide row. 
> For a row of appreciable size, the row index ends up directing the third seek 
> (1. index, 2. row index, 3. content) to nearby the first column of a scan.
> Since the row index is always used for wide rows, and since it contains 
> information that tells us whether or not the 3rd seek is necessary (the 
> column range or name we are trying to slice may not exist in a given 
> sstable), promoting the row index into the sstable index would allow us to 
> drop the maximum number of seeks for wide rows back to 2, and, more 
> importantly, would allow sstables to be eliminated using only the index.
> An example usecase that benefits greatly from this change is time series data 
> in wide rows, where data is appended to the beginning or end of the row. Our 
> existing compaction strategy gets lucky and clusters the oldest data in the 
> oldest sstables: for queries to recently appended data, we would be able to 
> eliminate wide rows using only the sstable index, rather than needing to seek 
> into the data file to determine that it isn't interesting. For narrow rows, 
> this change would have no effect, as they will not reach the threshold for 
> indexing anyway.
> A first cut design for this change would look very similar to the file format 
> design proposed on #674: 
> http://wiki.apache.org/cassandra/FileFormatDesignDoc: row keys clustered, 
> column names clustered, and offsets clustered and delta encoded.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2319) Promote row index

2012-05-02 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13266908#comment-13266908
 ] 

Jonathan Ellis commented on CASSANDRA-2319:
---

does {{column_index_size_in_kb}} actually affect row keys as well as column 
names, or are those still special-cased to always be present in the (on-disk) 
index?

> Promote row index
> -
>
> Key: CASSANDRA-2319
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2319
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Stu Hood
>Assignee: Sylvain Lebresne
>  Labels: index, timeseries
> Fix For: 1.2
>
> Attachments: 2319-v1.tgz, 2319-v2.tgz, promotion.pdf, version-f.txt, 
> version-g-lzf.txt, version-g.txt
>
>
> The row index contains entries for configurably sized blocks of a wide row. 
> For a row of appreciable size, the row index ends up directing the third seek 
> (1. index, 2. row index, 3. content) to nearby the first column of a scan.
> Since the row index is always used for wide rows, and since it contains 
> information that tells us whether or not the 3rd seek is necessary (the 
> column range or name we are trying to slice may not exist in a given 
> sstable), promoting the row index into the sstable index would allow us to 
> drop the maximum number of seeks for wide rows back to 2, and, more 
> importantly, would allow sstables to be eliminated using only the index.
> An example usecase that benefits greatly from this change is time series data 
> in wide rows, where data is appended to the beginning or end of the row. Our 
> existing compaction strategy gets lucky and clusters the oldest data in the 
> oldest sstables: for queries to recently appended data, we would be able to 
> eliminate wide rows using only the sstable index, rather than needing to seek 
> into the data file to determine that it isn't interesting. For narrow rows, 
> this change would have no effect, as they will not reach the threshold for 
> indexing anyway.
> A first cut design for this change would look very similar to the file format 
> design proposed on #674: 
> http://wiki.apache.org/cassandra/FileFormatDesignDoc: row keys clustered, 
> column names clustered, and offsets clustered and delta encoded.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2319) Promote row index

2012-05-03 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267321#comment-13267321
 ] 

Sylvain Lebresne commented on CASSANDRA-2319:
-

bq. does column_index_size_in_kb actually affect row keys

No, it doesn't. In fact this patch didn't change anything on what composes the 
indexes and in particular the setting have the same meaning. The only 
difference is that the column indexes are in the on-disk index file instead of 
the on-disk data file.

> Promote row index
> -
>
> Key: CASSANDRA-2319
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2319
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Stu Hood
>Assignee: Sylvain Lebresne
>  Labels: index, timeseries
> Fix For: 1.2
>
> Attachments: 2319-v1.tgz, 2319-v2.tgz, promotion.pdf, version-f.txt, 
> version-g-lzf.txt, version-g.txt
>
>
> The row index contains entries for configurably sized blocks of a wide row. 
> For a row of appreciable size, the row index ends up directing the third seek 
> (1. index, 2. row index, 3. content) to nearby the first column of a scan.
> Since the row index is always used for wide rows, and since it contains 
> information that tells us whether or not the 3rd seek is necessary (the 
> column range or name we are trying to slice may not exist in a given 
> sstable), promoting the row index into the sstable index would allow us to 
> drop the maximum number of seeks for wide rows back to 2, and, more 
> importantly, would allow sstables to be eliminated using only the index.
> An example usecase that benefits greatly from this change is time series data 
> in wide rows, where data is appended to the beginning or end of the row. Our 
> existing compaction strategy gets lucky and clusters the oldest data in the 
> oldest sstables: for queries to recently appended data, we would be able to 
> eliminate wide rows using only the sstable index, rather than needing to seek 
> into the data file to determine that it isn't interesting. For narrow rows, 
> this change would have no effect, as they will not reach the threshold for 
> indexing anyway.
> A first cut design for this change would look very similar to the file format 
> design proposed on #674: 
> http://wiki.apache.org/cassandra/FileFormatDesignDoc: row keys clustered, 
> column names clustered, and offsets clustered and delta encoded.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira