[https://issues.apache.org/jira/browse/CASSANDRA-9259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15228054#comment-15228054]

Stefania commented on CASSANDRA-9259:
-------------------------------------

Here are the results of a proof of concept of an *optimized read path* for local reads at CL 1, and of *streaming*.

These are the results in *ops/second*:

||Partition Size||Page size||Num. Partitions in the table||Synchronous, no optimization ops/s||Synchronous, with optimization ops/s||Prefetch a page, no optimization ops/s||Prefetch a page, with optimization ops/s||Streaming, measurement 1 ops/s||Streaming, measurement 2 ops/s||Improvement due to local read optimization||Improvement due to streaming||Total improvement||Link to the test||
|100 KBYTES|10|250K|98|148|189|283|784|767|49.74%|174.03%|310.32%|[link|http://cstar.datastax.com/tests/id/6dc252e8-fb0c-11e5-9ad6-0256e416528f]|
|10 KBYTES|100|1M|93|138|174|259|659|673|48.85%|157.14%|282.76%|[link|http://cstar.datastax.com/tests/id/d8002fe2-fad3-11e5-a500-0256e416528f]|
|1 KBYTE|1000|1M|84|133|114|179|233|239|57.02%|31.84%|107.02%|[link|http://cstar.datastax.com/tests/id/8dd1c5ba-fad9-11e5-82e5-0256e416528f]|
|1 KBYTE|1000|2M|60|98|94|153|247|248|62.77%|61.76%|163.30%|[link|http://cstar.datastax.com/tests/id/97be0520-fba6-11e5-bba8-0256e416528f]|
|100 BYTES|10000|5M|21|33|24|37|41|44|54.17%|14.86%|77.08%|[link|http://cstar.datastax.com/tests/id/95e31c0c-fb12-11e5-9ad6-0256e416528f]|
|50 BYTES|20000|5M|20|32|20|33|35|37|65.00%|9.09%|80.00%|[link|http://cstar.datastax.com/tests/id/bf1a52a0-fb8c-11e5-838f-0256e416528f]|
|10 KBYTES|5000|500K|31|46|31|46|45|46|48.39%|-1.09%|46.77%|[link|http://cstar.datastax.com/tests/id/df9e6e96-fbbc-11e5-bf65-0256e416528f]|
|1 KBYTE|5000|2M|35|56|40|64|66|66|60.00%|3.13%|65.00%|[link|http://cstar.datastax.com/tests/id/1d1785fc-fbb6-11e5-bf65-0256e416528f]|
|100 BYTES|5000|5M|22|40|31|53|66|66|70.97%|24.53%|112.90%|[link|http://cstar.datastax.com/tests/id/70262d52-fbac-11e5-a876-0256e416528f]|


These are the same results but expressed in *rows/second*:

||Partition Size||Page size||Num. Partitions in the table||Synchronous, no optimization rows/s||Synchronous, with optimization rows/s||Prefetch a page, no optimization rows/s||Prefetch a page, with optimization rows/s||Streaming, measurement 1 rows/s||Streaming, measurement 2 rows/s||Improvement due to local read optimization||Improvement due to streaming||Total improvement||Link to the test||
|100 KBYTES|10|250K|963|1453|1849|2761|7702|7522|49.32%|175.70%|311.68%|[link|http://cstar.datastax.com/tests/id/6dc252e8-fb0c-11e5-9ad6-0256e416528f]|
|10 KBYTES|100|1M|8830|13159|16572|24591|62649|63975|48.39%|157.46%|282.04%|[link|http://cstar.datastax.com/tests/id/d8002fe2-fad3-11e5-a500-0256e416528f]|
|1 KBYTE|1000|1M|52543|83548|71277|112558|145637|150070|57.92%|31.36%|107.44%|[link|http://cstar.datastax.com/tests/id/8dd1c5ba-fad9-11e5-82e5-0256e416528f]|
|1 KBYTE|1000|2M|47029|76292|74131|119727|193738|193495|61.51%|61.71%|161.18%|[link|http://cstar.datastax.com/tests/id/97be0520-fba6-11e5-bba8-0256e416528f]|
|100 BYTES|10000|5M|88130|142800|100590|158780|176699|185292|57.85%|13.99%|79.93%|[link|http://cstar.datastax.com/tests/id/95e31c0c-fb12-11e5-9ad6-0256e416528f]|
|50 BYTES|20000|5M|94581|153296|97599|157463|169226|175581|61.34%|9.49%|76.64%|[link|http://cstar.datastax.com/tests/id/bf1a52a0-fb8c-11e5-838f-0256e416528f]|
|10 KBYTES|5000|500K|14938|22356|15152|22623|22174|22419|49.31%|-1.44%|47.15%|[link|http://cstar.datastax.com/tests/id/df9e6e96-fbbc-11e5-bf65-0256e416528f]|
|1 KBYTE|5000|2M|63552|100974|71644|115001|119537|119079|60.52%|3.75%|66.53%|[link|http://cstar.datastax.com/tests/id/1d1785fc-fbb6-11e5-bf65-0256e416528f]|
|100 BYTES|5000|5M|70154|126547|98331|168121|208226|207482|70.97%|23.63%|111.38%|[link|http://cstar.datastax.com/tests/id/70262d52-fbac-11e5-a876-0256e416528f]|


The columns above refer to the following cassandra-stress operations:

*Synchronous page retrieval*
The client retrieves each page synchronously, with and without the optimized 
local read path.
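
For illustration only, here is a minimal sketch of this pattern as a client could express it with the DataStax Java driver (this is not the cassandra-stress code used for the tests; the contact point, {{ks.tbl}} and the fetch size of 1000 are placeholders):

{code:java}
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.Statement;

public class SyncPagingSketch {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect()) {
            // Placeholder query and page size; the benchmark used cassandra-stress profiles.
            Statement stmt = new SimpleStatement("SELECT * FROM ks.tbl");
            stmt.setFetchSize(1000);               // requested page size
            ResultSet rs = session.execute(stmt);  // blocks until the first page arrives
            for (Row row : rs) {
                // When the current page is exhausted, the iterator blocks while the
                // driver fetches the next page synchronously.
            }
        }
    }
}
{code}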

*Asynchronous page retrieval (prefetch)*
The client retrieves the first page synchronously and then requests the next page asynchronously before processing the results of the current page, with and without the optimized local read path.
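
Again purely as an illustrative sketch (not the benchmark code), the prefetch pattern using the Java driver's paging API; the remaining-rows threshold of 100 is arbitrary:

{code:java}
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.Statement;

public class PrefetchPagingSketch {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect()) {
            Statement stmt = new SimpleStatement("SELECT * FROM ks.tbl").setFetchSize(1000);
            ResultSet rs = session.execute(stmt);  // first page fetched synchronously
            for (Row row : rs) {
                // Ask the driver to fetch the next page in the background before the
                // current page is drained, so the fetch overlaps with row processing.
                if (!rs.isFullyFetched() && rs.getAvailableWithoutFetching() == 100) {
                    rs.fetchMoreResults();  // asynchronous; returns a future
                }
                // process row ...
            }
        }
    }
}
{code}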

*Streaming*
The client requests all pages initially and then waits synchronously for the 
first page. For the following pages, each operation processes a page that was 
previously delivered, blocking only if a page is unavailable. 
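
There is no public driver API for this mode, so the following consumer-side sketch is entirely hypothetical: {{onPageReceived}} stands in for whatever callback the streaming protocol would eventually expose, and an empty page is used as an end-of-stream marker.

{code:java}
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

import com.datastax.driver.core.Row;

public class StreamingConsumerSketch {
    // Pages delivered by the server are queued as they arrive; an empty list
    // is used as an end-of-stream marker.
    private final BlockingQueue<List<Row>> pages = new LinkedBlockingQueue<>();

    // Hypothetical callback, invoked once per page pushed by the server.
    public void onPageReceived(List<Row> page) throws InterruptedException {
        pages.put(page);
    }

    public void consume() throws InterruptedException {
        while (true) {
            List<Row> page = pages.take();  // blocks only if no page has been delivered yet
            if (page.isEmpty()) {
                break;                      // end of stream
            }
            for (Row row : page) {
                // process row ...
            }
        }
    }
}
{code}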

There are two equivalent streaming measurements because streaming always uses the optimized local read path; implementing streaming without the optimized read path would have required considerable extra work, and it would only have provided comparison data that is already available.

*Results*
The improvement due to local read optimization is calculated by comparing the asynchronous (prefetch) results with and without optimization. The increase is roughly between 50% and 70%.

The improvement due to streaming is calculated by comparing the average of the two streaming measurements with the optimized prefetch results. It varies from a negligible change to an additional increase of about 175%, depending on page and row sizes. The reason for this variability is that each cassandra-stress operation iterates over the rows in the page: the smaller the data to iterate over, the quicker the operation. It is therefore difficult to quantify the overall benefit of streaming, as it depends on the speed at which the client can process results. However, because clients can always process results asynchronously, streaming should in most cases lead to a material improvement in overall processing times.

The total improvement is calculated by comparing the average of the streaming measurements with the unoptimized prefetch result (the best rate currently achievable, without any of the optimizations introduced by this patch); it follows from the two factors just discussed.
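
As a worked example, using the first row of the ops/s table (100 KBYTES partitions): improvement due to local read optimization = {{(283 - 189) / 189 = 49.74%}}; improvement due to streaming = {{((784 + 767) / 2 - 283) / 283 = 174.03%}}; total improvement = {{((784 + 767) / 2 - 189) / 189 = 310.32%}}.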


*Test environment*
The tests were performed on [cstar.datastax.com|http://cstar.datastax.com]
using the "Taylor" cluster and the links are available in the tables above. The 
operation names displayed in the graphs could not be changed, so please use the 
following conversions when viewing results: {{1_user=insert, 2_user=streaming, 
3_user=prefetch, 4_user=synchronous retrieval}}

*Steps forward*
In view of these results, I believe an optimized local read path is worth pursuing, whilst streaming should also provide benefits to asynchronous clients.
Comments are welcome.



> Bulk Reading from Cassandra
> ---------------------------
>
>                 Key: CASSANDRA-9259
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9259
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Compaction, CQL, Local Write-Read Paths, Streaming and 
> Messaging, Testing
>            Reporter:  Brian Hess
>            Assignee: Stefania
>            Priority: Critical
>             Fix For: 3.x
>
>         Attachments: bulk-read-benchmark.1.html, 
> bulk-read-jfr-profiles.1.tar.gz, bulk-read-jfr-profiles.2.tar.gz
>
>
> This ticket is following on from the 2015 NGCC.  This ticket is designed to 
> be a place for discussing and designing an approach to bulk reading.
> The goal is to have a bulk reading path for Cassandra.  That is, a path 
> optimized to grab a large portion of the data for a table (potentially all of 
> it).  This is a core element in the Spark integration with Cassandra, and the 
> speed at which Cassandra can deliver bulk data to Spark is limiting the 
> performance of Spark-plus-Cassandra operations.  This is especially of 
> importance as Cassandra will (likely) leverage Spark for internal operations 
> (for example CASSANDRA-8234).
> The core CQL to consider is the following:
> SELECT a, b, c FROM myKs.myTable WHERE Token(partitionKey) > X AND 
> Token(partitionKey) <= Y
> Here, we choose X and Y to be contained within one token range (perhaps 
> considering the primary range of a node without vnodes, for example).  This 
> query pushes 50K-100K rows/sec, which is not very fast if we are doing bulk 
> operations via Spark (or other processing frameworks - ETL, etc).  There are 
> a few causes (e.g., inefficient paging).
> There are a few approaches that could be considered.  First, we consider a 
> new "Streaming Compaction" approach.  The key observation here is that a bulk 
> read from Cassandra is a lot like a major compaction, though instead of 
> outputting a new SSTable we would output CQL rows to a stream/socket/etc.  
> This would be similar to a CompactionTask, but would strip out some 
> unnecessary things in there (e.g., some of the indexing, etc). Predicates and 
> projections could also be encapsulated in this new "StreamingCompactionTask", 
> for example.
> Another approach would be an alternate storage format.  For example, we might 
> employ Parquet (just as an example) to store the same data as in the primary 
> Cassandra storage (aka SSTables).  This is akin to Global Indexes (an 
> alternate storage of the same data optimized for a particular query).  Then, 
> Cassandra can choose to leverage this alternate storage for particular CQL 
> queries (e.g., range scans).
> These are just 2 suggestions to get the conversation going.
> One thing to note is that it will be useful to have this storage segregated 
> by token range so that when you extract via these mechanisms you do not get 
> replications-factor numbers of copies of the data.  That will certainly be an 
> issue for some Spark operations (e.g., counting).  Thus, we will want 
> per-token-range storage (even for single disks), so this will likely leverage 
> CASSANDRA-6696 (though, we'll want to also consider the single disk case).
> It is also worth discussing what the success criteria is here.  It is 
> unlikely to be as fast as EDW or HDFS performance (though, that is still a 
> good goal), but being within some percentage of that performance should be 
> set as success.  For example, 2x as long as doing bulk operations on HDFS 
> with similar node count/size/etc.


