[ https://issues.apache.org/jira/browse/CASSANDRA-11542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15263561#comment-15263561 ]
Russell Alexander Spitzer commented on CASSANDRA-11542:
-------------------------------------------------------

One other thing: when we were testing before, we noticed that our reads were not IO bound. With streaming, are we now IO bound on the C* side? I.e., are the drives fully utilized?

> Create a benchmark to compare HDFS and Cassandra bulk read times
> ----------------------------------------------------------------
>
>                 Key: CASSANDRA-11542
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11542
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Testing
>            Reporter: Stefania
>            Assignee: Stefania
>             Fix For: 3.x
>
>         Attachments: spark-load-perf-results-001.zip, spark-load-perf-results-002.zip
>
>
> I propose creating a benchmark for comparing Cassandra and HDFS bulk read performance. Simple Spark queries will be run on data stored in HDFS or Cassandra, and the entire duration of each query will be measured. An example query would be the max or min of a column, or a count(*).
> This benchmark should allow determining the impact of:
> * partition size
> * number of clustering columns
> * number of value columns (cells)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
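One way to check the question raised in the comment (whether the C* nodes are IO bound during streaming reads) is to sample per-device utilization on each node while the benchmark runs. This is the same quantity `iostat -x` reports as `%util`. The sketch below is an illustration, not something from the ticket. It assumes Linux nodes that expose `/proc/diskstats`, and the function names are hypothetical. It reads the cumulative "time spent doing I/Os" counter (field 10 of each row) twice and divides the delta by the elapsed wall-clock time.

```python
import time


def parse_io_ticks(diskstats_text):
    """Map device name -> cumulative ms the device spent doing I/O.

    Each /proc/diskstats row is: major minor name, then the stat fields.
    "Time spent doing I/Os (ms)" is stat field 10, i.e. split index 12.
    """
    ticks = {}
    for line in diskstats_text.splitlines():
        parts = line.split()
        if len(parts) < 14:
            continue  # skip blank or malformed rows
        ticks[parts[2]] = int(parts[12])
    return ticks


def utilization(before, after, elapsed_s):
    """Percent of the interval during which each device had I/O in flight."""
    elapsed_ms = elapsed_s * 1000.0
    return {
        dev: min(100.0, 100.0 * (after[dev] - before[dev]) / elapsed_ms)
        for dev in after
        if dev in before
    }


def sample(path="/proc/diskstats", interval_s=1.0):
    """Take two snapshots interval_s apart and return per-device %util."""
    with open(path) as f:
        before = parse_io_ticks(f.read())
    t0 = time.monotonic()
    time.sleep(interval_s)
    with open(path) as f:
        after = parse_io_ticks(f.read())
    elapsed = time.monotonic() - t0  # actual elapsed time, not the nominal interval
    return utilization(before, after, elapsed)
```

Calling `sample()` repeatedly on a Cassandra node during a benchmark run gives a rough answer. A value near 100% means the device always had at least one request in flight. On SSDs or RAID arrays that serve requests in parallel, this does not necessarily mean the device is saturated, so it is best read alongside throughput and queue depth.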