[ https://issues.apache.org/jira/browse/CASSANALYTICS-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17979310#comment-17979310 ]
Doug Rohrer commented on CASSANALYTICS-36:
------------------------------------------

+1 after review/updates. Thanks for the patch!

> Bulk Reader should dynamically size the Spark job based on estimated table
> size
> -------------------------------------------------------------------------------
>
>                 Key: CASSANALYTICS-36
>                 URL: https://issues.apache.org/jira/browse/CASSANALYTICS-36
>             Project: Apache Cassandra Analytics
>          Issue Type: New Feature
>          Components: Reader
>            Reporter: Doug Rohrer
>            Assignee: Francisco Guerrero
>            Priority: Normal
>             Fix For: 1.0
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> When reading a smaller dataset, leveraging a large number of Spark cores is
> actually less efficient than using a smaller number. By using the estimated
> table size provided by Cassandra (similar to the data provided by `nodetool
> tablestats`), we can do a better job of limiting resource utilization and
> decreasing job runtime.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
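The sizing heuristic the ticket describes could be sketched roughly as below. This is an illustrative sketch only, not the CASSANALYTICS-36 patch: the class name `JobSizer`, the method `chooseCoreCount`, and the 128 MiB-per-core target are all assumptions made for the example.

```java
// Hypothetical sketch: derive a Spark core count from an estimated table size
// (the kind of estimate `nodetool tablestats` reports), so small tables do not
// fan out across the whole cluster. Names and the per-core target are assumed.
public final class JobSizer {

    // Assumed target: roughly 128 MiB of table data per Spark core.
    static final long TARGET_BYTES_PER_CORE = 128L * 1024 * 1024;

    // Ceiling-divide the estimate by the per-core target, then clamp the
    // result to the range [1, maxCores].
    static int chooseCoreCount(long estimatedTableBytes, int maxCores) {
        long needed =
            (estimatedTableBytes + TARGET_BYTES_PER_CORE - 1) / TARGET_BYTES_PER_CORE;
        return (int) Math.min(Math.max(needed, 1L), maxCores);
    }

    public static void main(String[] args) {
        // A 1 GiB table needs 8 cores at 128 MiB/core, well under a 256-core cap.
        System.out.println(chooseCoreCount(1L << 30, 256)); // 8
        // A 10 MiB table stays on a single core instead of the whole cluster.
        System.out.println(chooseCoreCount(10L * 1024 * 1024, 256)); // 1
    }
}
```

Clamping to a maximum keeps large tables from requesting more cores than the cluster offers, while the lower bound of 1 keeps empty or tiny tables runnable.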