[ https://issues.apache.org/jira/browse/CASSANDRA-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Brandon Williams updated CASSANDRA-3943:
----------------------------------------

    Fix Version/s:     (was: 1.1.0)
       Issue Type: Task  (was: Improvement)

> Too many small size sstables after loading data using sstableloader or
> BulkOutputFormat increases compaction time.
> -----------------------------------------------------------------------
>
>                 Key: CASSANDRA-3943
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3943
>             Project: Cassandra
>          Issue Type: Task
>          Components: Hadoop, Tools
>    Affects Versions: 0.8.2, 1.1.0
>            Reporter: Samarth Gahire
>            Assignee: Brandon Williams
>            Priority: Minor
>              Labels: bulkloader, hadoop, sstableloader, streaming, tools
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> When we create sstables using SimpleUnsortedWriter or BulkOutputFormat, the
> size of each sstable created is around the buffer size provided.
> But after loading, the sstables created on the cluster nodes are of size around
> {code}( (sstable_size_before_loading) * replication_factor ) /
> No_Of_Nodes_In_Cluster{code}
> As the number of nodes in the cluster increases, the size of each sstable
> loaded onto a Cassandra node decreases. Such small sstables take much longer
> to compact (minor compaction) compared to relatively large sstables.
> One solution that we have tried is to increase the buffer size while
> generating sstables. But as we increase the buffer size, the time taken to
> generate sstables increases. Is there any solution to this in existing
> versions, or are you fixing this in a future version?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
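
For illustration only, the size estimate quoted in the issue description can be worked through with a few numbers. This is a rough sketch of the reporter's formula, not Cassandra code: the function name and the sample values (256 MB sstables, replication factor 3) are assumptions chosen for the example.

```python
def estimated_loaded_sstable_size(size_before_loading, replication_factor, node_count):
    """Approximate size of each sstable on a node after loading.

    Implements the estimate from the issue description:
    (sstable_size_before_loading * replication_factor) / No_Of_Nodes_In_Cluster
    The result is in the same unit as size_before_loading.
    """
    if node_count <= 0:
        raise ValueError("node_count must be positive")
    return size_before_loading * replication_factor / node_count


# Hypothetical inputs: 256 MB sstables written with a replication factor of 3.
for nodes in (3, 6, 12, 48):
    size_mb = estimated_loaded_sstable_size(256, 3, nodes)
    print(f"{nodes:>3} nodes: about {size_mb:.0f} MB per loaded sstable")
```

Under these assumed inputs the estimate halves each time the node count doubles, which is the effect the reporter describes: the larger the cluster, the smaller the sstables each node receives and has to compact.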