[ https://issues.apache.org/jira/browse/CASSANDRA-9323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pierre N. updated CASSANDRA-9323:
---------------------------------
    Description: 
When I bulk upload an sstable created with CQLSSTableWriter, it is very slow. I tested on a fresh Cassandra node (nothing in the keyspace or tables) with good hardware (8 x 2.8 GHz, 32 GB RAM) but a classic hard disk (I don't think an SSD would improve performance in this case).

When I upload an sstable from a different server I get an average of 3 MB/sec; in the attached example I managed to get 5 MB/sec, which is still slow.

During the streaming process I noticed that one core of the server is at full CPU, so I think the operation is CPU bound on the server side. I quickly attached a sampling profiler to the Cassandra instance and got the following output:

https://i.imgur.com/IfLc2Ip.png

So I think (though I may be wrong, because the sampling is inaccurate) that during streaming the table is unserialized and reserialized to another sstable, and that this unserialize/serialize process is what takes a large amount of CPU and slows down the insert speed.

Can someone confirm that bulk load is slow? I also tested on my own computer and barely reached 1 MB/sec.

I don't understand the point of fully unserializing the table I just built with CQLSSTableWriter (building and sorting the table is already a long process). Couldn't it just copy the table from offset X to offset Y (using index information, for example) without unserializing and reserializing it?

  was:
Hi,

When I bulk upload an sstable created with CQLSSTableWriter, it is very slow. I tested on a fresh Cassandra node (nothing in the keyspace or tables) with good hardware (8 x 2.8 GHz, 32 GB RAM) but a classic hard disk (I don't think an SSD would improve performance in this case).

When I upload an sstable from a different server I get an average of 3 MB/sec; in the attached example I managed to get 5 MB/sec, which is still slow.

During the streaming process I noticed that one core of the server is at full CPU, so I think the operation is CPU bound on the server side.
I quickly attached a sampling profiler to the Cassandra instance and got the following output:

https://i.imgur.com/IfLc2Ip.png

So I think (though I may be wrong, because the sampling is inaccurate) that during streaming the table is unserialized and reserialized to another sstable, and that this unserialize/serialize process is what takes a large amount of CPU and slows down the insert speed.

Can someone confirm that bulk load is slow? I also tested on my own computer and barely reached 1 MB/sec.

I don't understand the point of fully unserializing the table I just built with CQLSSTableWriter (building and sorting the table is already a long process). Couldn't it just copy the table from offset X to offset Y (using index information, for example) without unserializing and reserializing it?


> Bulk upload is slow
> -------------------
>
>                 Key: CASSANDRA-9323
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9323
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Pierre N.
>         Attachments: App.java
>
>
> When I bulk upload an sstable created with CQLSSTableWriter, it is very
> slow. I tested on a fresh Cassandra node (nothing in the keyspace or tables)
> with good hardware (8 x 2.8 GHz, 32 GB RAM) but a classic hard disk (I don't
> think an SSD would improve performance in this case).
> When I upload an sstable from a different server I get an average of 3
> MB/sec; in the attached example I managed to get 5 MB/sec, which is still
> slow.
> During the streaming process I noticed that one core of the server is at
> full CPU, so I think the operation is CPU bound on the server side. I quickly
> attached a sampling profiler to the Cassandra instance and got the following
> output:
> https://i.imgur.com/IfLc2Ip.png
> So I think (though I may be wrong, because the sampling is inaccurate) that
> during streaming the table is unserialized and reserialized to another
> sstable, and that this unserialize/serialize process is what takes a large
> amount of CPU and slows down the insert speed.
> Can someone confirm that bulk load is slow? I also tested on my own computer
> and barely reached 1 MB/sec.
> I don't understand the point of fully unserializing the table I just built
> with CQLSSTableWriter (building and sorting the table is already a long
> process). Couldn't it just copy the table from offset X to offset Y (using
> index information, for example) without unserializing and reserializing it?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
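
The raw-copy idea from the description (copy from offset X to offset Y without unserializing) can be sketched as below. This only illustrates the proposal and says nothing about how Cassandra streaming works or whether it could work this way. The names `copy_range` and `demo`, the file names, and the offsets are all hypothetical, and the sketch assumes the byte offsets are already known (for example from an index). Python is used here only for brevity.

```python
import os
import tempfile


def copy_range(src_path, dst_path, start, end, chunk_size=64 * 1024):
    """Append bytes [start, end) of src_path to dst_path without parsing them.

    Returns the number of bytes actually copied, which is smaller than
    end - start if the source file ends before `end`.
    """
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "ab") as dst:
        src.seek(start)
        remaining = end - start
        while remaining > 0:
            chunk = src.read(min(chunk_size, remaining))
            if not chunk:  # source is shorter than the requested range
                break
            dst.write(chunk)
            copied += len(chunk)
            remaining -= len(chunk)
    return copied


def demo():
    """Copy a hypothetical 6-byte range out of a 16-byte throwaway file."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "source.bin")  # stand-in for a data file
        dst = os.path.join(tmp, "target.bin")
        with open(src, "wb") as f:
            f.write(b"0123456789abcdef")
        copied = copy_range(src, dst, 4, 10)  # offsets X=4, Y=10 are made up
        with open(dst, "rb") as f:
            return copied, f.read()


if __name__ == "__main__":
    print(demo())
```

The copy loop never interprets the bytes, so its cost is dominated by I/O rather than CPU, which is the contrast the description draws with the unserialize/serialize path seen in the profiler output.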