[jira] [Commented] (CASSANDRA-7949) LCS compaction low performance, many pending compactions, nodes are almost idle

2014-11-24 Thread Nikolai Grigoriev (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14223036#comment-14223036 ]

Nikolai Grigoriev commented on CASSANDRA-7949:
--

I have recently realized that there may be a relatively cheap (operationally and 
development-wise) workaround for that limitation. It would also partially 
address the problem of bootstrapping a new node. The root cause of all this is 
a large amount of data in a single CF on a single node when using LCS for that 
CF. The performance of a single compaction task running on a single thread is 
limited anyway. One of the obvious ways to break this limitation is to shard 
the data across multiple clones of that CF at the application level: 
something as simple as taking the row key hash mod X and appending that suffix 
to the CF name. In my case it looks like X=4 would be more than enough to solve the problem.
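
For illustration, a minimal sketch of that application-level sharding (the class name and X=4 are hypothetical; any stable hash of the row key would do):

{code}
// Hypothetical application-level router, sketch only: pick one of X clones of the CF
// (e.g. mytable_0 .. mytable_3, all with identical schema) based on the row key.
public final class CfShardRouter {
    private final String baseCfName;   // e.g. "mytable" (placeholder)
    private final int shards;          // X, e.g. 4

    public CfShardRouter(String baseCfName, int shards) {
        this.baseCfName = baseCfName;
        this.shards = shards;
    }

    /** "Row key hash mod X" appended as a suffix to the CF name. */
    public String cfFor(String rowKey) {
        int bucket = Math.floorMod(rowKey.hashCode(), shards);
        return baseCfName + "_" + bucket;
    }
}
{code}

All reads and writes for a given row key then go to the same clone, so each clone holds roughly 1/X of the data and gets its own single-threaded LCS compaction stream.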

 LCS compaction low performance, many pending compactions, nodes are almost 
 idle
 ---

 Key: CASSANDRA-7949
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7949
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: DSE 4.5.1-1, Cassandra 2.0.8
Reporter: Nikolai Grigoriev
 Attachments: iostats.txt, nodetool_compactionstats.txt, 
 nodetool_tpstats.txt, pending compactions 2day.png, system.log.gz, vmstat.txt


 I've been evaluating a new cluster of 15 nodes (32 cores, 6x800Gb SSD disks + 
 2x600Gb SAS, 128Gb RAM, OEL 6.5) and I've built a simulator that creates a 
 load similar to the load in our future product. Before running the simulator 
 I had to pre-generate enough data. This was done using Java code and the 
 DataStax Java driver. Without going deep into details, two tables have been 
 generated. Each table currently has about 55M rows and between a few dozen and 
 a few thousand columns in each row.
 This data generation process was generating a massive amount of non-overlapping 
 data. Thus, the activity was write-only and highly parallel. This is not the 
 type of traffic that the system will ultimately have to deal with; it 
 will be a mix of reads and updates to the existing data in the future. This is 
 just to explain the choice of LCS, not to mention the expensive SSD disk 
 space.
 At some point while generating the data I noticed that the compactions 
 started to pile up. I knew that I was overloading the cluster but I still 
 wanted the generation test to complete. I was expecting to give the cluster 
 enough time to finish the pending compactions and get ready for real traffic.
 However, after the storm of write requests had been stopped I noticed 
 that the number of pending compactions remained constant (and even climbed up 
 a little bit) on all nodes. After trying to tune some parameters (like 
 setting the compaction bandwidth cap to 0) I noticed a strange pattern: 
 the nodes were compacting one of the CFs in a single stream using virtually 
 no CPU and no disk I/O. This process was taking hours. After that it would be 
 followed by a short burst of a few dozen compactions running in parallel 
 (CPU at 2000%, some disk I/O - up to 10-20%) and then getting stuck again for 
 many hours doing one compaction at a time. So it looks like this:
 # nodetool compactionstats
 pending tasks: 3351
           compaction type        keyspace           table       completed           total      unit  progress
                Compaction            myks     table_list1     66499295588   1910515889913     bytes     3.48%
 Active compaction remaining time :        n/a
 # df -h
 ...
 /dev/sdb        1.5T  637G  854G  43% /cassandra-data/disk1
 /dev/sdc        1.5T  425G  1.1T  29% /cassandra-data/disk2
 /dev/sdd        1.5T  429G  1.1T  29% /cassandra-data/disk3
 # find . -name "*table_list1*Data*" | grep -v snapshot | wc -l
 1310
 Among these files I see:
 1043 files of 161Mb (my sstable size is 160Mb)
 9 large files - 3 between 1 and 2Gb, 3 of 5-8Gb, 55Gb, 70Gb and 370Gb
 263 files of various sizes - between a few dozen Kb and 160Mb
 I've been running the heavy load for about 1.5 days and it's been close to 3 
 days after that and the number of pending compactions does not go down.
 I have applied one of the not-so-obvious recommendations to disable 
 multithreaded compactions and that seems to be helping a bit - I see some 
 nodes started to have fewer pending compactions. About half of the cluster, 
 in fact. But even there I see they are sitting idle most of the time, lazily 
 compacting in one stream with CPU at ~140% and occasionally doing bursts 
 of compaction work for a few minutes.
 I am wondering if this is really a bug or something in the LCS logic that 
 would manifest itself only in such an edge-case scenario where I have loaded 
 lots of unique data quickly.
 By the way, I see this pattern only for one of the two tables - the one that has 
 about 4 times more data than the other (space-wise; the number of rows is the 
 same). Looks like all these pending compactions are really only for that 
 larger table.
 I'll be attaching the relevant logs shortly.

[jira] [Commented] (CASSANDRA-7949) LCS compaction low performance, many pending compactions, nodes are almost idle

2014-11-12 Thread Nikolai Grigoriev (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208154#comment-14208154 ]

Nikolai Grigoriev commented on CASSANDRA-7949:
--

I had to rebuild one of the nodes in that test cluster. After bootstrapping it 
I checked the results: I had over 6.5K pending compactions and many large 
sstables (from a few Gb to 40-60Gb). I knew that under traffic this would 
*never* return to a reasonable number of pending compactions.

I decided to give it another try, enable the option from CASSANDRA-6621 
and re-bootstrap. This time I did not end up with huge sstables but, I think, 
it will also never recover. This is, essentially, what the node does most of 
the time:

{code}
pending tasks: 7217
          compaction type        keyspace       table       completed           total      unit  progress
               Compaction            myks     mytable1      5434997373     10667184206     bytes    50.95%
               Compaction            myks     mytable2      1080506914      7466286503     bytes    14.47%
Active compaction remaining time :   0h00m09s
{code}

while:

{code}
# nodetool cfstats myks.mytable1
Keyspace: myks
Read Count: 49783
Read Latency: 38.612470602414476 ms.
Write Count: 521971
Write Latency: 1.3617571608384373 ms.
Pending Tasks: 0
Table: mytable1
SSTable count: 7893
SSTables in each level: [7828/4, 10, 56, 0, 0, 0, 0, 0, 0]
Space used (live), bytes: 1181508730955
Space used (total), bytes: 1181509085659
SSTable Compression Ratio: 0.3068450302663634
Number of keys (estimate): 28180352
Memtable cell count: 153554
Memtable data size, bytes: 41190431
Memtable switch count: 178
Local read count: 49826
Local read latency: 38.886 ms
Local write count: 522464
Local write latency: 1.392 ms
Pending tasks: 0
Bloom filter false positives: 11802553
Bloom filter false ratio: 0.98767
Bloom filter space used, bytes: 17686928
Compacted partition minimum bytes: 104
Compacted partition maximum bytes: 3379391
Compacted partition mean bytes: 142171
Average live cells per slice (last five minutes): 537.5
Average tombstones per slice (last five minutes): 0.0
{code}

By the way, this is the picture from another node that functions normally:

{code}
# nodetool cfstats myks.mytable1
Keyspace: myks
Read Count: 4638154
Read Latency: 20.784106776316612 ms.
Write Count: 15067667
Write Latency: 1.7291775639188205 ms.
Pending Tasks: 0
Table: mytable1
SSTable count: 4561
SSTables in each level: [37/4, 15/10, 106/100, 1053/1000, 3350, 0, 0, 0, 0]
Space used (live), bytes: 1129716897255
Space used (total), bytes: 1129752918759
SSTable Compression Ratio: 0.33488717551698993
Number of keys (estimate): 25036672
Memtable cell count: 334212
Memtable data size, bytes: 115610737
Memtable switch count: 4476
Local read count: 4638155
Local read latency: 20.784 ms
Local write count: 15067679
Local write latency: 1.729 ms
Pending tasks: 0
Bloom filter false positives: 104377
Bloom filter false ratio: 0.59542
Bloom filter space used, bytes: 20319608
Compacted partition minimum bytes: 104
Compacted partition maximum bytes: 3379391
Compacted partition mean bytes: 152368
Average live cells per slice (last five minutes): 529.5
Average tombstones per slice (last five minutes): 0.0
{code}

So not only has the streaming created an excessive number of sstables, the 
compactions are not advancing at all. In fact, the number of pending 
compactions grows slowly on that (first) node, because new L0 sstables keep 
being added while write activity is taking place.

Just some simple math: if I take the compaction throughput of the node when it 
uses only one thread and compare it to my write rate, I think the latter is about 
4x the former. Under these conditions this node will never recover - while 
having plenty of resources and very fast I/O.
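
To make that arithmetic concrete - purely illustrative numbers, assuming a single compaction thread sustains ~15 MB/s while the write load is 4x that:

{code}
// Back-of-the-envelope only; the absolute rates are assumptions, not measurements.
public class CompactionBacklog {
    public static void main(String[] args) {
        double compactionMBs = 15.0;               // assumed single-thread compaction throughput
        double writeMBs = 4 * compactionMBs;       // write rate ~4x the compaction rate, as above
        double backlogGrowthMBs = writeMBs - compactionMBs;
        // ~45 MB/s of un-compacted data, i.e. ~3.7 TB per day: the node can never
        // catch up while the write load continues.
        System.out.printf("backlog grows by %.0f MB/s = %.1f TB/day%n",
                backlogGrowthMBs, backlogGrowthMBs * 86400 / (1024.0 * 1024.0));
    }
}
{code}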


[jira] [Commented] (CASSANDRA-7949) LCS compaction low performance, many pending compactions, nodes are almost idle

2014-11-08 Thread Nikolai Grigoriev (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14203751#comment-14203751 ]

Nikolai Grigoriev commented on CASSANDRA-7949:
--

Here is another extreme (but, unfortunately, real) example of LCS going a bit 
crazy.

{code}
# nodetool cfstats myks.mytable
Keyspace: myks
Read Count: 3006212
Read Latency: 21.02595119106703 ms.
Write Count: 11226340
Write Latency: 1.8405579886231844 ms.
Pending Tasks: 0
Table: wm_contacts
SSTable count: 6530
SSTables in each level: [2369/4, 10, 104/100, 1043/1000, 3004, 0, 0, 0, 0]
Space used (live), bytes: 1113384288740
Space used (total), bytes: 1113406795020
SSTable Compression Ratio: 0.3307170610260717
Number of keys (estimate): 26294144
Memtable cell count: 782994
Memtable data size, bytes: 213472460
Memtable switch count: 3493
Local read count: 3006239
Local read latency: 21.026 ms
Local write count: 11226517
Local write latency: 1.841 ms
Pending tasks: 0
Bloom filter false positives: 41835779
Bloom filter false ratio: 0.97500
Bloom filter space used, bytes: 19666944
Compacted partition minimum bytes: 104
Compacted partition maximum bytes: 3379391
Compacted partition mean bytes: 139451
Average live cells per slice (last five minutes): 444.0
Average tombstones per slice (last five minutes): 0.0
{code}

{code}
# nodetool compactionstats
pending tasks: 190
          compaction type        keyspace      table       completed           total      unit  progress
               Compaction            myks    mytable2      7198353690      7446734394     bytes    96.66%
               Compaction            myks     mytable      4851429651     10717052513     bytes    45.27%
Active compaction remaining time :   0h00m04s
{code}


Note the cfstats: the number of sstables at L0 is insane. Yet C* is sitting 
quietly, compacting the data using 2 cores out of 32.

Once it gets into this state I immediately start seeing large sstables forming: 
instead of 256Mb, sstables of 1-2Gb and more start appearing. And that 
creates a snowball effect.




[jira] [Commented] (CASSANDRA-7949) LCS compaction low performance, many pending compactions, nodes are almost idle

2014-10-20 Thread Marcus Eriksson (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14176628#comment-14176628 ]

Marcus Eriksson commented on CASSANDRA-7949:


The first comment has a link to a github branch.

But it is against trunk, so don't use it in production (of course).



[jira] [Commented] (CASSANDRA-7949) LCS compaction low performance, many pending compactions, nodes are almost idle

2014-10-20 Thread Nikolai Grigoriev (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14176884#comment-14176884 ]

Nikolai Grigoriev commented on CASSANDRA-7949:
--

Then I doubt I can really try it. We are quite close to production deployment, 
and trying something that far from what we will use in prod is pointless 
(for me, not for the fix ;) ).



[jira] [Commented] (CASSANDRA-7949) LCS compaction low performance, many pending compactions, nodes are almost idle

2014-10-19 Thread Nikolai Grigoriev (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14176426#comment-14176426 ]

Nikolai Grigoriev commented on CASSANDRA-7949:
--

[~krummas]

Marcus,

Which patch are you talking about? I am running the latest DSE with Cassandra 2.0.10.



[jira] [Commented] (CASSANDRA-7949) LCS compaction low performance, many pending compactions, nodes are almost idle

2014-10-16 Thread Nikolai Grigoriev (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174702#comment-14174702 ]

Nikolai Grigoriev commented on CASSANDRA-7949:
--

Update:

Using the property from CASSANDRA-6621 does help to get out of this state. My 
cluster is slowly digesting the large sstables and creating a bunch of nice small 
sstables from them. It is slower than using sstablesplit, I believe, because it 
actually does real compactions and thus processes and reprocesses different 
sets of sstables. My understanding is that every time I get a new bunch of L0 
sstables there is a phase of updating the other levels, and this repeats over and over.

With that property set I see that my total number of sstables grows, my number 
of huge sstables decreases, and the average sstable size decreases as a 
result.

My conclusions so far:

1. The STCS fallback in LCS is a double-edged sword. It is needed to prevent 
flooding the node with tons of small sstables resulting from ongoing writes. 
These small ones are often much smaller than the configured target size and they 
need to be merged. But the use of STCS also results in the generation of 
super-sized sstables. These become a large headache when the fallback stops and 
LCS is supposed to resume normal operations. It appears to me (my humble 
opinion) that the fallback should be done to some kind of specialized "rescue" STCS 
flavor that merges the small sstables to approximately the LCS target sstable 
size BUT DOES NOT create sstables that are much larger than the target size 
(see the sketch after this list). With this approach LCS would resume normal 
operations much faster once the cause of the fallback (abnormally high write 
load) is gone.

2. LCS has a major (performance?) issue when you have super-large sstables in the 
system. It often gets stuck with a single long (many hours) compaction stream 
that, by itself, increases the probability of another STCS fallback even 
with a reasonable write load. As a possible workaround I was advised to 
consider running multiple C* instances on our relatively powerful machines - to 
significantly reduce the amount of data per node and increase compaction 
throughput.

3. On existing systems, depending on the severity of the STCS fallback, 
the fix from CASSANDRA-6621 may help to recover while keeping the nodes 
up. It will take a very long time to recover, but the nodes will be online.

4. Recovery (see above) is very long - much, much longer than the duration 
of the stress period that causes the condition. In my case I was writing like 
crazy for about 4 days and it has been over a week of compactions since then; I 
am still very far from 0 pending compactions. Considering this, it makes sense 
to artificially throttle the write speed when generating the data (as in the 
use case I described in previous comments). The extra time spent writing the 
data will still be significantly shorter than the time required to recover 
from the consequences of abusing the available write bandwidth.
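
A rough sketch of the grouping rule I have in mind for such a "rescue" flavor (illustration only, not Cassandra code; it just buckets small sstables into groups of approximately the LCS target size):

{code}
import java.util.ArrayList;
import java.util.List;

// Illustration only: merge candidates are grouped so that each merge produces an
// sstable close to the LCS target size instead of one giant STCS output.
public final class RescueGrouping {
    /**
     * @param sstableSizes sizes of the small L0 sstables, in bytes
     * @param targetBytes  LCS target sstable size (e.g. 160Mb)
     * @return groups whose combined size stays close to the target, never far above it
     */
    public static List<List<Long>> group(List<Long> sstableSizes, long targetBytes) {
        List<List<Long>> groups = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentSize = 0;
        for (long size : sstableSizes) {
            if (!current.isEmpty() && currentSize + size > targetBytes) {
                groups.add(current);          // close the bucket before it overshoots the target
                current = new ArrayList<>();
                currentSize = 0;
            }
            current.add(size);
            currentSize += size;
        }
        if (!current.isEmpty()) groups.add(current);
        return groups;
    }
}
{code}

Each group would then be compacted into one sstable of roughly the target size, so LCS gets back sstables it can promote normally instead of 50-300Gb monsters.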


[jira] [Commented] (CASSANDRA-7949) LCS compaction low performance, many pending compactions, nodes are almost idle

2014-10-12 Thread Nikolai Grigoriev (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14168822#comment-14168822 ]

Nikolai Grigoriev commented on CASSANDRA-7949:
--

I did another round of testing and I can confirm my previous suspicion. If LCS 
goes into STCS fallback mode there seems to be some kind of point of no 
return. After loading a fairly large amount of data I end up with a number of 
large (from a few Gb to 200+Gb) sstables. After that the cluster simply goes 
downhill - it never recovers. Even if there is no traffic except the repair 
service (DSE OpsCenter), the number of pending compactions never declines. It 
actually grows. Sstables also grow and grow in size until the moment one of the 
compactions runs out of disk space and crashes the node.

I also believe that once in this state there is no way out. The sstablesplit tool, 
as far as I understand, cannot be used on a live node. And the tool splits the 
data in a single thread. I have measured its performance on my system: it 
processes about 13Mb/s on average, so splitting all these large sstables 
would take many DAYS.
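
Rough numbers behind that estimate (based on the ~13Mb/s measurement above and the per-node table size from cfstats; treat it as a ballpark only):

{code}
// Ballpark only: single-threaded sstablesplit at ~13 MB/s, with the node offline.
public class SplitTimeEstimate {
    public static void main(String[] args) {
        double rateMBs = 13.0;               // measured sstablesplit throughput
        double largestMB = 200.0 * 1024;     // one 200+Gb sstable
        double perNodeMB = 1100.0 * 1024;    // ~1.1Tb of data for the big table on one node
        System.out.printf("largest sstable alone: %.1f hours%n", largestMB / rateMBs / 3600);
        System.out.printf("one table on one node: %.1f days%n", perNodeMB / rateMBs / 86400);
    }
}
{code}

Multiply that by two tables and a rolling pass over the cluster (with each node down for the duration) and it easily turns into many days.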


[jira] [Commented] (CASSANDRA-7949) LCS compaction low performance, many pending compactions, nodes are almost idle

2014-10-07 Thread Nikolai Grigoriev (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162263#comment-14162263 ]

Nikolai Grigoriev commented on CASSANDRA-7949:
--

It seems that what I am suffering from in this specific test is similar to 
CASSANDRA-6621. When I write all unique data to create my initial snapshot I am 
effectively doing something similar to what happens when a new node is 
bootstrapped, I think.



[jira] [Commented] (CASSANDRA-7949) LCS compaction low performance, many pending compactions, nodes are almost idle

2014-09-29 Thread Nikolai Grigoriev (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14152569#comment-14152569 ]

Nikolai Grigoriev commented on CASSANDRA-7949:
--

Upgraded to Cassandra 2.0.10 (via DSE 4.5.2) today. Switched my tables that 
used STCS to LCS. Restarted. For the last 8 hours I have observed this on all nodes:

{code}
# nodetool compactionstats
pending tasks: 13808
          compaction type        keyspace      table       completed           total      unit  progress
               Compaction      mykeyspace     table_1    528230773591   1616185183262     bytes    32.68%
               Compaction      mykeyspace     table_2    456361916088   4158821946280     bytes    10.97%
Active compaction remaining time :   3h57m56s
{code}

At the beginning of these 8 hours the remaining time was about 4h08m. CPU 
activity: almost nothing (between 2 and 3 cores); disk I/O: nearly zero. So 
clearly it compacts in one thread per keyspace and makes almost no progress.
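
For reference, the STCS-to-LCS switch above is typically just an ALTER TABLE per table; a minimal sketch with the DataStax Java driver (assuming the 2.x API; contact point, keyspace and table names are placeholders):

{code}
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

// Sketch only: switch one table from STCS to LCS with a 160Mb target sstable size.
public class SwitchToLcs {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        try {
            Session session = cluster.connect("mykeyspace");
            session.execute("ALTER TABLE table_1 WITH compaction = "
                    + "{'class': 'LeveledCompactionStrategy', 'sstable_size_in_mb': 160}");
        } finally {
            cluster.close();
        }
    }
}
{code}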


[jira] [Commented] (CASSANDRA-7949) LCS compaction low performance, many pending compactions, nodes are almost idle

2014-09-24 Thread Nikolai Grigoriev (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146632#comment-14146632 ]

Nikolai Grigoriev commented on CASSANDRA-7949:
--

Maybe this is not related, but I have another small cluster with similar data. 
I have just upgraded that one to 2.0.10 (not DSE, the original open-source 
version). On all machines in this cluster I have many thousands of sstables, 
all 160Mb, with a few that are smaller. So they are all L0; no L1 or higher-level 
sstables exist. LCS is used. Number of pending compactions: 0. There is even 
incoming traffic that writes into that keyspace. nodetool compact returns 
immediately.




[jira] [Commented] (CASSANDRA-7949) LCS compaction low performance, many pending compactions, nodes are almost idle

2014-09-22 Thread Marcus Eriksson (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143118#comment-14143118 ]

Marcus Eriksson commented on CASSANDRA-7949:


I think this could be fixed by CASSANDRA-7745.



[jira] [Commented] (CASSANDRA-7949) LCS compaction low performance, many pending compactions, nodes are almost idle

2014-09-22 Thread Nikolai Grigoriev (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143214#comment-14143214 ]

Nikolai Grigoriev commented on CASSANDRA-7949:
--

Update: I have completed my last data-writing test, and now I have enough data to 
start another phase. I did that last test with the compaction strategy set to STCS 
but disabled for the duration of the test. Once all writers had finished I 
re-enabled the compactions. In under one day STCS completed the job on 
all nodes; I ended up with a few dozen (~40 or so) large sstables, with a total 
of about 23Tb of data on 15 nodes.

I switched back to LCS this morning and immediately observed the hockey 
stick on the pending-compactions graph. Now each node reports about 8-10K 
pending compactions; they are all compacting in one stream per CF and consume 
virtually no resources:

{code}
# nodetool compactionstats
pending tasks: 9900
   compaction type   keyspace       table     completed          total   unit  progress
        Compaction     testks  test_list2   26630083587   812539331642  bytes     3.28%
        Compaction     testks  test_list1   24071738534  1994877844635  bytes     1.21%
Active compaction remaining time :   2h16m55s


# w
 13:41:45 up 23 days, 18:13,  2 users,  load average: 1.81, 2.13, 2.51
...


# iostat -mdx 5
Linux 3.8.13-44.el6uek.x86_64 (cassandra01.mydomain.com)  22/09/14  _x86_64_  (32 CPU)

Device: rrqm/s  wrqm/s     r/s     w/s   rMB/s   wMB/s avgrq-sz avgqu-sz  await  svctm  %util
sdb        0.00    5.73   88.00   13.33    5.47    5.16   214.84     0.51   5.08   0.39   3.98
sda        0.00    8.16    0.13   65.80    0.00    3.28   101.80     0.06   0.87   0.11   0.71
sdc        0.00    4.93   75.05   13.34    4.67    5.42   233.62     0.49   5.55   0.39   3.42
sdd        0.00    5.82   86.40   14.10    5.37    5.52   221.83     0.56   5.59   0.38   3.81

Device: rrqm/s  wrqm/s     r/s     w/s   rMB/s   wMB/s avgrq-sz avgqu-sz  await  svctm  %util
sdb        0.00    0.00  134.60    0.00    8.37    0.00   127.30     0.06   0.42   0.42   5.64
sda        0.00   13.00    0.00  220.40    0.00    0.96     8.94     0.01   0.05   0.01   0.32
sdc        0.00    0.00   36.40    0.00    2.27    0.00   128.00     0.01   0.41   0.41   1.50
sdd        0.00    0.00   21.20    0.00    1.32    0.00   128.00     0.00   0.19   0.19   0.40
{code}


[jira] [Commented] (CASSANDRA-7949) LCS compaction low performance, many pending compactions, nodes are almost idle

2014-09-22 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14143357#comment-14143357
 ] 

Marcus Eriksson commented on CASSANDRA-7949:


so, if you switch to STCS, let it compact, and then go back to LCS, you are 
bound to do a lot of L0 to L1 compaction in the beginning, since all sstables 
start out in level 0 and need to pass through L1 before making it to the 
higher levels.

L0 to L1 compactions usually include _all_ of the L1 sstables, which means 
that only one of them can proceed at a time.

Looking at your compactionstats, you have one 2TB compaction going on, probably 
between L0 and L1, that needs to finish before higher-level compactions can 
continue.
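
A quick way to confirm that one large L0 to L1 compaction is what is holding 
everything back is to put the running task next to the matching log lines 
(a minimal sketch; the log path is the packaged default and may differ, and 
cfstats output details vary by version):

{code}
# The running task: a single huge compaction for the table, barely progressing.
nodetool compactionstats

# The matching "Compacting [...]" log entry lists the sstables involved; one
# entry naming hundreds of -Data.db files (L0 plus all of L1) is the tell-tale.
grep 'Compacting' /var/log/cassandra/system.log | tail -n 5

# Per-table sstable counts, to see how much data is still sitting in L0.
nodetool cfstats testks.test_list1 | grep -i sstable
{code}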


[jira] [Commented] (CASSANDRA-7949) LCS compaction low performance, many pending compactions, nodes are almost idle

2014-09-22 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14143365#comment-14143365
 ] 

Benedict commented on CASSANDRA-7949:
-

That sounds like fairly suboptimal behaviour still. But it sounds like 
CASSANDRA-6696 should help to address it. 

When there is some time, we should also reintroduce a more functional 
multi-threaded compaction. It should be quite achievable to build one that is 
correct, safe and faster for these scenarios.



[jira] [Commented] (CASSANDRA-7949) LCS compaction low performance, many pending compactions, nodes are almost idle

2014-09-19 Thread Philip Thompson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14141080#comment-14141080
 ] 

Philip Thompson commented on CASSANDRA-7949:


Did switching to STCS end up solving your issue?



[jira] [Commented] (CASSANDRA-7949) LCS compaction low performance, many pending compactions, nodes are almost idle

2014-09-19 Thread Nikolai Grigoriev (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14141087#comment-14141087
 ] 

Nikolai Grigoriev commented on CASSANDRA-7949:
--

Yes and no. Yes - the number of pending compactions started to go down and I 
ended up with fewer (and larger) sstables. But I think the issue is more about 
LCS compaction performance. Is it normal that LCS cannot efficiently use the 
host resources while having tons of pending compactions?



[jira] [Commented] (CASSANDRA-7949) LCS compaction low performance, many pending compactions, nodes are almost idle

2014-09-19 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14141140#comment-14141140
 ] 

Benedict commented on CASSANDRA-7949:
-

I must admit, the behaviour does sound very suspicious to me. However, looking 
at the logs it appears to me that the problem may be related to the pending 
tasks count being an _estimate_, because the compaction manager repeatedly 
looks for work to do and says "nope, nothing". Is it possible you were 
generating sstables that were contiguous and non-overlapping, so that there 
were no candidates for compaction?

LCS is not my area of expertise, but [~krummas] should certainly have a look. 
Estimating ~3k compaction tasks and finding _none_ (repeatedly) seems off to 
me. At the very least we should fix the estimate, but I suspect something else 
may be happening.
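
One way to see the mismatch described above on a live node is to put the 
pending estimate next to what the compaction executor is actually doing 
(a rough sketch; column layouts differ slightly between versions):

{code}
# Estimated pending compactions - this is the figure that can be off.
nodetool compactionstats | grep 'pending tasks'

# What the compaction threads are really doing right now: the Active and
# Pending columns for CompactionExecutor in the thread pool stats.
nodetool tpstats | grep -E 'Pool Name|CompactionExecutor'
{code}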


[jira] [Commented] (CASSANDRA-7949) LCS compaction low performance, many pending compactions, nodes are almost idle

2014-09-19 Thread Nikolai Grigoriev (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14141154#comment-14141154
 ] 

Nikolai Grigoriev commented on CASSANDRA-7949:
--

I understand that it was an estimate, but my cluster spent almost 3 full days 
trying to work through that estimate with little progress. About 1.5 days of 
data injection followed by 3 days of compaction with no progress - that does 
not sound right. And STCS was able to crunch most of the data in about one day 
after the switch.

I strongly suspect that the fact that I was loading (and not updating) the 
data at a high rate resulted in some sort of edge case scenario for LCS. But 
considering that the cluster could not recover in a reasonable amount of time 
(exceeding the original load time by a factor of 2+), I do believe that 
something may need to be improved in the LCS logic OR some kind of diagnostic 
message needs to be generated to request a specific action from the cluster 
owner. In my case the problem was easy to spot as it was highly visible - but 
if this happens to one of 50 CFs it may take a while before someone notices 
the endless compactions.
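
Until such a diagnostic exists, the situation can be approximated with a 
trivial watcher on each node (purely illustrative: the log path is a 
placeholder and the 5-minute interval is arbitrary). If the sampled estimate 
stays flat for hours while the node is otherwise idle, something like this 
ticket is probably happening:

{code}
#!/bin/sh
# Sample the estimated pending compactions every 5 minutes so a backlog
# that never drains stands out in the log.
while true; do
    pending=$(nodetool compactionstats | awk '/pending tasks/ {print $3}')
    echo "$(date '+%Y-%m-%d %H:%M:%S') pending_compactions=${pending}"
    sleep 300
done >> /var/log/cassandra/pending-compactions.log
{code}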


[jira] [Commented] (CASSANDRA-7949) LCS compaction low performance, many pending compactions, nodes are almost idle

2014-09-19 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14141167#comment-14141167
 ] 

Benedict commented on CASSANDRA-7949:
-

Well, my point is that the server _thought there was no work to do_. The only 
logical explanation for that is that it was either wrong, or the data was 
generated in a manner that actually _didn't_ need compacting as much as the 
estimate thought (and the estimate was wrong). The estimate being _so_ out of 
whack seems unlikely to me, but it is a possibility.

It's worth mentioning there definitely _were_ compactions happening, just not 
very many. It's possible these in-progress compactions were preventing other 
compactions that should have been happening from taking place, since sstables 
cannot participate in other compactions once they're already involved in one. 
This _does_ seem like something we need to investigate and understand 
thoroughly, whatever it turns out to be.


[jira] [Commented] (CASSANDRA-7949) LCS compaction low performance, many pending compactions, nodes are almost idle

2014-09-19 Thread Nikolai Grigoriev (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14141174#comment-14141174
 ] 

Nikolai Grigoriev commented on CASSANDRA-7949:
--

Just a small clarification: "just not very many" is not exactly what I 
observed. Mainly there was one active compaction, but once in a while there 
was a burst of compactions with high CPU usage, GOSSIP issues caused by nodes 
being less responsive, etc.



[jira] [Commented] (CASSANDRA-7949) LCS compaction low performance, many pending compactions, nodes are almost idle

2014-09-19 Thread Jeremiah Jordan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14141204#comment-14141204
 ] 

Jeremiah Jordan commented on CASSANDRA-7949:


[~benedict] I don't think there were ever *no* compactions.  Just periods where 
there was one compaction going on that was blocking any concurrent compactions 
from happening.



[jira] [Commented] (CASSANDRA-7949) LCS compaction low performance, many pending compactions, nodes are almost idle

2014-09-19 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14141289#comment-14141289
 ] 

Benedict commented on CASSANDRA-7949:
-

bq. Benedict I don't think there were ever no compactions

I stated that there were compactions happening.

But there seems to be something wrong when you have 3k estimated compactions 
and only 1 taking place.



[jira] [Commented] (CASSANDRA-7949) LCS compaction low performance, many pending compactions, nodes are almost idle

2014-09-17 Thread Nikolai Grigoriev (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137054#comment-14137054
 ] 

Nikolai Grigoriev commented on CASSANDRA-7949:
--

I see. I could try to switch to STCS now and see what happens.

My concern is that the issue seems to be permanent. Even after last night none 
of the nodes (being virtually idle - the load was over) was able to eat through 
the pending compactions. And, to my surprise, half of the nodes in the cluster 
do not even compact fast enough - look at the graphs attached.


[jira] [Commented] (CASSANDRA-7949) LCS compaction low performance, many pending compactions, nodes are almost idle

2014-09-17 Thread Nikolai Grigoriev (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14138137#comment-14138137
 ] 

Nikolai Grigoriev commented on CASSANDRA-7949:
--

Just an update: I switched to STCS early this morning and by now half of the 
nodes are getting close to zero pending compactions. Half of the remaining 
nodes seem to be behind, but they are compacting at full speed (smoke coming 
from the lab ;) ) and I see the number of pending compactions going down on 
them as well. On the nodes where compactions are almost over, the number of 
sstables is now very small - less than a hundred.


[jira] [Commented] (CASSANDRA-7949) LCS compaction low performance, many pending compactions, nodes are almost idle

2014-09-16 Thread Nikolai Grigoriev (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14136549#comment-14136549
 ] 

Nikolai Grigoriev commented on CASSANDRA-7949:
--

The system log already includes the output from 
log4j.logger.org.apache.cassandra.db.compaction (except 
log4j.logger.org.apache.cassandra.db.compaction.ParallelCompactionIterable).
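
For context, output like that typically comes from entries of this form in 
conf/log4j-server.properties on 2.0.x nodes (a sketch; the levels shown are 
illustrative - the second line is what keeps ParallelCompactionIterable out of 
the DEBUG output by overriding the parent logger's level):

log4j.logger.org.apache.cassandra.db.compaction=DEBUG
log4j.logger.org.apache.cassandra.db.compaction.ParallelCompactionIterable=INFO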


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7949) LCS compaction low performance, many pending compactions, nodes are almost idle

2014-09-16 Thread Jeremiah Jordan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14136712#comment-14136712
 ] 

Jeremiah Jordan commented on CASSANDRA-7949:


For the initial load you probably want to disable STCS in L0 (CASSANDRA-6621), 
or maybe use STCS during the load and then switch to LCS once it is over.
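
A sketch of both options; the system property name is the one referenced in 
CASSANDRA-6621 (verify it against your build), and the keyspace, table and 
sstable size are taken from the report above:

Option 1 - keep LCS but turn off the STCS-in-L0 fallback, e.g. in 
conf/cassandra-env.sh:

JVM_OPTS="$JVM_OPTS -Dcassandra.disable_stcs_in_l0=true"

Option 2 - bulk-load under STCS, then switch back to LCS once the load is done:

cqlsh> ALTER TABLE myks.table_list1
   ...   WITH compaction = { 'class' : 'LeveledCompactionStrategy',
   ...                       'sstable_size_in_mb' : 160 };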


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)