[jira] [Commented] (CASSANDRA-11383) SASI index build leads to massive OOM
[ https://issues.apache.org/jira/browse/CASSANDRA-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15204917#comment-15204917 ]

DOAN DuyHai commented on CASSANDRA-11383:
-----------------------------------------

bq. we are probably not going to remove SPARSE but rather just fail the index build if SPARSE is set but its requirements are not met, so operators will be able to manually change the schema and trigger an index rebuild

I prefer this alternative. I believe there is a real need for {{SPARSE}} indices.

> SASI index build leads to massive OOM
> -------------------------------------
>
>                 Key: CASSANDRA-11383
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11383
>             Project: Cassandra
>          Issue Type: Bug
>          Components: CQL
>         Environment: C* 3.4
>            Reporter: DOAN DuyHai
>         Attachments: CASSANDRA-11383.patch,
>                      SASI_Index_build_LCS_1G_Max_SSTable_Size_logs.tar.gz,
>                      new_system_log_CMS_8GB_OOM.log, system.log_sasi_build_oom
>
> 13 bare-metal machines:
> - 6-core CPU (12 HT)
> - 64 GB RAM
> - 4 SSDs in RAID 0
>
> JVM settings:
> - G1 GC
> - Xms32G, Xmx32G
>
> Data set:
> - ≈ 100 GB per node
> - 1.3 TB cluster-wide
> - ≈ 20 GB for all SASI indices
>
> C* settings:
> - concurrent_compactors: 1
> - compaction_throughput_mb_per_sec: 256
> - memtable_heap_space_in_mb: 2048
> - memtable_offheap_space_in_mb: 2048
>
> I created 9 SASI indices:
> - 8 indices on text fields, NonTokenizingAnalyzer, PREFIX mode, case-insensitive
> - 1 index on a numeric field, SPARSE mode
>
> After a while, the nodes just went OOM.
> I attach log files. You can see a lot of GC happening while index segments
> are flushed to disk. At some point the node OOMs ...
> /cc [~xedin]

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[ https://issues.apache.org/jira/browse/CASSANDRA-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15204881#comment-15204881 ]

Pavel Yaskevich commented on CASSANDRA-11383:
---------------------------------------------

[~doanduyhai] We are currently working on reducing the memory footprint of the stitching step. It looks like we are probably not going to remove SPARSE but rather just fail the index build if SPARSE is set but its requirements are not met, so operators will be able to manually change the schema and trigger an index rebuild. Also, it is not necessary to set the mode explicitly at all: it defaults to PREFIX, which works fine for both text and numeric fields.
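As a sketch of the default behaviour described above (using the table and column from the tests in this thread): omitting the {{mode}} option leaves the index in PREFIX mode, which works for both text and numeric columns.

```sql
-- No 'mode' option: SASI falls back to its default PREFIX mode,
-- which is valid for text and numeric fields alike.
CREATE CUSTOM INDEX resource_period_end_month_int_idx
    ON sharon.resource_bench (period_end_month_int)
    USING 'org.apache.cassandra.index.sasi.SASIIndex';
```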
[ https://issues.apache.org/jira/browse/CASSANDRA-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15204144#comment-15204144 ]

DOAN DuyHai commented on CASSANDRA-11383:
-----------------------------------------

[~xedin] Ok, last update from testing:
- LCS with 1 GB max_sstable_size
- only PREFIX index modes

The cluster is running fine with the index build. I can even build multiple indices at the same time.

If you decide to remove {{SPARSE}} mode, how will SASI deal with really *sparse* numerical values (like the index on {{created_at}} in the example)? Or does SASI auto-detect sparseness and adapt its data structure?
[ https://issues.apache.org/jira/browse/CASSANDRA-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203243#comment-15203243 ]

DOAN DuyHai commented on CASSANDRA-11383:
-----------------------------------------

bq. and numerical indexes are not required to be always marked as SPARSE

I missed this. I thought that {{PREFIX}} mode was reserved for {{text}} types only.
[ https://issues.apache.org/jira/browse/CASSANDRA-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203226#comment-15203226 ]

Pavel Yaskevich commented on CASSANDRA-11383:
---------------------------------------------

[~doanduyhai] It looks to me like SPARSE currently creates more confusion than good. I'm going to remove that mode as part of this patch, and I will look into maybe having a different index type for columns like timestamps if that proves to be a problem for people.
[ https://issues.apache.org/jira/browse/CASSANDRA-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203216#comment-15203216 ]

Pavel Yaskevich commented on CASSANDRA-11383:
---------------------------------------------

[~doanduyhai] Yes, I think I will be able to at least minimize the memory requirement of the stitching stage, which should solve most of it. And once again: "period_end_month_int" is *not* a SPARSE index. Numerical indexes are not required to always be marked as SPARSE; that applies only in cases where each (or most) of the keys has a unique value for the column, e.g. a timestamp where each key is going to have a milli-/micro-second value that is almost guaranteed to be unique for every given row.
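A genuinely sparse column of the kind described above can be sketched as follows. The {{created_at}} column and index name here are hypothetical, echoing the SASI documentation example mentioned elsewhere in this thread: a per-row timestamp where almost every value maps to a single key.

```sql
-- Hypothetical sparse case: created_at is nearly unique per row,
-- so each index value carries very few keys -- what SPARSE is for.
CREATE CUSTOM INDEX resource_created_at_idx
    ON sharon.resource_bench (created_at)
    USING 'org.apache.cassandra.index.sasi.SASIIndex'
    WITH OPTIONS = {'mode': 'SPARSE'};
```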
[ https://issues.apache.org/jira/browse/CASSANDRA-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203172#comment-15203172 ]

DOAN DuyHai commented on CASSANDRA-11383:
-----------------------------------------

[~xedin] Last update from the testing. I put the cluster in *ideal* conditions, as you recommended.

JVM settings:
- CMS
- Xms8G, Xmx8G

C* settings:
- concurrent_compactors: 6

Test conditions:
- cluster *idle* (no writes, no reads)
- LCS with *sstable_size_in_mb* = 1024 (1 GB)
- *no compaction ongoing* (it took a whole night to compact for LCS)
- {{CREATE CUSTOM INDEX resource_period_end_month_int_idx ON sharon.resource_bench (period_end_month_int) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = \{'mode': 'SPARSE'\};}}

Observations:
- I/O idle, CPU not exceeding 20% on average (http://postimg.org/image/f664wm8dp/)
- {{nodetool compactionstats}} only shows 1 index rebuild ongoing per node:

{noformat}
id                                   compaction type       keyspace table          completed   total       unit  progress
d8b4f4b0-ee6a-11e5-81f5-bd5584064785 Secondary index build sharon   resource_bench 9535985318  18920482745 bytes 50.40%
d8b65440-ee6a-11e5-b44b-4deeb5ac98a3 Secondary index build sharon   resource_bench 9464081317  20988668046 bytes 45.09%
d8b3bc30-ee6a-11e5-a152-db40f4fbe6b8 Secondary index build sharon   resource_bench 9471325678  17061191471 bytes 55.51%
d8b45870-ee6a-11e5-b26b-53ed13e9667e Secondary index build sharon   resource_bench 9120598050  18921737677 bytes 48.20%
d8b45870-ee6a-11e5-b2a3-331c04173c53 Secondary index build sharon   resource_bench 8943568835  20591008789 bytes 43.43%
d8b47f80-ee6a-11e5-9fc8-0597212274c1 Secondary index build sharon   resource_bench 10172038156 21422242706 bytes 47.48%
d8b34700-ee6a-11e5-a642-6dee841e75e5 Secondary index build sharon   resource_bench 10161205385 18730171060 bytes 54.25%
d8b6f080-ee6a-11e5-8da4-bd70732fdab1 Secondary index build sharon   resource_bench 9961529350  21294352899 bytes 46.78%
d8b43160-ee6a-11e5-8ac9-f59d626eedfa Secondary index build sharon   resource_bench 9160286080  22153527929 bytes 41.35%
d8b51bc0-ee6a-11e5-8aa0-b9e611280aba Secondary index build sharon   resource_bench 9397690505  22791700212 bytes 41.23%
d8b542d0-ee6a-11e5-8521-fbd14b018db6 Secondary index build sharon   resource_bench 10029096174 18910334578 bytes 53.04%
d8b40a50-ee6a-11e5-a7b2-4b114ced0935 Secondary index build sharon   resource_bench 10118551269 16938426769 bytes 59.74%
d8b039c0-ee6a-11e5-9a98-ff9a6f2af762 Secondary index build sharon   resource_bench 9003236945  18252472495 bytes 49.33%
{noformat}

- *there is still A LOT of GC*:

{noformat}
INFO [Service Thread] 2016-03-20 09:46:44,695 GCInspector.java:284 - ParNew GC in 455ms. CMS Old Gen: 2964960608 -> 3487392640; Par Eden Space: 1006632960 -> 0;
INFO [Service Thread] 2016-03-20 09:46:47,250 GCInspector.java:284 - ParNew GC in 460ms. CMS Old Gen: 3487392640 -> 3990379824; Par Eden Space: 1006632960 -> 0; Par Survivor Space: 125829120 -> 125828160
INFO [Service Thread] 2016-03-20 09:46:49,452 GCInspector.java:284 - ParNew GC in 414ms. CMS Old Gen: 3990379824 -> 4445691424; Par Eden Space: 1006632960 -> 0; Par Survivor Space: 125828160 -> 125827840
INFO [Service Thread] 2016-03-20 09:46:52,328 GCInspector.java:284 - ParNew GC in 484ms. CMS Old Gen: 4445691424 -> 4968532112; Par Eden Space: 1006632960 -> 0; Par Survivor Space: 125827840 -> 12582744
{noformat}
[ https://issues.apache.org/jira/browse/CASSANDRA-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202223#comment-15202223 ]

DOAN DuyHai commented on CASSANDRA-11383:
-----------------------------------------

Ok:
1. Drop all indices.
2. Clean all *idx* files on all machines.
3. {{CREATE CUSTOM INDEX resource_period_end_month_int_idx ON sharon.resource_bench (period_end_month_int) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = \{'mode': 'SPARSE', 'max_compaction_flush_memory_in_mb': '128'\};}}

Now let it run for the night, and let's see tomorrow morning.
[ https://issues.apache.org/jira/browse/CASSANDRA-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202094#comment-15202094 ]

DOAN DuyHai commented on CASSANDRA-11383:
-----------------------------------------

[~xedin] I dropped all the indices and recreated them one by one, but to no avail; it eventually OOMs after a while. See the second log file attached, with CMS settings. The node was building only 1 index.
[ https://issues.apache.org/jira/browse/CASSANDRA-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202084#comment-15202084 ]

Pavel Yaskevich commented on CASSANDRA-11383:
---------------------------------------------

Ok, so it's going to be 1G * 9 per memtable, and the memtable itself is pretty big. I am assuming those 9 columns are the only columns you have defined in the row, which means that a couple of flushes can potentially consume your whole heap. You can try minimizing the memtable size to make it flush more frequently and let compaction deal with it. I'm attaching a patch to make memtable flush segment sizes configurable (with a default of 128 MB); you can specify max_memtable_flush_memory_in_mb in the index options if you want to make it even smaller.
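Assuming the attached patch is applied, the per-index option described above would look something like this (the 64 MB value is an arbitrary illustration, below the patch's 128 MB default):

```sql
-- Sketch, assuming the attached patch: cap each in-memory index
-- flush segment so segments are flushed more frequently.
CREATE CUSTOM INDEX resource_period_end_month_int_idx
    ON sharon.resource_bench (period_end_month_int)
    USING 'org.apache.cassandra.index.sasi.SASIIndex'
    WITH OPTIONS = {'max_memtable_flush_memory_in_mb': '64'};
```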
[ https://issues.apache.org/jira/browse/CASSANDRA-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202141#comment-15202141 ]

Jack Krupansky commented on CASSANDRA-11383:
--------------------------------------------

bq. recreated them one by one, but to no avail; it eventually OOMs after a while

But are you waiting for each to finish its build before proceeding to the next? I mean, can even one index alone complete a build? Or can you create the first 2 or 3 and let them run in parallel to completion before proceeding to the next? Maybe there is some practical limit to how many indexes you can build in parallel before the rate of garbage generation exceeds the rate of GC.
[ https://issues.apache.org/jira/browse/CASSANDRA-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202080#comment-15202080 ]

Jack Krupansky commented on CASSANDRA-11383:
--------------------------------------------

1. How large are each of the text fields being indexed? Are they fairly short, or are some quite long (and not tokenized, either)? I'm wondering if maybe a wide column is causing difficulty.
2. Does the OOM occur if the SASI indexes are created one at a time, serially, waiting for the full index to build before moving on to the next?
3. Do you need a 32G heap to build just one index? I cringe when I see a heap larger than 14G. See if you can get a single SASI index build to work in 10-12G or less.
[ https://issues.apache.org/jira/browse/CASSANDRA-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202951#comment-15202951 ]

DOAN DuyHai commented on CASSANDRA-11383:
-----------------------------------------

Ok, it's clear now. Thanks for the clarification, [~xedin].
[ https://issues.apache.org/jira/browse/CASSANDRA-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202042#comment-15202042 ]

DOAN DuyHai commented on CASSANDRA-11383:
-----------------------------------------

- memtable_heap_space_in_mb: 2048
- memtable_offheap_space_in_mb: 2048
[ https://issues.apache.org/jira/browse/CASSANDRA-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202934#comment-15202934 ]

Pavel Yaskevich commented on CASSANDRA-11383:
---------------------------------------------

[~doanduyhai] Let me first elaborate on what I mean by "it's not sparse". SPARSE is meant to be used when there are a lot of index *values* and each value has *fewer than 5 keys*, so each value is found *SPARSE*ly in the index. SPARSE has more to do with keys/tokens than with values; that's why the example uses "created_at", since that column would have a lot of values and each value would, most likely, have only a single token/key attached to it. We actually detect this situation, and the actual index is going to be constructed correctly even if SPARSE mode was set on a non-sparse column.

Regarding LCS: it's LeveledCompactionStrategy, where you can set a maximum sstable size. I would suggest making it something like 1G or less, because the stitching and the OOM you see are directly related to the size of the sstable file. Meanwhile, I am working on a fix for the current situation.
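The LCS suggestion above can be sketched against the table used in the tests in this thread (the 1024 MB cap matches the value DOAN later reports testing with):

```sql
-- Cap LCS sstables at 1 GB so the SASI stitching step operates on
-- smaller per-sstable index segments.
ALTER TABLE sharon.resource_bench
    WITH compaction = {'class': 'LeveledCompactionStrategy',
                       'sstable_size_in_mb': 1024};
```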
[ https://issues.apache.org/jira/browse/CASSANDRA-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202072#comment-15202072 ]

DOAN DuyHai commented on CASSANDRA-11383:
-----------------------------------------

If anyone wants the routine to generate the dataset, I have everything available. I used co-located Spark for the job of inserting massive randomized data.
[ https://issues.apache.org/jira/browse/CASSANDRA-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202134#comment-15202134 ]

Pavel Yaskevich commented on CASSANDRA-11383:
---------------------------------------------

Btw, how big is the ma-1831-big-SI sstable itself, and how big are the index components it flushed? It's pretty weird to see 82 segments flushed.
[jira] [Commented] (CASSANDRA-11383) SASI index build leads to massive OOM
[ https://issues.apache.org/jira/browse/CASSANDRA-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202101#comment-15202101 ] DOAN DuyHai commented on CASSANDRA-11383:

By the way, I can use the hardware for the whole weekend, so if you guys have ideas to test (drop index, re-create with different index settings or C* settings), just tell me.
[jira] [Commented] (CASSANDRA-11383) SASI index build leads to massive OOM
[ https://issues.apache.org/jira/browse/CASSANDRA-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202876#comment-15202876 ] Jack Krupansky commented on CASSANDRA-11383:

The int field could easily be made a text field if that would make SASI work better (you could even do a prefix query by year then.)

Point 1 is precisely what SASI SPARSE is designed for. It is also what Materialized Views (formerly Global Indexes) are for, and MV is even better suited, since it eliminates the need to scan multiple nodes: the rows get collected under the new partition key, which can include the indexed data value.

You're using cardinality backwards - it is supposed to be a measure of the number of distinct values in a column, not the number of rows containing each value. See: https://en.wikipedia.org/wiki/Cardinality_%28SQL_statements%29. Granted, in ERD cardinality is the count of rows in a second table for each column value in a given table (one to n, n to one, etc.), but in the context of an index there is only one table involved - you could consider the index itself to be a table, but that would be a little odd. In any case, it is best to stick with the standard SQL meaning: the cardinality of the data values in a column.

So, to be clear, an email address is high cardinality and gender is low cardinality. And the end-of-month int field is low cardinality, or "not dense" in the original SASI doc terminology.
[jira] [Commented] (CASSANDRA-11383) SASI index build leads to massive OOM
[ https://issues.apache.org/jira/browse/CASSANDRA-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202877#comment-15202877 ] Jack Krupansky commented on CASSANDRA-11383:

Sorry for any extra noise I may have generated here - [~xedin] has the info he needs without me.
[jira] [Commented] (CASSANDRA-11383) SASI index build leads to massive OOM
[ https://issues.apache.org/jira/browse/CASSANDRA-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202851#comment-15202851 ] DOAN DuyHai commented on CASSANDRA-11383:

[~jkrupan] Terminology and wording/documentation about {{SPARSE}} mode aside, what interests me more is how SASI can deal with a {{DENSE}} index, e.g. a few indexed values each matching millions/billions of primary keys.

The original secondary index was not well adapted for:

1. very low cardinality (index on email to search for a user, for example), because it does not scale well with cluster size. In the worst case you need to scan N/RF nodes to fetch 0 or at most 1 user, so the effort-vs-result ratio is bad

2. very high cardinality (user gender, for example), because each distinct indexed value can have many matching users, which creates ultra-wide rows, an anti-pattern

With SASI, although point 1 still holds (that's the common issue with all **distributed** index systems, even Solr or ES), I had hoped that limitation 2 would be lifted since SASI stores data in its own structures.
[jira] [Commented] (CASSANDRA-11383) SASI index build leads to massive OOM
[ https://issues.apache.org/jira/browse/CASSANDRA-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202156#comment-15202156 ] DOAN DuyHai commented on CASSANDRA-11383:

[~xedin] I can't promise anything but I'm going to find a way to share the data. We're talking about 100Gb to upload ...

Below are the ma-1831 standard files:

{noformat}
root@ns3033877:~# ll /home/cassandra/data/sharon/resource_bench-4d065db0ebbc11e5995bd129cfce5717/ma-1831-* | grep -v "SI"
-rw-r--r-- 1 cassandra cassandra     4081363 Mar 17 21:40 /home/cassandra/data/sharon/resource_bench-4d065db0ebbc11e5995bd129cfce5717/ma-1831-big-CompressionInfo.db
-rw-r--r-- 1 cassandra cassandra 16350922629 Mar 17 21:40 /home/cassandra/data/sharon/resource_bench-4d065db0ebbc11e5995bd129cfce5717/ma-1831-big-Data.db
-rw-r--r-- 1 cassandra cassandra          10 Mar 17 21:40 /home/cassandra/data/sharon/resource_bench-4d065db0ebbc11e5995bd129cfce5717/ma-1831-big-Digest.crc32
-rw-r--r-- 1 cassandra cassandra   150496120 Mar 17 21:40 /home/cassandra/data/sharon/resource_bench-4d065db0ebbc11e5995bd129cfce5717/ma-1831-big-Filter.db
-rw-r--r-- 1 cassandra cassandra  4678909890 Mar 17 21:40 /home/cassandra/data/sharon/resource_bench-4d065db0ebbc11e5995bd129cfce5717/ma-1831-big-Index.db
-rw-r--r-- 1 cassandra cassandra       12601 Mar 17 21:40 /home/cassandra/data/sharon/resource_bench-4d065db0ebbc11e5995bd129cfce5717/ma-1831-big-Statistics.db
-rw-r--r-- 1 cassandra cassandra    40410476 Mar 17 21:40 /home/cassandra/data/sharon/resource_bench-4d065db0ebbc11e5995bd129cfce5717/ma-1831-big-Summary.db
-rw-r--r-- 1 cassandra cassandra          92 Mar 17 21:40 /home/cassandra/data/sharon/resource_bench-4d065db0ebbc11e5995bd129cfce5717/ma-1831-big-TOC.txt
{noformat}

And the SASI indices:

{noformat}
-rw-r--r-- 1 cassandra cassandra       97 Mar 18 20:57 /home/cassandra/data/sharon/resource_bench-4d065db0ebbc11e5995bd129cfce5717/ma-1831-big-SI_resource_dsp_code_idx.db
-rw-r--r-- 1 cassandra cassandra 24821784 Mar 18 20:20 /home/cassandra/data/sharon/resource_bench-4d065db0ebbc11e5995bd129cfce5717/ma-1831-big-SI_resource_dsp_code_idx.db_0
-rw-r--r-- 1 cassandra cassandra 24825880 Mar 18 20:20 /home/cassandra/data/sharon/resource_bench-4d065db0ebbc11e5995bd129cfce5717/ma-1831-big-SI_resource_dsp_code_idx.db_1
-rw-r--r-- 1 cassandra cassandra 24821784 Mar 18 20:25 /home/cassandra/data/sharon/resource_bench-4d065db0ebbc11e5995bd129cfce5717/ma-1831-big-SI_resource_dsp_code_idx.db_10
-rw-r--r-- 1 cassandra cassandra 24817688 Mar 18 20:26 /home/cassandra/data/sharon/resource_bench-4d065db0ebbc11e5995bd129cfce5717/ma-1831-big-SI_resource_dsp_code_idx.db_11
-rw-r--r-- 1 cassandra cassandra 24821784 Mar 18 20:26 /home/cassandra/data/sharon/resource_bench-4d065db0ebbc11e5995bd129cfce5717/ma-1831-big-SI_resource_dsp_code_idx.db_12
-rw-r--r-- 1 cassandra cassandra 24821784 Mar 18 20:26 /home/cassandra/data/sharon/resource_bench-4d065db0ebbc11e5995bd129cfce5717/ma-1831-big-SI_resource_dsp_code_idx.db_13
-rw-r--r-- 1 cassandra cassandra 24821784 Mar 18 20:27 /home/cassandra/data/sharon/resource_bench-4d065db0ebbc11e5995bd129cfce5717/ma-1831-big-SI_resource_dsp_code_idx.db_14
-rw-r--r-- 1 cassandra cassandra 24817688 Mar 18 20:27 /home/cassandra/data/sharon/resource_bench-4d065db0ebbc11e5995bd129cfce5717/ma-1831-big-SI_resource_dsp_code_idx.db_15
-rw-r--r-- 1 cassandra cassandra 24825880 Mar 18 20:27 /home/cassandra/data/sharon/resource_bench-4d065db0ebbc11e5995bd129cfce5717/ma-1831-big-SI_resource_dsp_code_idx.db_16
-rw-r--r-- 1 cassandra cassandra 24825880 Mar 18 20:28 /home/cassandra/data/sharon/resource_bench-4d065db0ebbc11e5995bd129cfce5717/ma-1831-big-SI_resource_dsp_code_idx.db_17
-rw-r--r-- 1 cassandra cassandra 24825880 Mar 18 20:28 /home/cassandra/data/sharon/resource_bench-4d065db0ebbc11e5995bd129cfce5717/ma-1831-big-SI_resource_dsp_code_idx.db_18
-rw-r--r-- 1 cassandra cassandra 24817688 Mar 18 20:29 /home/cassandra/data/sharon/resource_bench-4d065db0ebbc11e5995bd129cfce5717/ma-1831-big-SI_resource_dsp_code_idx.db_19
-rw-r--r-- 1 cassandra cassandra 24821784 Mar 18 20:21 /home/cassandra/data/sharon/resource_bench-4d065db0ebbc11e5995bd129cfce5717/ma-1831-big-SI_resource_dsp_code_idx.db_2
-rw-r--r-- 1 cassandra cassandra 24821784 Mar 18 20:29 /home/cassandra/data/sharon/resource_bench-4d065db0ebbc11e5995bd129cfce5717/ma-1831-big-SI_resource_dsp_code_idx.db_20
-rw-r--r-- 1 cassandra cassandra 24825880 Mar 18 20:30 /home/cassandra/data/sharon/resource_bench-4d065db0ebbc11e5995bd129cfce5717/ma-1831-big-SI_resource_dsp_code_idx.db_21
-rw-r--r-- 1 cassandra cassandra 24821784 Mar 18 20:30 /home/cassandra/data/sharon/resource_bench-4d065db0ebbc11e5995bd129cfce5717/ma-1831-big-SI_resource_dsp_code_idx.db_22
-rw-r--r-- 1 cassandra cassandra 24817688 Mar 18 20:31 /home/cassandra/data/sharon/resource_bench-4d065db0ebbc11e5995bd129cfce5717/ma-1831-big-SI_resource_
[jira] [Commented] (CASSANDRA-11383) SASI index build leads to massive OOM
[ https://issues.apache.org/jira/browse/CASSANDRA-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202816#comment-15202816 ] Jack Krupansky commented on CASSANDRA-11383:

The terminology is a bit confusing here - everybody understands what a sparse matrix is, but exactly what constitutes sparseness in a column is very unclear. What is clear is that the cardinality (number of distinct values) is low for that int field. A naive person (okay... me) would have thought that sparse data meant few distinct values, which is what the int field has (36 distinct values.)

I decided to check the doc to see what it says about SPARSE, but discovered that the doc doesn't exist yet in the main Cassandra doc - I sent a message to d...@datastax.com about that. So I went back to the original, pre-integration doc (https://github.com/xedin/sasi) and found there is a separate, non-integrated doc for SASI in the Cassandra source tree - https://github.com/apache/cassandra/blob/trunk/doc/SASI.md - which makes clear that SPARSE "is meant to improve performance of querying large, dense number ranges like timestamps for data inserted every millisecond."

Oops... SPARSE = dense. In any case, SPARSE is designed for a high cardinality of distinct values, which the int field clearly does not have. I would argue that SASI should give a strongly-worded warning if the column data for a SPARSE index has low cardinality - a low number of distinct column values and a high number of index entries per column value.
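[Editor's note] The distinction Jack draws can be made concrete with a tiny sketch (plain Java, made-up values; {{cardinality}} here is a hypothetical helper, not a Cassandra API):

```java
import java.util.*;

public class CardinalitySketch {
    // Cardinality (SQL sense): the count of DISTINCT values in a column,
    // independent of how many rows share each value.
    static long cardinality(List<String> column) {
        return new HashSet<>(column).size();
    }

    public static void main(String[] args) {
        // High cardinality: almost every row carries a unique value.
        List<String> emails = Arrays.asList("a@x.com", "b@y.com", "c@z.com", "d@w.com");
        // Low cardinality: many rows share a handful of distinct values.
        List<String> genders = Arrays.asList("M", "F", "M", "M", "F", "F", "M", "F");

        System.out.println(cardinality(emails));  // 4 distinct values over 4 rows
        System.out.println(cardinality(genders)); // 2 distinct values over 8 rows
    }
}
```

By this definition the 36-value end-of-month int field in this ticket sits on the low-cardinality side, which is why SPARSE mode is a poor fit for it.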
[jira] [Commented] (CASSANDRA-11383) SASI index build leads to massive OOM
[ https://issues.apache.org/jira/browse/CASSANDRA-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202088#comment-15202088 ] DOAN DuyHai commented on CASSANDRA-11383:

[~jkrupan]

1. Not that large; see below the Spark script used to generate the randomized data:

{code:scala}
import java.util.UUID

import com.datastax.spark.connector._

case class Resource(dsrId: UUID, relSeq: Long, seq: Long, dspReleaseCode: String,
                    commercialOfferCode: String, transferCode: String, mediaCode: String,
                    modelCode: String, unicWork: String, title: String, status: String,
                    contributorsName: List[String], periodEndMonthInt: Int, dspCode: String,
                    territoryCode: String, payingNetQty: Long, authorizedSocietiesTxt: String,
                    relType: String)

val allDsps = List("youtube", "itunes", "spotify", "deezer", "vevo", "google-play", "7digital",
                   "spotify", "youtube", "spotify", "youtube", "youtube", "youtube")
val allCountries = List("FR", "UK", "BE", "IT", "NL", "ES", "FR", "FR")
val allPeriodsEndMonths: Seq[Int] =
  for (year <- 2013 to 2015; month <- 1 to 12) yield (year.toString + f"$month%02d").toInt
val allModelCodes = List("PayAsYouGo", "AdFunded", "Subscription")
val allMediaCodes = List("Music", "Ringtone")
val allTransferCodes = List("Streaming", "Download")
val allCommercialOffers = List("Premium", "Free")
val status = "Declared"
val authorizedSocietiesTxt: String = "sacem sgae"
val relType = "whatever"

val titlesAndContributors: Array[(String, String)] = sc.textFile("/tmp/top_100.csv")
  .map(line => { val split = line.split(";"); (split(1), split(2)) })
  .distinct.collect

for (i <- 1 to 100) {
  sc.parallelize((1 to 4000).map(i => UUID.randomUUID))
    .map(dsrId => {
      val r = new java.util.Random(System.currentTimeMillis())
      val relSeq = r.nextLong()
      val seq = r.nextLong()
      val dspReleaseCode = seq.toString
      val dspCode = allDsps(r.nextInt(allDsps.size))
      val periodEndMonth = allPeriodsEndMonths(r.nextInt(allPeriodsEndMonths.size))
      val territoryCode = allCountries(r.nextInt(allCountries.size))
      val modelCode = allModelCodes(r.nextInt(allModelCodes.size))
      val mediaCode = allMediaCodes(r.nextInt(allMediaCodes.size))
      val transferCode = allTransferCodes(r.nextInt(allTransferCodes.size))
      val commercialOffer = allCommercialOffers(r.nextInt(allCommercialOffers.size))
      val titleAndContributor: (String, String) = titlesAndContributors(r.nextInt(titlesAndContributors.size))
      val title = titleAndContributor._1
      val contributorsName = titleAndContributor._2.split(",").toList
      val unicWork = title + "|" + titleAndContributor._2
      val payingNetQty = r.nextInt(100).toLong
      Resource(dsrId, relSeq, seq, dspReleaseCode, commercialOffer, transferCode, mediaCode,
               modelCode, unicWork, title, status, contributorsName, periodEndMonth, dspCode,
               territoryCode, payingNetQty, authorizedSocietiesTxt, relType)
    })
    .saveToCassandra("keyspace", "resource")
  Thread.sleep(500)
}
{code}

2. Does OOM occur if SASI indexes are created one at a time - serially, waiting for the full index to build before moving on to the next? --> *Yes it does*, see the log file with CMS settings attached above

3. Do you need a 32G heap to build just one index? I cringe when I see a heap larger than 14G. See if you can get a single SASI index build to work in 10-12G or less. --> Well, the 32Gb heap was for analytics use cases and I was using G1 GC. But changing to CMS with an 8Gb heap gives the same result: OOM. See the log file with CMS settings attached above.
[jira] [Commented] (CASSANDRA-11383) SASI index build leads to massive OOM
[ https://issues.apache.org/jira/browse/CASSANDRA-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202027#comment-15202027 ] Pavel Yaskevich commented on CASSANDRA-11383:

What is your memtable size? All of the index building currently happens in fixed chunk sizes in memory; when flushing from a memtable, the size of every individual segment is set to the size of the memtable. So if you have a big memtable and a bunch of indexes, at this point it all depends on how fast you can flush, and it looks like a flush itself takes about 0.5 seconds.
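[Editor's note] A back-of-the-envelope sketch of what this implies for the configuration in this ticket. The arithmetic is mine, not SASI code; it assumes the worst case where each of the 9 indexes has an in-flight segment sized by the 2048 MB memtable from the ticket description:

```java
public class SegmentFlushEstimate {
    public static void main(String[] args) {
        long memtableMb = 2048; // memtable_heap_space_in_mb from the ticket
        int indexes = 9;        // SASI indexes defined on the table

        // If each index segment is sized to the memtable, a single memtable
        // flush can drive up to `indexes` segment builds at once, each
        // bounded by memtableMb of data.
        long worstCaseMb = memtableMb * indexes;
        System.out.println("worst-case in-flight segment data: " + worstCaseMb + " MB");
        // That is 18432 MB, which already dwarfs the 8 GB CMS heap tried later
        // in this thread, before any object overhead is counted.
    }
}
```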
[jira] [Commented] (CASSANDRA-11383) SASI index build leads to massive OOM
[ https://issues.apache.org/jira/browse/CASSANDRA-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202162#comment-15202162 ] DOAN DuyHai commented on CASSANDRA-11383:

[~jkrupan] "I mean, can even one index alone complete a build?"

--> I'm afraid that even a single index build will eventually lead to OOM because the table is big. I'm going to DROP all indices, recreate just ONE, and let it build over the whole night to see whether it has OOMed by tomorrow morning. I'll set *max_compaction_flush_memory_in_mb* to something small like 128Mb, as [~xedin] recommended, to see if it helps C* finish the index build.
[jira] [Commented] (CASSANDRA-11383) SASI index build leads to massive OOM
[ https://issues.apache.org/jira/browse/CASSANDRA-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202117#comment-15202117 ] Pavel Yaskevich commented on CASSANDRA-11383:

It would be very helpful if you could share the sstables from a single machine through AWS or something so I can load them locally and try everything.
[jira] [Commented] (CASSANDRA-11383) SASI index build leads to massive OOM
[ https://issues.apache.org/jira/browse/CASSANDRA-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202712#comment-15202712 ] DOAN DuyHai commented on CASSANDRA-11383:

bq. I've figured out what is going on and first of all period_end_month_int index is not sparse - at least first term in that index has ~11M tokens assigned to it

You're right, {{period_end_month_int}} is not *sparse* in the sense we mean it in English, but the SASI index mode {{SPARSE}} is the only one allowed for numeric fields; {{PREFIX}} and {{CONTAINS}} are reserved for text fields. So we have a fundamental issue here: how do we index *dense* numeric values?

bq. Temporary fix for this situation is switching to LCS with fixed maximum sstable size, as I mentioned in my previous comment.

Can you elaborate further? What, in LCS, makes it work in the current situation compared to STCS? Is it the total number of SSTables? (Currently with STCS there are fewer than 100 SSTables per node, so that's not really a big issue.) Is it the fact that a partition is guaranteed to be in a single SSTable with LCS? (Again, given the schema we have mostly tiny rows, but a lot of them.)

For now I'm going to switch to LCS to see if we can finish building the index without OOM. In the long term, though, LCS is not the solution, because this table's size will increase quickly over time and tombstones in levels above L3 will rarely be compacted.
[jira] [Commented] (CASSANDRA-11383) SASI index build leads to massive OOM
[ https://issues.apache.org/jira/browse/CASSANDRA-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202196#comment-15202196 ] Jack Krupansky commented on CASSANDRA-11383:

Just to make sure I understand what's going on...

1. The first index is for the territory_code column, whose values are simple 2-character country codes from allCountries, which has 8 entries, with 'FR' repeated 3 times in that list of 8 country codes.

2. How many rows are generated per machine - is it 100 * 40,000,000 = 4 billion?

3. That means that the SASI index will have six unique index values, each with roughly 4 billion / 8 = 500 million rows, correct? (Actually, 5 of the 6 unique values will have 500 million rows each, and the 6th, 'FR', will have 1.5 billion rows, i.e. 3 times 500 million.) Sounds like a great stress test for SASI!

4. That's just for the territory_code column.

5. Some of the columns have only 2 unique values, like commercial_offer_code. That would mean 2 billion rows for each indexed unique value. An even more excellent stress test for SASI!
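[Editor's note] The arithmetic in points 1-3 can be checked directly. This sketch uses the allCountries list from the Spark script; the 4-billion total is Jack's question here (DOAN later corrects it to roughly 3.4 billion after stopping at 80 iterations):

```java
import java.util.*;

public class TerritoryIndexMath {
    public static void main(String[] args) {
        // Country list from the data-generation script: 8 slots, 'FR' three times.
        List<String> allCountries = Arrays.asList("FR", "UK", "BE", "IT", "NL", "ES", "FR", "FR");
        long totalRows = 100L * 40_000_000L; // the 4 billion asked about in point 2

        long distinct = new HashSet<>(allCountries).size(); // 6 unique codes
        long rowsPerSlot = totalRows / allCountries.size(); // 500M rows per list slot

        System.out.println("distinct territory codes: " + distinct);
        System.out.println("rows per single-slot code: " + rowsPerSlot);

        // 'FR' occupies 3 of the 8 slots, so its index entry covers 3x the rows:
        long frOccurrences = allCountries.stream().filter("FR"::equals).count();
        System.out.println("rows indexed under FR: " + frOccurrences * rowsPerSlot);
    }
}
```

Running it confirms 6 distinct values, 500 million rows for each single-slot code, and 1.5 billion rows under 'FR'.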
[jira] [Commented] (CASSANDRA-11383) SASI index build leads to massive OOM
[ https://issues.apache.org/jira/browse/CASSANDRA-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202669#comment-15202669 ] Pavel Yaskevich commented on CASSANDRA-11383:

[~doanduyhai] I've figured out what is going on. First of all, the period_end_month_int index is not sparse - at least the first term in that index has ~11M tokens assigned to it. That's the source of the problem: because the index is sparse + composite combined, TokenTreeBuilder has to pull a lot of data into memory when stitching segments together. I'm trying to figure out if there is a way to make it less memory intensive.
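[Editor's note] A rough sense of why one very dense term hurts the stitching step. The ~11M figure is from this comment and a Cassandra partition token is a 64-bit long; the assumption that stitching materializes a term's full token set is my simplification, not the actual TokenTreeBuilder code:

```java
public class StitchMemoryEstimate {
    public static void main(String[] args) {
        long tokensForFirstTerm = 11_000_000L; // ~11M tokens observed for a single term
        long bytesPerToken = 8;                // a murmur3 partition token is a 64-bit long

        // If stitching pulls a term's full token set into memory, the raw
        // payload for this one term alone is on the order of:
        long mb = tokensForFirstTerm * bytesPerToken / (1024 * 1024);
        System.out.println("~" + mb + " MB of raw token data for one term");
        // Per-object and tree overhead in the builder multiplies this further,
        // and the table has several such indexes being merged concurrently.
    }
}
```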
[jira] [Commented] (CASSANDRA-11383) SASI index build leads to massive OOM
[ https://issues.apache.org/jira/browse/CASSANDRA-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202206#comment-15202206 ]

DOAN DuyHai commented on CASSANDRA-11383:

[~jkrupan]
1. Yes, correct.
2. Originally we targeted 4 billion, but we stopped after 80 iterations (instead of 100), so it gave us something like 3.4 billion.
3. Yes, correct. But territory_code is not meant to be used alone; it is combined with the resource_period_end_month_int_idx index and others to cut down the number of rows to be fetched.
4. and 5. Same answer as above.

Indeed, those indices are designed to support a user-search form that performs TopK aggregation with dynamic filtering: SASI is used for the filtering part and Spark for the TopK aggregation. The *resource_period_end_month* filter is mandatory for the user, and people usually put a single month or a range of 1 year at most. This cuts the whole dataset down to a reasonable subset over which we apply the other filters.

> SASI index build leads to massive OOM
> -
>
> Key: CASSANDRA-11383
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11383
> Project: Cassandra
> Issue Type: Bug
> Components: CQL
> Environment: C* 3.4
> Reporter: DOAN DuyHai
> Attachments: CASSANDRA-11383.patch, new_system_log_CMS_8GB_OOM.log, system.log_sasi_build_oom
>
> 13 bare metal machines
> - 6 cores CPU (12 HT)
> - 64Gb RAM
> - 4 SSD in RAID0
> JVM settings:
> - G1 GC
> - Xms32G, Xmx32G
> Data set:
> - ≈ 100Gb per node
> - 1.3 Tb cluster-wide
> - ≈ 20Gb for all SASI indices
> C* settings:
> - concurrent_compactors: 1
> - compaction_throughput_mb_per_sec: 256
> - memtable_heap_space_in_mb: 2048
> - memtable_offheap_space_in_mb: 2048
> I created 9 SASI indices:
> - 8 indices on text fields, NonTokenizingAnalyzer, PREFIX mode, case-insensitive
> - 1 index on a numeric field, SPARSE mode
> After a while, the nodes just went OOM.
> I attach log files. You can see a lot of GC happening while index segments are flushed to disk. At some point the node OOMs ...
> /cc [~xedin]
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
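A sketch of the kind of filtering query described above, assuming the column names from the schema posted later in this thread; the predicates and values are hypothetical, and depending on the Cassandra version such a multi-predicate query may additionally require ALLOW FILTERING:

{code:sql}
-- Hypothetical search query: the mandatory period range filter narrows
-- the dataset first, then other SASI-indexed predicates are applied.
SELECT dsr_id, rel_seq, seq, title
FROM sharon.resource_bench
WHERE period_end_month_int >= 201501
  AND period_end_month_int <= 201512
  AND territory_code = 'FR';
{code}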
[ https://issues.apache.org/jira/browse/CASSANDRA-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202575#comment-15202575 ]

Pavel Yaskevich commented on CASSANDRA-11383:

[~doanduyhai] I've successfully downloaded the sstable and was able to run the re-index. It looks like everything is good until all of the segments are stitched together into the actual index file; trying to figure out what is going on with that.
[ https://issues.apache.org/jira/browse/CASSANDRA-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202259#comment-15202259 ]

DOAN DuyHai commented on CASSANDRA-11383:

[~xedin] OK, I'm trying to fetch the sstable with its data files.

In the meantime, I just re-created one index as shown above using *max_compaction_flush_memory_in_mb* = 128. SASI flushes thousands of index files and eventually the server dies, maybe because of too many file handles:

{noformat}
INFO [SASI-General:1] 2016-03-18 22:45:37,480 PerSSTableIndexWriter.java:258 - Flushed index segment /home/cassandra/data/sharon/resource_bench-4d065db0ebbc11e5995bd129cfce5717/ma-1831-big-SI_resource_period_end_month_int_idx.db_130370, took 0 ms.
INFO [SASI-General:1] 2016-03-18 22:45:37,581 PerSSTableIndexWriter.java:258 - Flushed index segment /home/cassandra/data/sharon/resource_bench-4d065db0ebbc11e5995bd129cfce5717/ma-1831-big-SI_resource_period_end_month_int_idx.db_130371, took 101 ms.
ERROR [SASI-General:1] 2016-03-18 22:45:37,582 CassandraDaemon.java:195 - Exception in thread Thread[SASI-General:1,5,main]
org.apache.cassandra.io.FSReadError: java.io.IOException: Map failed
{noformat}
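The option mentioned above is set per index at creation time; a sketch of how it would be applied, with the index and column names taken from the log above and the rest assumed:

{code:sql}
-- Hypothetical re-creation of the SPARSE index with the per-compaction
-- flush memory capped at 128 MB, as described in the comment above.
CREATE CUSTOM INDEX resource_period_end_month_int_idx
ON sharon.resource_bench (period_end_month_int)
USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = {
    'mode': 'SPARSE',
    'max_compaction_flush_memory_in_mb': '128'
};
{code}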
[ https://issues.apache.org/jira/browse/CASSANDRA-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202281#comment-15202281 ]

Pavel Yaskevich commented on CASSANDRA-11383:

[~doanduyhai] Ok yeah, it looks like there are too many index components. I suspect there is something wrong with the index builder there, because we have sstable files which are over 100G in size with about 20 indexes attached to them without a problem, and merging 24M-sized segments should never be a problem. So it would be very helpful if you could share that ma-big-1831 sstable somehow, so I can run a couple of experiments and see where things are.
[ https://issues.apache.org/jira/browse/CASSANDRA-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202377#comment-15202377 ]

Pavel Yaskevich commented on CASSANDRA-11383:

[~doanduyhai] Thanks! I've already started downloading everything. Meanwhile, what you can also try is switching to LCS with a max sstable size of 1G or lower, and re-creating the data with all of the indexes defined.
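The suggestion above maps to a table-level compaction change; a sketch, using the table name from the schema shared in this thread:

{code:sql}
-- Switch the table to LCS with a 1G max sstable size, as suggested.
ALTER TABLE sharon.resource_bench
WITH compaction = {
    'class': 'LeveledCompactionStrategy',
    'sstable_size_in_mb': 1024
};
{code}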
[ https://issues.apache.org/jira/browse/CASSANDRA-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202241#comment-15202241 ]

Jack Krupansky commented on CASSANDRA-11383:

What's the table schema? Is period_end_month_int text or int?

period_end_month_int has 3 years times 12 months = 36 unique values, so 3.4 billion / 36 = 94.44 million rows for each indexed unique value.
[ https://issues.apache.org/jira/browse/CASSANDRA-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202245#comment-15202245 ]

Pavel Yaskevich commented on CASSANDRA-11383:

[~doanduyhai] So it actually happens when an existing sstable is being indexed; its size is 16G and segment sizes are about 30MB, which looks totally fine to me. There might be a bug I introduced while porting SASI to 3.x. I don't actually need the whole dataset in this case, since it OOMs while indexing a single sstable; can you please share just that one sstable instead, which is only 16G of data?
[ https://issues.apache.org/jira/browse/CASSANDRA-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202353#comment-15202353 ]

DOAN DuyHai commented on CASSANDRA-11383:

[~xedin] Upload is on the way. Here is the Google Drive folder for the data + schema + C* config: https://drive.google.com/folderview?id=0B6wR2aj4Cb6wdm03TFZtcXllX2M&usp=sharing

The big *Data* file (23Gb) is still uploading; it will be available in ≈ 45 mins. I hope you have fiber optic to download it ...
[ https://issues.apache.org/jira/browse/CASSANDRA-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202262#comment-15202262 ]

DOAN DuyHai commented on CASSANDRA-11383:

[~jkrupan] Below is the schema:

{code:sql}
create table if not exists sharon.resource_bench (
    dsr_id uuid,
    rel_seq bigint,
    seq bigint,
    dsp_code varchar,
    model_code varchar,
    media_code varchar,
    transfer_code varchar,
    commercial_offer_code varchar,
    territory_code varchar,
    period_end_month_int int,
    authorized_societies_txt text,
    rel_type text,
    status text,
    dsp_release_code text,
    title text,
    contributors_name list<text>,
    unic_work text,
    paying_net_qty bigint,
    PRIMARY KEY ((dsr_id, rel_seq), seq)
) WITH CLUSTERING ORDER BY (seq ASC);
{code}
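The 9 indices described in the ticket (8 PREFIX text indices with a case-insensitive non-tokenizing analyzer, plus one SPARSE numeric index) would have been created along these lines; the exact index definitions are not in the thread, so the column choices here are illustrative:

{code:sql}
-- One of the 8 text indices: PREFIX mode, case-insensitive,
-- NonTokenizingAnalyzer (column choice illustrative).
CREATE CUSTOM INDEX ON sharon.resource_bench (territory_code)
USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = {
    'mode': 'PREFIX',
    'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
    'case_sensitive': 'false'
};

-- The numeric index: SPARSE mode (the mode whose requirements are at
-- issue in this ticket).
CREATE CUSTOM INDEX ON sharon.resource_bench (period_end_month_int)
USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = { 'mode': 'SPARSE' };
{code}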
[ https://issues.apache.org/jira/browse/CASSANDRA-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202070#comment-15202070 ]

DOAN DuyHai commented on CASSANDRA-11383:

I tried to create the indices ONE by ONE, but got the same result.

New JVM settings:
- CMS
- Xms8G, Xmx8G

After some time, the node goes into long Old GC pauses and then crashes:

{noformat}
632936; Par Survivor Space: 125829120 -> 120349752
WARN [Service Thread] 2016-03-18 21:02:45,237 GCInspector.java:282 - ConcurrentMarkSweep GC in 10417ms. Par Eden Space: 1006632952 -> 1006632928; Par Survivor Space: 125829120 -> 120921320
WARN [Service Thread] 2016-03-18 21:02:56,969 GCInspector.java:282 - ConcurrentMarkSweep GC in 11675ms. Par Survivor Space: 125829112 -> 121748224
WARN [Service Thread] 2016-03-18 21:03:07,359 GCInspector.java:282 - ConcurrentMarkSweep GC in 10327ms. CMS Old Gen: 7331643344 -> 7331643392; Par Eden Space: 1006632960 -> 1006632864; Par Survivor Space: 125828928 -> 122147720
WARN [Service Thread] 2016-03-18 21:03:50,019 GCInspector.java:282 - ConcurrentMarkSweep GC in 42574ms. CMS Old Gen: 7331643392 -> 7331643368; Par Eden Space: 1006632960 -> 1006632824; Par Survivor Space: 125829120 -> 122651640
WARN [Service Thread] 2016-03-18 21:05:01,704 GCInspector.java:282 - ConcurrentMarkSweep GC in 71592ms. Par Eden Space: 1006632960 -> 1006632928; Par Survivor Space: 125829112 -> 123069400
{noformat}