[jira] [Commented] (CASSANDRA-8552) Large compactions run out of off-heap RAM

Benedict (JIRA) Fri, 02 Jan 2015 14:30:03 -0800

    [ https://issues.apache.org/jira/browse/CASSANDRA-8552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14263288#comment-14263288 ]


Benedict commented on CASSANDRA-8552:
-------------------------------------

This is going to be a challenging one to diagnose, since the kernel log 
suggests this most likely isn't a bug with C*, although there may be some 
problematic behaviour with so many source files, or large target files, that 
triggers it.

Could you establish for sure that this isn't the bug at
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1308796 ? It seems to
describe the situation closely enough to be a candidate, and it may be
solved by a different Ubuntu image. If not, we can try to think of a method of
attack for pinning down which C* behaviours are making it a problem.

You also might be able to compact each table individually, offline, to get the 
system started up.



> Large compactions run out of off-heap RAM
> -----------------------------------------
>
>                 Key: CASSANDRA-8552
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8552
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: Ubuntu 14.04
> AWS EC2
> 12 m1.xlarge nodes [4 cores, 16GB RAM, 1TB storage (251GB Used)]
> Java build 1.7.0_55-b13 and build 1.8.0_25-b17
>            Reporter: Brent Haines
>            Assignee: Marcus Eriksson
>            Priority: Blocker
>             Fix For: 2.1.3
>
>         Attachments: system.log
>
>
> We have a large table storing, effectively, event logs, and a pair of
> denormalized tables for indexing.
> When upgrading from 2.0 to 2.1 we saw performance improvements, but also some
> random and silent crashes during nightly repairs. We lost a node (totally
> corrupted) and replaced it. That node has never stabilized: it simply can't
> finish its compactions.
> Smaller compactions finish. Larger compactions, like these two, never finish:
> {code}
> pending tasks: 48
>    compaction type   keyspace             table     completed         total    unit   progress
>         Compaction       data           stories   16532973358   75977993784   bytes     21.76%
>         Compaction       data   stories_by_text   10593780658   38555048812   bytes     27.48%
> Active compaction remaining time :   0h10m51s
> {code}
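The remaining-time line looks consistent with dividing the bytes still to be compacted by the `compaction_throughput_mb_per_sec: 128` setting listed further down. A minimal Python sketch of that arithmetic, assuming the throttle is interpreted as MiB/s and that both active compactions count toward the estimate (a reading of the numbers, not something confirmed against the nodetool source):

```python
# Sketch: reproduce the compactionstats progress column and the
# "Active compaction remaining time" line from the raw byte counts above.
# Assumption: remaining time = remaining bytes / (throughput in MiB/s).

ACTIVE = [
    # (table, completed bytes, total bytes), copied from the output above
    ("stories", 16532973358, 75977993784),
    ("stories_by_text", 10593780658, 38555048812),
]
THROUGHPUT_MB_PER_SEC = 128  # compaction_throughput_mb_per_sec


def progress(completed, total):
    """Percentage of a single compaction processed so far."""
    return f"{100.0 * completed / total:.2f}%"


def remaining_time(active, throughput_mb):
    """Sum the bytes left across active compactions and format as h/m/s."""
    remaining = sum(total - completed for _, completed, total in active)
    seconds = remaining // (throughput_mb * 1024 * 1024)
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return remaining, f"{hours}h{minutes:02d}m{secs:02d}s"


if __name__ == "__main__":
    for table, completed, total in ACTIVE:
        print(table, progress(completed, total))
    print(remaining_time(ACTIVE, THROUGHPUT_MB_PER_SEC))
```

Run as a script, it prints 21.76% and 27.48% for the two tables and a remaining time of 0h10m51s for roughly 81.4 GiB still to process, matching the output above. The figure only says how long the compactions would take at the full throttle rate; it says nothing about why they never complete.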
> We are not getting exceptions and are not running out of heap space. The
> Ubuntu OOM killer reaps the process after all of the memory is consumed.
> We watch memory in the OpsCenter console and it keeps growing. If we turn off
> the OOM killer for the process, it runs until everything else has been killed
> instead, and then the kernel panics.
> We have the following settings configured: 
> 2G Heap
> 512M New
> {code}
> memtable_heap_space_in_mb: 1024
> memtable_offheap_space_in_mb: 1024
> memtable_allocation_type: heap_buffers
> commitlog_total_space_in_mb: 2048
> concurrent_compactors: 1
> compaction_throughput_mb_per_sec: 128
> {code}
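The reporter's argument is that memory grows far beyond anything these settings bound. A minimal Python sketch that totals the explicit memory caps and compares them with the node's RAM; the 2048 MB heap and 16 GB RAM come from the report, while treating the heap plus `memtable_offheap_space_in_mb` as the full set of configured caps is a simplifying assumption (other native allocations, such as thread stacks, are not bounded by either setting):

```python
# Sketch: total the memory caps that the settings above explicitly configure
# and compare them with the 16GB of RAM on an m1.xlarge node.
# Assumptions: heap = 2048 MB ("2G Heap"), RAM = 16 * 1024 MB, and the only
# explicitly capped regions are the heap and memtable_offheap_space_in_mb.

SETTINGS_TEXT = """\
memtable_heap_space_in_mb: 1024
memtable_offheap_space_in_mb: 1024
memtable_allocation_type: heap_buffers
commitlog_total_space_in_mb: 2048
concurrent_compactors: 1
compaction_throughput_mb_per_sec: 128
"""

HEAP_MB = 2048
RAM_MB = 16 * 1024


def parse_flat_settings(text):
    """Parse top-level `key: value` lines; this is not a general YAML parser."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        settings[key.strip()] = value.strip()
    return settings


def configured_cap_mb(settings, heap_mb):
    """Heap plus the memtable off-heap pool: an upper bound on what these
    settings cap, ignoring native memory they do not govern."""
    return heap_mb + int(settings["memtable_offheap_space_in_mb"])


if __name__ == "__main__":
    settings = parse_flat_settings(SETTINGS_TEXT)
    cap = configured_cap_mb(settings, HEAP_MB)
    print(f"capped: {cap} MB of {RAM_MB} MB RAM; not covered: {RAM_MB - cap} MB")
```

On these numbers the explicit caps come to 3072 MB of 16384 MB, leaving 13312 MB that none of the listed settings accounts for, which is consistent with the reporter's conclusion that this is not a GC problem.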
> The compaction strategy is leveled (these are read-intensive tables that are
> rarely updated).
> I have tried every setting and every option, and have the system at an MTBF
> of about an hour now, but we never finish compacting because some large
> compactions are always pending. None of the GC tools or settings help because
> it is not a GC problem. It is an off-heap memory problem.
> We are getting these messages in our syslog:
> {code}
> Jan  2 07:06:00 ip-10-0-2-226 kernel: [49801151.219527] BUG: Bad page map in process java  pte:00000320 pmd:2d6fa5067
> Jan  2 07:06:00 ip-10-0-2-226 kernel: [49801151.219545] addr:00007fb820be3000 vm_flags:08000070 anon_vma:          (null) mapping:          (null) index:7fb820be3
> Jan  2 07:06:00 ip-10-0-2-226 kernel: [49801151.219556] CPU: 3 PID: 27344 Comm: java Tainted: G    B        3.13.0-24-generic #47-Ubuntu
> Jan  2 07:06:00 ip-10-0-2-226 kernel: [49801151.219559]  ffff880028510e40 ffff88020d43da98 ffffffff81715ac4 00007fb820be3000
> Jan  2 07:06:00 ip-10-0-2-226 kernel: [49801151.219565]  ffff88020d43dae0 ffffffff81174183 0000000000000320 00000007fb820be3
> Jan  2 07:06:00 ip-10-0-2-226 kernel: [49801151.219568]  ffff8802d6fa5f18 0000000000000320 00007fb820be3000 00007fb820be4000
> Jan  2 07:06:00 ip-10-0-2-226 kernel: [49801151.219572] Call Trace:
> Jan  2 07:06:00 ip-10-0-2-226 kernel: [49801151.219584]  [<ffffffff81715ac4>] dump_stack+0x45/0x56
> Jan  2 07:06:00 ip-10-0-2-226 kernel: [49801151.219591]  [<ffffffff81174183>] print_bad_pte+0x1a3/0x250
> Jan  2 07:06:00 ip-10-0-2-226 kernel: [49801151.219594]  [<ffffffff81175439>] vm_normal_page+0x69/0x80
> Jan  2 07:06:00 ip-10-0-2-226 kernel: [49801151.219598]  [<ffffffff8117580b>] unmap_page_range+0x3bb/0x7f0
> Jan  2 07:06:00 ip-10-0-2-226 kernel: [49801151.219602]  [<ffffffff81175cc1>] unmap_single_vma+0x81/0xf0
> Jan  2 07:06:00 ip-10-0-2-226 kernel: [49801151.219605]  [<ffffffff81176d39>] unmap_vmas+0x49/0x90
> Jan  2 07:06:00 ip-10-0-2-226 kernel: [49801151.219610]  [<ffffffff8117feec>] exit_mmap+0x9c/0x170
> Jan  2 07:06:00 ip-10-0-2-226 kernel: [49801151.219617]  [<ffffffff8110fcf3>] ? __delayacct_add_tsk+0x153/0x170
> Jan  2 07:06:00 ip-10-0-2-226 kernel: [49801151.219621]  [<ffffffff8106482c>] mmput+0x5c/0x120
> Jan  2 07:06:00 ip-10-0-2-226 kernel: [49801151.219625]  [<ffffffff81069bbc>] do_exit+0x26c/0xa50
> Jan  2 07:06:00 ip-10-0-2-226 kernel: [49801151.219631]  [<ffffffff810d7591>] ? __unqueue_futex+0x31/0x60
> Jan  2 07:06:00 ip-10-0-2-226 kernel: [49801151.219634]  [<ffffffff810d83b6>] ? futex_wait+0x126/0x290
> Jan  2 07:06:00 ip-10-0-2-226 kernel: [49801151.219640]  [<ffffffff8171d8e0>] ? _raw_spin_unlock_irqrestore+0x20/0x40
> Jan  2 07:06:00 ip-10-0-2-226 kernel: [49801151.219643]  [<ffffffff8106a41f>] do_group_exit+0x3f/0xa0
> Jan  2 07:06:00 ip-10-0-2-226 kernel: [49801151.219649]  [<ffffffff8107a050>] get_signal_to_deliver+0x1d0/0x6f0
> Jan  2 07:06:00 ip-10-0-2-226 kernel: [49801151.219655]  [<ffffffff81013448>] do_signal+0x48/0x960
> Jan  2 07:06:00 ip-10-0-2-226 kernel: [49801151.219660]  [<ffffffff811112fc>] ? acct_account_cputime+0x1c/0x20
> Jan  2 07:06:00 ip-10-0-2-226 kernel: [49801151.219664]  [<ffffffff8109d76b>] ? account_user_time+0x8b/0xa0
> Jan  2 07:06:00 ip-10-0-2-226 kernel: [49801151.219667]  [<ffffffff8109dd84>] ? vtime_account_user+0x54/0x60
> Jan  2 07:06:00 ip-10-0-2-226 kernel: [49801151.219671]  [<ffffffff81013dc9>] do_notify_resume+0x69/0xb0
> Jan  2 07:06:00 ip-10-0-2-226 kernel: [49801151.219676]  [<ffffffff8172676a>] int_signal+0x12/0x17
> {code}
> This seems like unmap is failing, but I am uncertain about how to fix it or 
> work around it.
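Reading the trace from the bottom up, the bad PTE appears to be reported while the kernel tears down the address space of a `java` process that is exiting after a signal (do_signal -> do_group_exit -> do_exit -> mmput -> exit_mmap -> unmap_vmas), rather than from a `munmap` call made by Cassandra itself. When comparing this against the Launchpad report, it may help to pull out the reports and the reliable frames mechanically. A minimal Python sketch, assuming the syslog format shown here; the regexes are illustrative and untested against other kernel versions:

```python
import re

# Abridged copy of the syslog excerpt above (one record per line).
KERNEL_LOG = """\
Jan  2 07:06:00 ip-10-0-2-226 kernel: [49801151.219527] BUG: Bad page map in process java  pte:00000320 pmd:2d6fa5067
Jan  2 07:06:00 ip-10-0-2-226 kernel: [49801151.219572] Call Trace:
Jan  2 07:06:00 ip-10-0-2-226 kernel: [49801151.219584]  [<ffffffff81715ac4>] dump_stack+0x45/0x56
Jan  2 07:06:00 ip-10-0-2-226 kernel: [49801151.219591]  [<ffffffff81174183>] print_bad_pte+0x1a3/0x250
Jan  2 07:06:00 ip-10-0-2-226 kernel: [49801151.219594]  [<ffffffff81175439>] vm_normal_page+0x69/0x80
Jan  2 07:06:00 ip-10-0-2-226 kernel: [49801151.219598]  [<ffffffff8117580b>] unmap_page_range+0x3bb/0x7f0
Jan  2 07:06:00 ip-10-0-2-226 kernel: [49801151.219602]  [<ffffffff81175cc1>] unmap_single_vma+0x81/0xf0
Jan  2 07:06:00 ip-10-0-2-226 kernel: [49801151.219605]  [<ffffffff81176d39>] unmap_vmas+0x49/0x90
Jan  2 07:06:00 ip-10-0-2-226 kernel: [49801151.219610]  [<ffffffff8117feec>] exit_mmap+0x9c/0x170
Jan  2 07:06:00 ip-10-0-2-226 kernel: [49801151.219617]  [<ffffffff8110fcf3>] ? __delayacct_add_tsk+0x153/0x170
Jan  2 07:06:00 ip-10-0-2-226 kernel: [49801151.219621]  [<ffffffff8106482c>] mmput+0x5c/0x120
Jan  2 07:06:00 ip-10-0-2-226 kernel: [49801151.219625]  [<ffffffff81069bbc>] do_exit+0x26c/0xa50
Jan  2 07:06:00 ip-10-0-2-226 kernel: [49801151.219631]  [<ffffffff810d7591>] ? __unqueue_futex+0x31/0x60
Jan  2 07:06:00 ip-10-0-2-226 kernel: [49801151.219643]  [<ffffffff8106a41f>] do_group_exit+0x3f/0xa0
Jan  2 07:06:00 ip-10-0-2-226 kernel: [49801151.219649]  [<ffffffff8107a050>] get_signal_to_deliver+0x1d0/0x6f0
Jan  2 07:06:00 ip-10-0-2-226 kernel: [49801151.219655]  [<ffffffff81013448>] do_signal+0x48/0x960
Jan  2 07:06:00 ip-10-0-2-226 kernel: [49801151.219671]  [<ffffffff81013dc9>] do_notify_resume+0x69/0xb0
Jan  2 07:06:00 ip-10-0-2-226 kernel: [49801151.219676]  [<ffffffff8172676a>] int_signal+0x12/0x17
"""

BAD_PAGE = re.compile(r"BUG: Bad page map in process (\S+)\s+pte:([0-9a-f]+) pmd:([0-9a-f]+)")
# "[<address>] name+0xoff/0xlen"; a "?" before the name means the kernel found
# the address on the stack but could not confirm it is part of the call chain.
FRAME = re.compile(r"\[<([0-9a-f]+)>\]\s+(\?\s+)?([\w.]+)\+0x")


def parse_bad_page_reports(log):
    """Return (process, pte, pmd) tuples and the reliable call-trace frames."""
    reports, frames = [], []
    for line in log.splitlines():
        bad = BAD_PAGE.search(line)
        if bad:
            reports.append(bad.groups())
            continue
        frame = FRAME.search(line)
        if frame and frame.group(2) is None:  # skip "?" frames
            frames.append(frame.group(3))
    return reports, frames


if __name__ == "__main__":
    reports, frames = parse_bad_page_reports(KERNEL_LOG)
    print(reports)
    # Frames are listed innermost first; read right to left to follow the path in.
    print(" <- ".join(frames))
```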
> For completeness' sake, let me point this out too: the system.log shows
> whatever was happening when the system stops and the service is
> restarted. There is no stack trace. Here is an example:
> {code}
> INFO  [main] 2015-01-02 06:38:38,813 ColumnFamilyStore.java:840 - Enqueuing flush of local: 1552 (0%) on-heap, 0 (0%) off-heap
> INFO  [MemtableFlushWriter:1] 2015-01-02 06:38:38,813 Memtable.java:325 - Writing Memtable-local@172795560(281 serialized bytes, 10 ops, 0%/0% of on/off-heap limit)
> INFO  [MemtableFlushWriter:1] 2015-01-02 06:38:38,824 Memtable.java:364 - Completed flushing /data/cassandra/data/system/local-7ad54392bcdd35a684174e047860b377/system-local-ka-778-Data.db (262 bytes) for commitlog position ReplayPosition(segmentId=1420180671225, position=87520)
> INFO  [main] 2015-01-02 06:38:38,825 YamlConfigurationLoader.java:92 - Loading settings from file:/etc/cassandra/cassandra.yaml
> INFO  [main] 2015-01-02 06:38:38,837 YamlConfigurationLoader.java:135 - Node configuration:[authenticator=AllowAllAuthenticator;
> authorizer=AllowAllAuthorizer; auto_snapshot=true;
> batch_size_warn_threshold_in_kb=5; batchlog_replay_throttle_in_kb=1024;
> cas_contention_timeout_in_ms=1000; client_encryption_options=<REDACTED>;
> cluster_name=booshaka-batch; column_index_size_in_kb=64;
> commit_failure_policy=stop; commitlog_directory=/commitlog/cassandra/commitlog;
> commitlog_segment_size_in_mb=32; commitlog_sync=periodic;
> commitlog_sync_period_in_ms=10000; commitlog_total_space_in_mb=2048;
> compaction_throughput_mb_per_sec=128; concurrent_compactors=1;
> concurrent_counter_writes=32; concurrent_reads=48; concurrent_writes=48;
> counter_cache_save_period=7200; counter_cache_size_in_mb=null;
> counter_write_request_timeout_in_ms=5000; cross_node_timeout=false;
> data_file_directories=[/data/cassandra/data]; disk_failure_policy=stop;
> dynamic_snitch_badness_threshold=0.1; dynamic_snitch_reset_interval_in_ms=600000;
> dynamic_snitch_update_interval_in_ms=100; endpoint_snitch=Ec2Snitch;
> hinted_handoff_enabled=true; hinted_handoff_throttle_in_kb=1024;
> incremental_backups=false; index_summary_capacity_in_mb=null;
> index_summary_resize_interval_in_minutes=60; inter_dc_tcp_nodelay=false;
> internode_compression=all; key_cache_save_period=14400; key_cache_size_in_mb=null;
> listen_address=10.0.2.226; max_hint_window_in_ms=10800000;
> max_hints_delivery_threads=2; memtable_allocation_type=heap_buffers;
> memtable_cleanup_threshold=0.33; memtable_heap_space_in_mb=1024;
> memtable_offheap_space_in_mb=1024; native_transport_port=9042; num_tokens=256;
> partitioner=org.apache.cassandra.dht.Murmur3Partitioner;
> permissions_validity_in_ms=2000; phi_convict_threshold=12;
> range_request_timeout_in_ms=10000; read_request_timeout_in_ms=5000;
> request_scheduler=org.apache.cassandra.scheduler.NoScheduler; request_timeout_in_ms=10000;
> row_cache_save_period=0; row_cache_size_in_mb=0; rpc_address=10.0.2.226;
> rpc_keepalive=true; rpc_port=9160; rpc_server_type=sync;
> saved_caches_directory=/data/cassandra/saved_caches;
> seed_provider=[{class_name=org.apache.cassandra.locator.SimpleSeedProvider, parameters=[{seeds=10.0.2.8,10.0.2.144,10.0.2.145}]}];
> server_encryption_options=<REDACTED>; snapshot_before_compaction=false;
> ssl_storage_port=7001; sstable_preemptive_open_interval_in_mb=50;
> start_native_transport=true; start_rpc=true; storage_port=7000;
> thrift_framed_transport_size_in_mb=15; tombstone_failure_threshold=100000;
> tombstone_warn_threshold=1000; trickle_fsync=false;
> trickle_fsync_interval_in_kb=10240; truncate_request_timeout_in_ms=60000;
> write_request_timeout_in_ms=2000]
> INFO  [main] 2015-01-02 06:38:38,943 MessagingService.java:477 - Starting Messaging Service on port 7000
> INFO  [main] 2015-01-02 06:38:38,981 YamlConfigurationLoader.java:92 - Loading settings from file:/etc/cassandra/cassandra.yaml
> INFO  [main] 2015-01-02 06:38:38,987 YamlConfigurationLoader.java:135 - Node configuration:[authenticator=AllowAllAuthenticator;
> authorizer=AllowAllAuthorizer; auto_snapshot=true;
> batch_size_warn_threshold_in_kb=5; batchlog_replay_throttle_in_kb=1024;
> cas_contention_timeout_in_ms=1000; client_encryption_options=<REDACTED>;
> cluster_name=booshaka-batch; column_index_size_in_kb=64;
> commit_failure_policy=stop; commitlog_directory=/commitlog/cassandra/commitlog;
> commitlog_segment_size_in_mb=32; commitlog_sync=periodic;
> commitlog_sync_period_in_ms=10000; commitlog_total_space_in_mb=2048;
> compaction_throughput_mb_per_sec=128; concurrent_compactors=1;
> concurrent_counter_writes=32; concurrent_reads=48; concurrent_writes=48;
> counter_cache_save_period=7200; counter_cache_size_in_mb=null;
> counter_write_request_timeout_in_ms=5000; cross_node_timeout=false;
> data_file_directories=[/data/cassandra/data]; disk_failure_policy=stop;
> dynamic_snitch_badness_threshold=0.1; dynamic_snitch_reset_interval_in_ms=600000;
> dynamic_snitch_update_interval_in_ms=100; endpoint_snitch=Ec2Snitch;
> hinted_handoff_enabled=true; hinted_handoff_throttle_in_kb=1024;
> incremental_backups=false; index_summary_capacity_in_mb=null;
> index_summary_resize_interval_in_minutes=60; inter_dc_tcp_nodelay=false;
> internode_compression=all; key_cache_save_period=14400; key_cache_size_in_mb=null;
> listen_address=10.0.2.226; max_hint_window_in_ms=10800000;
> max_hints_delivery_threads=2; memtable_allocation_type=heap_buffers;
> memtable_cleanup_threshold=0.33; memtable_heap_space_in_mb=1024;
> memtable_offheap_space_in_mb=1024; native_transport_port=9042; num_tokens=256;
> partitioner=org.apache.cassandra.dht.Murmur3Partitioner;
> permissions_validity_in_ms=2000; phi_convict_threshold=12;
> range_request_timeout_in_ms=10000; read_request_timeout_in_ms=5000;
> request_scheduler=org.apache.cassandra.scheduler.NoScheduler; request_timeout_in_ms=10000;
> row_cache_save_period=0; row_cache_size_in_mb=0; rpc_address=10.0.2.226;
> rpc_keepalive=true; rpc_port=9160; rpc_server_type=sync;
> saved_caches_directory=/data/cassandra/saved_caches;
> seed_provider=[{class_name=org.apache.cassandra.locator.SimpleSeedProvider, parameters=[{seeds=10.0.2.8,10.0.2.144,10.0.2.145}]}];
> server_encryption_options=<REDACTED>; snapshot_before_compaction=false;
> ssl_storage_port=7001; sstable_preemptive_open_interval_in_mb=50;
> start_native_transport=true; start_rpc=true; storage_port=7000;
> thrift_framed_transport_size_in_mb=15; tombstone_failure_threshold=100000;
> tombstone_warn_threshold=1000; trickle_fsync=false;
> trickle_fsync_interval_in_kb=10240; truncate_request_timeout_in_ms=60000;
> write_request_timeout_in_ms=2000]
> {code}
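The `Node configuration:[...]` line is a convenient way to confirm which settings the node actually loaded at startup. A minimal Python sketch that parses an abridged copy of that line and checks it against the cassandra.yaml values quoted earlier; it assumes entries are separated by `; ` and that no value itself contains `; `, which holds for this excerpt but is not guaranteed in general:

```python
import re

# Abridged excerpt of the "Node configuration:[...]" line from system.log above.
LOG_LINE = (
    "INFO  [main] 2015-01-02 06:38:38,837 YamlConfigurationLoader.java:135 - "
    "Node configuration:[compaction_throughput_mb_per_sec=128; "
    "concurrent_compactors=1; commitlog_total_space_in_mb=2048; "
    "memtable_allocation_type=heap_buffers; memtable_heap_space_in_mb=1024; "
    "memtable_offheap_space_in_mb=1024; "
    "seed_provider=[{class_name=org.apache.cassandra.locator.SimpleSeedProvider, "
    "parameters=[{seeds=10.0.2.8,10.0.2.144,10.0.2.145}]}]]"
)

# Values the reporter lists from cassandra.yaml earlier in the ticket.
EXPECTED = {
    "memtable_heap_space_in_mb": "1024",
    "memtable_offheap_space_in_mb": "1024",
    "memtable_allocation_type": "heap_buffers",
    "commitlog_total_space_in_mb": "2048",
    "concurrent_compactors": "1",
    "compaction_throughput_mb_per_sec": "128",
}


def parse_node_configuration(line):
    """Split the bracketed payload on '; ' and each entry on its first '='."""
    match = re.search(r"Node configuration:\[(.*)\]\s*$", line)
    if match is None:
        raise ValueError("no 'Node configuration:[...]' payload in line")
    settings = {}
    for entry in match.group(1).split("; "):
        key, _, value = entry.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def mismatches(settings, expected):
    """Map each expected key whose logged value differs to (expected, logged)."""
    return {k: (v, settings.get(k)) for k, v in expected.items() if settings.get(k) != v}


if __name__ == "__main__":
    logged = parse_node_configuration(LOG_LINE)
    print(mismatches(logged, EXPECTED) or "logged settings match cassandra.yaml")
```

For the log above the check finds no differences, so the node did start with the settings the reporter lists.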



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
