Re: Upgrading to 3.11.8 Caused Map Failures

2020-12-12 Shalom Sagges
You are right, Yakir.
How did I miss that? It was a misconfiguration on my end.

Thanks a lot!

On Sat, Dec 12, 2020 at 9:28 PM Yakir Gibraltar wrote:


Re: Upgrading to 3.11.8 Caused Map Failures

2020-12-12 Yakir Gibraltar
See also:
https://support.datastax.com/hc/en-us/articles/360027838911


On Sat, Dec 12, 2020 at 9:11 PM Yakir Gibraltar wrote:



-- 
*Best regards,*
*Yakir Gibraltar*


Re: Upgrading to 3.11.8 Caused Map Failures

2020-12-12 Yakir Gibraltar
Hi Shalom,
See bug: https://issues.apache.org/jira/browse/CASSANDRA-14978
Try disabling mmap, either fully or for data files only:
disk_access_mode=standard
or
disk_access_mode=mmap_index_only
Yakir Gibraltar.
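For reference, `disk_access_mode` is set in cassandra.yaml in YAML form, for example `disk_access_mode: mmap_index_only`. The sketch below is a hypothetical model, not Cassandra's source, of which SSTable components each value memory-maps. It assumes that `auto` behaves like full `mmap` on a 64-bit JVM; the class and method names are made up for illustration.

```java
// Hypothetical model of the disk_access_mode values; NOT Cassandra's source code.
public class DiskAccessModeSketch {

    // Names mirror the values accepted by disk_access_mode in cassandra.yaml.
    public enum DiskAccessMode { auto, mmap, mmap_index_only, standard }

    // Assumption: "auto" behaves like "mmap" on a 64-bit JVM,
    // so only "auto" and "mmap" map the *-Data.db files.
    public static boolean mmapsDataFiles(DiskAccessMode mode) {
        return mode == DiskAccessMode.auto || mode == DiskAccessMode.mmap;
    }

    // *-Index.db files are mapped by every mode except "standard".
    public static boolean mmapsIndexFiles(DiskAccessMode mode) {
        return mode != DiskAccessMode.standard;
    }

    public static void main(String[] args) {
        // Print one row per mode showing which components would be mmapped.
        for (DiskAccessMode mode : DiskAccessMode.values()) {
            System.out.printf("%-16s Data.db mmapped=%-5b Index.db mmapped=%b%n",
                    mode, mmapsDataFiles(mode), mmapsIndexFiles(mode));
        }
    }
}
```

Under this model, `mmap_index_only` keeps index files mapped but reads data files through buffered I/O, so the number of mappings should drop; `standard` avoids mmap entirely.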


Re: Upgrading to 3.11.8 Caused Map Failures

2020-12-11 Shalom Sagges
Forgot to mention that there were also LEAK DETECTED errors:

ERROR [Reference-Reaper] 2020-12-11 03:25:42,172 Ref.java:229 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@451030de) to class org.apache.cassandra.io.util.SafeMemory$MemoryTidy@1272432140:Memory@[7f623780..7f623aa0) was not released before the reference was garbage collected
ERROR [Reference-Reaper] 2020-12-11 03:25:42,172 Ref.java:229 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@4fe85bae) to class org.apache.cassandra.utils.concurrent.WrappedSharedCloseable$Tidy@183159863:[Memory@[0..f060), Memory@[0..10e6c0)] was not released before the reference was garbage collected
ERROR [Reference-Reaper] 2020-12-11 03:25:42,173 Ref.java:229 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@4eb88b74) to class org.apache.cassandra.io.util.MmappedRegions$Tidier@992658185:/data_path/md-1105027-big-Data.db was not released before the reference was garbage collected
ERROR [Reference-Reaper] 2020-12-11 03:25:42,176 Ref.java:229 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@3692dae9) to class org.apache.cassandra.io.util.FileHandle$Cleanup@1791308664:/data_path/md-1105027-big-Index.db was not released before the reference was garbage collected



On Fri, Dec 11, 2020 at 6:50 PM Shalom Sagges wrote:



Upgrading to 3.11.8 Caused Map Failures

2020-12-11 Shalom Sagges
Hi All,

I upgraded Cassandra from v3.11.4 to v3.11.8.
The upgrade went smoothly; however, after a few hours a node crashed with an OOM error, and a few hours later another one crashed.

It seems they crashed due to excessive GC activity (CMS). The logs show "Map failed" errors on the CompactionExecutor thread:

ERROR [CompactionExecutor:744] 2020-12-11 03:25:42,169 JVMStabilityInspector.java:94 - OutOfMemory error letting the JVM handle the error:
ERROR [CompactionExecutor:744] 2020-12-11 03:25:37,765 CassandraDaemon.java:235 - Exception in thread Thread[CompactionExecutor:744,1,main]
org.apache.cassandra.io.FSReadError: java.io.IOException: Map failed
at org.apache.cassandra.io.util.ChannelProxy.map(ChannelProxy.java:157)
at org.apache.cassandra.io.util.MmappedRegions$State.add(MmappedRegions.java:310)
at org.apache.cassandra.io.util.MmappedRegions$State.access$400(MmappedRegions.java:246)
at org.apache.cassandra.io.util.MmappedRegions.updateState(MmappedRegions.java:170)
at org.apache.cassandra.io.util.MmappedRegions.<init>(MmappedRegions.java:73)
...
...
Caused by: java.io.IOException: Map failed
at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:940)
at org.apache.cassandra.io.util.ChannelProxy.map(ChannelProxy.java:153)
... 23 common frames omitted
Caused by: java.lang.OutOfMemoryError: Map failed
at sun.nio.ch.FileChannelImpl.map0(Native Method)
at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:937)
... 24 common frames omitted
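The bottom of this trace is the JDK's `FileChannel.map` call: the native mmap is refused and surfaces as `OutOfMemoryError: Map failed`, which is wrapped in `IOException: Map failed` and then in Cassandra's `FSReadError`. The sketch below (class and helper names are hypothetical) maps a file region the same way and shows one way to recognise this specific failure by walking the cause chain. It does not reproduce the crash.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Sketch: maps part of a file read-only (the FileChannel.map call at the bottom
// of the trace) and classifies the "Map failed" failure mode.
public class MapFailureSketch {

    // Maps [position, position + size) read-only. The mapping remains valid
    // after the channel is closed.
    public static MappedByteBuffer mapRegion(Path file, long position, long size) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            return channel.map(FileChannel.MapMode.READ_ONLY, position, size);
        }
    }

    // True if the cause chain contains OutOfMemoryError("Map failed"),
    // i.e. the native mmap call itself was refused.
    public static boolean isMapFailure(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof OutOfMemoryError && "Map failed".equals(c.getMessage())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical temp file standing in for an SSTable data file.
        Path tmp = Files.createTempFile("sketch", "-Data.db");
        try {
            Files.write(tmp, new byte[4096]);
            MappedByteBuffer buf = mapRegion(tmp, 0, 4096);
            System.out.println("mapped " + buf.capacity() + " bytes");
        } catch (IOException e) {
            System.out.println("Map failed? " + isMapFailure(e));
            throw e;
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Checking the cause chain, rather than only the top-level message, distinguishes a refused mmap from an ordinary I/O error that happens to mention mapping.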


*Before the crash, [CompactionExecutor:744] logged the following:*
INFO  [CompactionExecutor:744] 2020-12-11 03:00:29,985 NoSpamLogger.java:91 - Maximum memory usage reached (536870912), cannot allocate chunk of 1048576
WARN  [CompactionExecutor:744] 2020-12-11 03:10:57,437 BigTableWriter.java:211 - Writing large partition  (108.963MiB)
WARN  [CompactionExecutor:744] 2020-12-11 03:10:57,437 BigTableWriter.java:211 - Writing large partition  (151.155MiB)
WARN  [CompactionExecutor:744] 2020-12-11 03:11:16,445 BigTableWriter.java:211 - Writing large partition  (253.149MiB)
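The two numbers in the INFO line are byte counts. Converting them shows a 512 MiB cap and a 1 MiB requested chunk, so the cap is exactly 512 such chunks. That this cap corresponds to Cassandra's SSTable-reading buffer pool, sized by `file_cache_size_in_mb` (whose default is the smaller of 1/4 of the heap or 512 MB), is an assumption worth checking against the node's cassandra.yaml.

```latex
\begin{aligned}
536\,870\,912\ \text{B} &= 512 \times 2^{20}\ \text{B} = 512\ \text{MiB} \\
1\,048\,576\ \text{B} &= 2^{20}\ \text{B} = 1\ \text{MiB} \\
\frac{536\,870\,912}{1\,048\,576} &= 512
\end{aligned}
```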


*Some more info:*
The *max_map_count* is set to 1048575, so all is well there.
Hugepages are enabled by default (I know I should disable them), but I don't think they can cause this behaviour.
This never happened on v3.11.4, only on v3.11.8.
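Because mmap can also fail when a process would exceed `vm.max_map_count` mappings, it may help to compare the live mapping count of the Cassandra process with the limit, not just the configured value. The sketch below is Linux-only and assumes the standard `/proc/sys/vm/max_map_count` and `/proc/<pid>/maps` files (one line per mapped area). The class name is hypothetical, and reading another user's `/proc/<pid>/maps` usually requires running as that user or as root.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Linux-only sketch: compares a process's current number of memory-mapped
// areas with the kernel limit vm.max_map_count.
public class MapCountSketch {

    static final Path MAX_MAP_COUNT = Paths.get("/proc/sys/vm/max_map_count");

    // Reads the per-process limit on mapped areas (a single number).
    public static long readMaxMapCount() throws IOException {
        try (BufferedReader r = Files.newBufferedReader(MAX_MAP_COUNT, StandardCharsets.US_ASCII)) {
            return Long.parseLong(r.readLine().trim());
        }
    }

    // Counts mapped areas: /proc/<pid>/maps has one line per area.
    // Use "self" for the current JVM, or pass the Cassandra PID.
    public static long countMappings(String pid) throws IOException {
        // ISO-8859-1 decodes any byte, so unusual file names cannot break the read.
        try (BufferedReader r = Files.newBufferedReader(Paths.get("/proc", pid, "maps"),
                StandardCharsets.ISO_8859_1)) {
            long n = 0;
            while (r.readLine() != null) {
                n++;
            }
            return n;
        }
    }

    public static void main(String[] args) throws IOException {
        String pid = args.length > 0 ? args[0] : "self";
        if (!Files.exists(MAX_MAP_COUNT)) {
            System.out.println("/proc/sys/vm/max_map_count not found; this sketch needs Linux");
            return;
        }
        long limit = readMaxMapCount();
        long used = countMappings(pid);
        System.out.printf("%d of %d mappings in use (%.2f%%)%n", used, limit, 100.0 * used / limit);
    }
}
```

Run it with the Cassandra PID as the only argument; sampling it a few times while large compactions are running would show whether the count approaches the limit.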


I'd really appreciate your help on this one.
Thanks!