[
https://issues.apache.org/jira/browse/HDDS-15314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Siyao Meng updated HDDS-15314:
------------------------------
Description:
During snapshot defrag scale testing, Ozone Manager crashed in native RocksDB
JNI code while the Hadoop Metrics2 timer was collecting generic RocksDB DB
properties. The crash happened twice in the same setup.
{code:title=First crash}
Stack: [0x00007f60b6f10000,0x00007f60b7011000], sp=0x00007f60b700f378, free space=1020k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C [librocksdbjni-linux64.so+0x47b48b] rocksdb::InternalStats::HandleEstimatePendingCompactionBytes(unsigned long*, rocksdb::DBImpl*, rocksdb::Version*)+0xb
C [librocksdbjni-linux64.so+0x3c3da4] rocksdb::DBImpl::GetProperty(rocksdb::ColumnFamilyHandle*, rocksdb::Slice const&, std::string*)+0x84
C [librocksdbjni-linux64.so+0x2adf9d] Java_org_rocksdb_RocksDB_getProperty+0x14d
J 5382 org.rocksdb.RocksDB.getProperty(JJLjava/lang/String;I)Ljava/lang/String; (0 bytes) @ 0x00007f60d0f1ed06 [0x00007f60d0f1ec40+0xc6]
J 11916 C2 org.apache.hadoop.hdds.utils.RocksDBStoreMetrics.getDBPropertyData(Lorg/apache/hadoop/metrics2/MetricsRecordBuilder;)V (280 bytes) @ 0x00007f60d21bf044 [0x00007f60d21bdf60+0x10e4]
J 12024 C1 org.apache.hadoop.hdds.utils.RocksDBStoreMetrics.getMetrics(Lorg/apache/hadoop/metrics2/MetricsCollector;Z)V (32 bytes) @ 0x00007f60d1525b14 [0x00007f60d1525220+0x8f4]
J 11166 C2 org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMetrics(Lorg/apache/hadoop/metrics2/impl/MetricsCollectorImpl;Z)Ljava/lang/Iterable; (139 bytes) @ 0x00007f60d293785c [0x00007f60d29377e0+0x7c]
J 13473 C2 org.apache.hadoop.metrics2.impl.MetricsSystemImpl.snapshotMetrics(Lorg/apache/hadoop/metrics2/impl/MetricsSourceAdapter;Lorg/apache/hadoop/metrics2/impl/MetricsBufferBuilder;)V (72 bytes) @ 0x00007f60d2f51e30 [0x00007f60d2f51da0+0x90]
J 13273 C1 org.apache.hadoop.metrics2.impl.MetricsSystemImpl.sampleMetrics()Lorg/apache/hadoop/metrics2/impl/MetricsBuffer; (115 bytes) @ 0x00007f60d2ec1164 [0x00007f60d2ec04e0+0xc84]
J 15901 C1 org.apache.hadoop.metrics2.impl.MetricsSystemImpl$4.run()V (23 bytes) @ 0x00007f60d1d8f20c [0x00007f60d1d8ef60+0x2ac]
j java.util.TimerThread.mainLoop()V+221
j java.util.TimerThread.run()V+1
{code}
{code}
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007f0bee21ed7d, pid=711089, tid=0x00007f0be71c0700
#
# JRE version: OpenJDK Runtime Environment (8.0_232-b09) (build 1.8.0_232-b09)
# Java VM: OpenJDK 64-Bit Server VM (25.232-b09 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C [librocksdbjni-linux64.so+0x3c3d7d] rocksdb::DBImpl::GetProperty(rocksdb::ColumnFamilyHandle*, rocksdb::Slice const&, std::string*)+0x5d
...
{code}
DB metrics should not have been enabled for defrag DBs in the first place. They
were already disabled by default for snapshot DBs in HDDS-12193
(https://github.com/apache/ozone/commit/ad0debf5e1b). A similar measure needs to
be taken for defrag DBs as well.
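To illustrate the kind of measure meant here, the following is a minimal, self-contained sketch and not the actual Ozone change. Every name in it (SketchMetricsSystem, SketchStore, the enableMetrics flag, the DB names) is hypothetical and does not correspond to Ozone, RocksDB, or Hadoop Metrics2 APIs. It only shows the idea: a DB opened with metrics disabled never registers a metrics source, so a periodic sampler never calls into it.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Hypothetical stand-in for a metrics system; NOT the Hadoop Metrics2 API. */
final class SketchMetricsSystem {
  private final Map<String, Supplier<String>> sources = new ConcurrentHashMap<>();

  void register(String name, Supplier<String> source) {
    sources.put(name, source);
  }

  void unregister(String name) {
    sources.remove(name);
  }

  boolean isRegistered(String name) {
    return sources.containsKey(name);
  }

  /** What a periodic timer would do: poll every registered source once. */
  int sampleAll() {
    int polled = 0;
    for (Supplier<String> source : sources.values()) {
      source.get();
      polled++;
    }
    return polled;
  }
}

/** Hypothetical store wrapper; a real one would hold a native DB handle. */
final class SketchStore implements AutoCloseable {
  private final String name;
  private final SketchMetricsSystem metrics;
  private final boolean metricsEnabled;

  private SketchStore(String name, SketchMetricsSystem metrics, boolean metricsEnabled) {
    this.name = name;
    this.metrics = metrics;
    this.metricsEnabled = metricsEnabled;
  }

  /** Registers a metrics source only when the caller opts in. */
  static SketchStore open(String name, SketchMetricsSystem metrics, boolean enableMetrics) {
    SketchStore store = new SketchStore(name, metrics, enableMetrics);
    if (enableMetrics) {
      metrics.register(name, store::readProperty);
    }
    return store;
  }

  /** Placeholder for a property lookup against the underlying DB. */
  private String readProperty() {
    return name + ":placeholder-property";
  }

  @Override
  public void close() {
    // Drop the source (if any) so the sampler no longer polls this store.
    if (metricsEnabled) {
      metrics.unregister(name);
    }
  }
}

final class DefragMetricsSketch {
  public static void main(String[] args) {
    SketchMetricsSystem metrics = new SketchMetricsSystem();
    try (SketchStore active = SketchStore.open("om.db", metrics, true);
         SketchStore defrag = SketchStore.open("defrag.db", metrics, false)) {
      System.out.println("om.db registered: " + metrics.isRegistered("om.db"));
      System.out.println("defrag.db registered: " + metrics.isRegistered("defrag.db"));
      System.out.println("sources polled: " + metrics.sampleAll());
    }
  }
}
```

In this sketch the defrag store is opened with the flag set to false, so the sampler polls only the one store that opted in. Whether the real fix uses a builder flag, a config default, or something else is not stated in this issue.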
> Disable defrag DB metrics due to crash during snapshot defrag
> -------------------------------------------------------------
>
> Key: HDDS-15314
> URL: https://issues.apache.org/jira/browse/HDDS-15314
> Project: Apache Ozone
> Issue Type: Sub-task
> Components: Ozone Manager
> Reporter: Siyao Meng
> Assignee: Siyao Meng
> Priority: Blocker
>