[
https://issues.apache.org/jira/browse/HDDS-15314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated HDDS-15314:
----------------------------------
Labels: pull-request-available (was: )
> Disable defrag DB metrics due to crash during snapshot defrag
> -------------------------------------------------------------
>
> Key: HDDS-15314
> URL: https://issues.apache.org/jira/browse/HDDS-15314
> Project: Apache Ozone
> Issue Type: Sub-task
> Components: Ozone Manager
> Reporter: Siyao Meng
> Assignee: Siyao Meng
> Priority: Blocker
> Labels: pull-request-available
>
> During snapshot defrag scale testing, Ozone Manager crashed in native RocksDB
> JNI code while the Hadoop Metrics2 timer was collecting generic RocksDB DB
> properties. The crash happened twice in the same setup.
> {code:title=1st crash}
> Stack: [0x00007f60b6f10000,0x00007f60b7011000], sp=0x00007f60b700f378, free
> space=1020k
> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native
> code)
> C [librocksdbjni-linux64.so+0x47b48b]
> rocksdb::InternalStats::HandleEstimatePendingCompactionBytes(unsigned long*,
> rocksdb::DBImpl*, rocksdb::Version*)+0xb
> C [librocksdbjni-linux64.so+0x3c3da4]
> rocksdb::DBImpl::GetProperty(rocksdb::ColumnFamilyHandle*, rocksdb::Slice
> const&, std::string*)+0x84
> C [librocksdbjni-linux64.so+0x2adf9d]
> Java_org_rocksdb_RocksDB_getProperty+0x14d
> J 5382
> org.rocksdb.RocksDB.getProperty(JJLjava/lang/String;I)Ljava/lang/String; (0
> bytes) @ 0x00007f60d0f1ed06 [0x00007f60d0f1ec40+0xc6]
> J 11916 C2
> org.apache.hadoop.hdds.utils.RocksDBStoreMetrics.getDBPropertyData(Lorg/apache/hadoop/metrics2/MetricsRecordBuilder;)V
> (280 bytes) @ 0x00007f60d21bf044 [0x00007f60d21bdf60+0x10e4]
> J 12024 C1
> org.apache.hadoop.hdds.utils.RocksDBStoreMetrics.getMetrics(Lorg/apache/hadoop/metrics2/MetricsCollector;Z)V
> (32 bytes) @ 0x00007f60d1525b14 [0x00007f60d1525220+0x8f4]
> J 11166 C2
> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMetrics(Lorg/apache/hadoop/metrics2/impl/MetricsCollectorImpl;Z)Ljava/lang/Iterable;
> (139 bytes) @ 0x00007f60d293785c [0x00007f60d29377e0+0x7c]
> J 13473 C2
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl.snapshotMetrics(Lorg/apache/hadoop/metrics2/impl/MetricsSourceAdapter;Lorg/apache/hadoop/metrics2/impl/MetricsBufferBuilder;)V
> (72 bytes) @ 0x00007f60d2f51e30 [0x00007f60d2f51da0+0x90]
> J 13273 C1
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl.sampleMetrics()Lorg/apache/hadoop/metrics2/impl/MetricsBuffer;
> (115 bytes) @ 0x00007f60d2ec1164 [0x00007f60d2ec04e0+0xc84]
> J 15901 C1 org.apache.hadoop.metrics2.impl.MetricsSystemImpl$4.run()V (23
> bytes) @ 0x00007f60d1d8f20c [0x00007f60d1d8ef60+0x2ac]
> j java.util.TimerThread.mainLoop()V+221
> j java.util.TimerThread.run()V+1
> {code}
> {code:title=2nd crash}
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> # SIGSEGV (0xb) at pc=0x00007f0bee21ed7d, pid=711089, tid=0x00007f0be71c0700
> #
> # JRE version: OpenJDK Runtime Environment (8.0_232-b09) (build 1.8.0_232-b09)
> # Java VM: OpenJDK 64-Bit Server VM (25.232-b09 mixed mode linux-amd64
> compressed oops)
> # Problematic frame:
> # C [librocksdbjni-linux64.so+0x3c3d7d]
> rocksdb::DBImpl::GetProperty(rocksdb::ColumnFamilyHandle*, rocksdb::Slice
> const&, std::string*)+0x5d
> ...
> {code}
> DB metrics should not have been enabled for defrag DBs in the first place.
> They were already disabled by default for snapshot DBs in HDDS-12193
> (https://github.com/apache/ozone/commit/ad0debf5e1b). A similar measure
> needs to be taken for defrag DBs as well.
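
The proposed measure can be illustrated with a minimal sketch. It assumes the store builder exposes an opt-out flag, and that a DB is only polled by the metrics timer if it was registered as a metrics source when opened. All names here (`MetricsRegistry`, `StoreBuilder`, `setMetricsEnabled`, the DB names) are hypothetical stand-ins, not Ozone's actual API; the real change may look different.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; none of these names are Ozone's actual API. */
public class DefragMetricsSketch {

  /** Stand-in for a metrics registry that a periodic timer would poll. */
  static final class MetricsRegistry {
    private final List<String> sources = new ArrayList<>();

    void register(String dbName) {
      sources.add(dbName);
    }

    List<String> registeredSources() {
      return sources;
    }
  }

  /** Stand-in for a DB store builder with an opt-out flag for metrics. */
  static final class StoreBuilder {
    private final String name;
    // Assumed default: regular DBs keep their metrics.
    private boolean metricsEnabled = true;

    StoreBuilder(String name) {
      this.name = name;
    }

    StoreBuilder setMetricsEnabled(boolean enabled) {
      this.metricsEnabled = enabled;
      return this;
    }

    /** "Opens" the store; registers a metrics source only if enabled. */
    String build(MetricsRegistry registry) {
      if (metricsEnabled) {
        registry.register(name);
      }
      return name;
    }
  }

  public static void main(String[] args) {
    MetricsRegistry registry = new MetricsRegistry();
    // A regular DB keeps the default and is registered.
    new StoreBuilder("om.db").build(registry);
    // A short-lived defrag DB opts out, so the timer never polls it.
    new StoreBuilder("defrag.db").setMetricsEnabled(false).build(registry);
    System.out.println(registry.registeredSources()); // prints [om.db]
  }
}
```

The point of the design is that a DB which is never registered cannot be reached by the metrics timer thread, so the timer cannot call into native property lookups for a defrag DB at all.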