[ 
https://issues.apache.org/jira/browse/FLINK-18338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17141796#comment-17141796
 ] 

Yun Tang commented on FLINK-18338:
----------------------------------

I have figured out why this happened, and this [successful CI run|https://dev.azure.com/myasuka/flink/_build/results?buildId=157&view=results] 
of multiple core modules also confirms it.

The root cause: the newly added test {{RocksDBStateMisuseOptionTest}} does not 
dispose the {{RocksDBKeyedStateBackend}}.

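The code below reproduces this with frocksdbjni 5.17.2-artisans-2.0:
{code:java}
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import org.rocksdb.*;

// Load the RocksDB native library, using /tmp/rocksdb-lib as the temporary directory.
NativeLibraryLoader.getInstance().loadLibrary("/tmp/rocksdb-lib");

// RocksDB.open fills this list with column family handles; they are deliberately never closed here.
List<ColumnFamilyHandle> cf = new ArrayList<>(1);
try (DBOptions options = new DBOptions().setCreateIfMissing(true);
     ColumnFamilyOptions columnFamilyOptions = new ColumnFamilyOptions();
     RocksDB rocksdb = RocksDB.open(options,
             "/tmp/rocksdb-2",
             Collections.singletonList(new ColumnFamilyDescriptor("default".getBytes(), columnFamilyOptions)),
             cf)) {
    rocksdb.put(ByteBuffer.allocate(4).array(), ByteBuffer.allocate(4).array());
}
// The try block closes the DB while the handle in cf is still alive.
{code}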
RocksDB-java relies on 
[#finalize|https://github.com/facebook/rocksdb/wiki/RocksJava-Basics#memory-management]
 to release the native C++ objects when the JVM runs GC. However, if the 
column family handle is not destroyed before RocksDB itself, the 
[assert|https://github.com/dataArtisans/frocksdb/blob/49bc897d5d768026f1eb816d960c1f2383396ef4/db/column_family.cc#L1238]
 fails at the [versions 
reset|https://github.com/dataArtisans/frocksdb/blob/49bc897d5d768026f1eb816d960c1f2383396ef4/db/db_impl.cc#L515]
 while the DB is closing. Since we cannot control the order in which GC 
finalizes objects, the CI fails only sometimes.
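
To make the ordering requirement concrete, here is a minimal sketch that does not use the real RocksDB JNI types. {{FakeDb}} and {{FakeHandle}} are hypothetical stand-ins invented for illustration, and the {{IllegalStateException}} only models the native assert. It shows that closing explicitly via try-with-resources gives a deterministic order (handle first, then DB), whereas leaving the handle to the finalizer does not.

```java
import java.util.ArrayList;
import java.util.List;

class CloseOrderSketch {

    /** Hypothetical stand-in for the native DB; it is not a RocksDB class. */
    static final class FakeDb implements AutoCloseable {
        private final List<String> log;
        private int openHandles = 0;

        FakeDb(List<String> log) {
            this.log = log;
        }

        FakeHandle newHandle(String name) {
            openHandles++;
            return new FakeHandle(this, name);
        }

        @Override
        public void close() {
            // Models the native assertion: no handle may outlive the DB.
            if (openHandles != 0) {
                throw new IllegalStateException(openHandles + " handle(s) still open");
            }
            log.add("db");
        }
    }

    /** Hypothetical stand-in for a column family handle. */
    static final class FakeHandle implements AutoCloseable {
        private final FakeDb db;
        private final String name;
        private boolean closed = false;

        FakeHandle(FakeDb db, String name) {
            this.db = db;
            this.name = name;
        }

        @Override
        public void close() {
            if (!closed) {
                closed = true;
                db.openHandles--;
                db.log.add("handle:" + name);
            }
        }
    }

    /** Closes explicitly: try-with-resources releases resources in reverse declaration order. */
    static List<String> closeInSafeOrder() {
        List<String> log = new ArrayList<>();
        try (FakeDb db = new FakeDb(log);
             FakeHandle handle = db.newHandle("default")) {
            // Work with the DB here.
        }
        return log;
    }

    /** Returns true if closing the DB before its handle trips the modelled check. */
    static boolean closingDbFirstFails() {
        FakeDb db = new FakeDb(new ArrayList<>());
        FakeHandle handle = db.newHandle("default");
        try {
            db.close();
            return false;
        } catch (IllegalStateException expected) {
            return true;
        } finally {
            handle.close();
        }
    }

    public static void main(String[] args) {
        System.out.println(closeInSafeOrder());
        System.out.println(closingDbFirstFails());
    }
}
```

With the real API, the analogous step would presumably be to close every {{ColumnFamilyHandle}} in the list filled by {{RocksDB.open}} before closing the {{RocksDB}} instance. This is only an illustration of the idea, not the actual change planned for the test.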

I'll create a new PR to fix FLINK-17800 and avoid this problem.

> RocksDB tests crash the JVM on CI
> ---------------------------------
>
>                 Key: FLINK-18338
>                 URL: https://issues.apache.org/jira/browse/FLINK-18338
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / State Backends, Tests
>    Affects Versions: 1.11.0
>            Reporter: Chesnay Schepler
>            Assignee: Yun Tang
>            Priority: Blocker
>              Labels: test-stability
>             Fix For: 1.11.0
>
>
> Something about {{pure virtual method called}}.
> Seen this twice in separate PRs.
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=3615&view=logs&j=0da23115-68bb-5dcd-192c-bd4c8adebde1
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=3632&view=logs&j=0da23115-68bb-5dcd-192c-bd4c8adebde1



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
