SteveStevenpoor opened a new issue, #11791:
URL: https://github.com/apache/gluten/issues/11791

   ### Backend
   
   VL (Velox)
   
   ### Bug description
   
   When I run, for example, Nexmark q4 with the RocksDB state backend, the job
fails because the TaskManager process goes down for an unknown reason. The
TaskManager log simply stops mid-stream, which suggests an unexpected crash.
However, stdout contains some information:
   
   ```bash
   WARNING: Unknown module: org.apache.arrow.memory.core specified to --add-opens
   Created work directory /tmp/velox4j-307006935592892106.
   Found required libraries in container velox4j-lib/Linux/amd64.
   Copying library velox4j-lib/Linux/amd64/libevent-2.0.so.5 to /tmp/velox4j-307006935592892106/lib/libevent-2.0.so.5...
   Copying library velox4j-lib/Linux/amd64/libvelox.so to /tmp/velox4j-307006935592892106/lib/libvelox.so...
   Copying library velox4j-lib/Linux/amd64/libcrypto.so.1.1 to /tmp/velox4j-307006935592892106/lib/libcrypto.so.1.1...
   Copying library velox4j-lib/Linux/amd64/libfmt.so.9 to /tmp/velox4j-307006935592892106/lib/libfmt.so.9...
   Copying library velox4j-lib/Linux/amd64/libaio.so.1 to /tmp/velox4j-307006935592892106/lib/libaio.so.1...
   Copying library velox4j-lib/Linux/amd64/libzstd.so.1 to /tmp/velox4j-307006935592892106/lib/libzstd.so.1...
   Copying library velox4j-lib/Linux/amd64/libgflags.so.2.2 to /tmp/velox4j-307006935592892106/lib/libgflags.so.2.2...
   Copying library velox4j-lib/Linux/amd64/libfolly.so.0.58.0-dev to /tmp/velox4j-307006935592892106/lib/libfolly.so.0.58.0-dev...
   Copying library velox4j-lib/Linux/amd64/libssl.so.1.1 to /tmp/velox4j-307006935592892106/lib/libssl.so.1.1...
   Copying library velox4j-lib/Linux/amd64/libicuuc.so.66 to /tmp/velox4j-307006935592892106/lib/libicuuc.so.66...
   Copying library velox4j-lib/Linux/amd64/libsodium.so.23 to /tmp/velox4j-307006935592892106/lib/libsodium.so.23...
   Copying library velox4j-lib/Linux/amd64/libsasl2.so.2 to /tmp/velox4j-307006935592892106/lib/libsasl2.so.2...
   Copying library velox4j-lib/Linux/amd64/liblzma.so.5 to /tmp/velox4j-307006935592892106/lib/liblzma.so.5...
   Copying library velox4j-lib/Linux/amd64/libboost_program_options.so.1.81.0 to /tmp/velox4j-307006935592892106/lib/libboost_program_options.so.1.81.0...
   Copying library velox4j-lib/Linux/amd64/libboost_filesystem.so.1.81.0 to /tmp/velox4j-307006935592892106/lib/libboost_filesystem.so.1.81.0...
   Copying library velox4j-lib/Linux/amd64/libdouble-conversion.so.3 to /tmp/velox4j-307006935592892106/lib/libdouble-conversion.so.3...
   Copying library velox4j-lib/Linux/amd64/libboost_context.so.1.81.0 to /tmp/velox4j-307006935592892106/lib/libboost_context.so.1.81.0...
   Copying library velox4j-lib/Linux/amd64/librdkafka.so.1 to /tmp/velox4j-307006935592892106/lib/librdkafka.so.1...
   Copying library velox4j-lib/Linux/amd64/libstdc++.so.6 to /tmp/velox4j-307006935592892106/lib/libstdc++.so.6...
   Copying library velox4j-lib/Linux/amd64/libunwind.so.8 to /tmp/velox4j-307006935592892106/lib/libunwind.so.8...
   Copying library velox4j-lib/Linux/amd64/libz.so.1 to /tmp/velox4j-307006935592892106/lib/libz.so.1...
   Copying library velox4j-lib/Linux/amd64/liblz4.so.1 to /tmp/velox4j-307006935592892106/lib/liblz4.so.1...
   Copying library velox4j-lib/Linux/amd64/libicudata.so.66 to /tmp/velox4j-307006935592892106/lib/libicudata.so.66...
   Copying library velox4j-lib/Linux/amd64/libglog.so.0 to /tmp/velox4j-307006935592892106/lib/libglog.so.0...
   Copying library velox4j-lib/Linux/amd64/libvelox4j.so to /tmp/velox4j-307006935592892106/lib/libvelox4j.so...
   Copying library velox4j-lib/Linux/amd64/libbz2.so.1.0 to /tmp/velox4j-307006935592892106/lib/libbz2.so.1.0...
   Copying library velox4j-lib/Linux/amd64/libsnappy.so.1 to /tmp/velox4j-307006935592892106/lib/libsnappy.so.1...
   Copying library velox4j-lib/Linux/amd64/libcppkafka.so.0.4.1 to /tmp/velox4j-307006935592892106/lib/libcppkafka.so.0.4.1...
   WARNING: Logging before InitGoogleLogging() is written to STDERR
   I0319 14:33:22.678424 2944230 JniLoader.cc:29] Initializing Velox4J...
   I0319 14:33:22.679422 2944230 JniLoader.cc:43] Velox4J initialized.
   All required libraries were successfully loaded.
   I0319 14:33:22.850020 2944230 HiveConnector.cpp:56] Hive connector connector-hive created with maximum of 20000 cached file handles with expiration of 0ms.
   E0319 14:33:41.576027 2945031 MemoryManager.cc:264] [Velox4J MemoryManager DTOR] Memory leak found on Velox memory pool: Memory Pool[Decoding Memory Pool LEAF root[root] parent[root] MALLOC track-usage thread-safe]<unlimited max capacity capacity 128.00MB used 512B available 1023.50KB reservation [used 512B, reserved 1.00MB, min 0B] counters [allocs 4, frees 0, reserves 0, releases 0, collisions 0])>. Please make sure your code released all opened resources already.
   E0319 14:33:41.576107 2945031 MemoryManager.cc:264] [Velox4J MemoryManager DTOR] Memory leak found on Velox memory pool: Memory Pool[root AGGREGATE root[root] parent[null] MALLOC track-usage thread-safe]<unlimited max capacity capacity 128.00MB used 512B available 0B reservation [used 0B, reserved 1.00MB, min 0B] counters [allocs 0, frees 0, reserves 0, releases 0, collisions 0])>. Please make sure your code released all opened resources already.
   E0319 14:33:41.576125 2945031 MemoryPool.cpp:461] [MEM] Memory leak (Used memory): Memory Pool[Decoding Memory Pool LEAF root[root] parent[root] MALLOC track-usage thread-safe]<unlimited max capacity capacity 128.00MB used 512B available 1023.50KB reservation [used 512B, reserved 1.00MB, min 0B] counters [allocs 4, frees 0, reserves 0, releases 0, collisions 0])>
   E0319 14:33:41.576150 2945031 Exceptions.h:66] Line: ~/velox4j/src/main/cpp/main/velox4j/memory/MemoryManager.cc:92, Function:removePool, Expression: pool->reservedBytes() == 0 (1048576 vs. 0), Source: RUNTIME, ErrorCode: INVALID_STATE
   W0319 14:33:41.576576 2945031 ExceptionTracer.cpp:187] Invalid trace stack for exception of type: facebook::velox::VeloxRuntimeError
   terminate called after throwing an instance of 'facebook::velox::VeloxRuntimeError'
     what():  Exception: VeloxRuntimeError
   Error Source: RUNTIME
   Error Code: INVALID_STATE
   Reason: (1048576 vs. 0)
   Retriable: False
   Expression: pool->reservedBytes() == 0
   Function: removePool
   File: ~/velox4j/src/main/cpp/main/velox4j/memory/MemoryManager.cc
   Line: 92
   Stack trace:
   # 0  _ZN8facebook5velox7process10StackTraceC1Ei
   # 1  _ZN8facebook5velox14VeloxExceptionC1EPKcmS3_St17basic_string_viewIcSt11char_traitsIcEES7_S7_S7_bNS1_4TypeES7_
   # 2  _ZN8facebook5velox6detail14veloxCheckFailINS0_17VeloxRuntimeErrorERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEEvRKNS1_18VeloxCheckFailArgsET0_
   # 3  _ZN7velox4j12_GLOBAL__N_120ListenableArbitrator10removePoolEPN8facebook5velox6memory10MemoryPoolE
   # 4  _ZN8facebook5velox6memory13MemoryManager8dropPoolEPNS1_10MemoryPoolE
   # 5  _ZN8facebook5velox6memory14MemoryPoolImplD2Ev
   # 6  _ZN7velox4j13MemoryManager11tryDestructEv
   # 7  _ZN7velox4j13MemoryManagerD2Ev
   # 8  _ZNSt10_HashtableIjSt4pairIKjSt10shared_ptrIvEESaIS4_ENSt8__detail10_Select1stESt8equal_toIjESt4hashIjENS6_18_Mod_range_hashingENS6_20_Default_ranged_hashENS6_20_Prime_rehash_policyENS6_17_Hashtable_traitsILb0ELb0ELb1EEEE8_M_eraseEmPNS6_15_Hash_node_baseEPNS6_10_Hash_nodeIS4_Lb0EEE
   # 9  _ZN7velox4j11ResourceMapISt10shared_ptrIvEE5eraseEj
   # 10 _ZN7velox4j11ObjectStore15releaseInternalEj
   # 11 _ZN7velox4j12_GLOBAL__N_116releaseCppObjectEP7JNIEnv_P8_jobjectl
   # 12 0x00007f92d87cea0f
   # 13 0x00007f92d87c9272
   # 14 0x00007f92d87c9272
   # 15 0x00007f92d87c9272
   # 16 0x00007f92d87c92b7
   # 17 0x00007f92d87c9272
   # 18 0x00007f92d87c92b7
   # 19 0x00007f92d87c9272
   # 20 0x00007f92d87c92b7
   # 21 0x00007f92d87c9272
   # 22 0x00007f92d87c92b7
   # 23 0x00007f92d87c9272
   # 24 0x00007f92d87c9272
   # 25 0x00007f92d87c9272
   # 26 0x00007f92d87c92b7
   # 27 0x00007f92d87c9272
   # 28 0x00007f92d87c92b7
   # 29 0x00007f92d87c9272
   # 30 0x00007f92d87c9272
   # 31 0x00007f92d87c9272
   # 32 0x00007f92d87c92b7
   # 33 0x00007f92d87bfcc8
   # 34 _ZN9JavaCalls11call_helperEP9JavaValueRK12methodHandleP17JavaCallArgumentsP6Thread
   # 35 _ZN9JavaCalls12call_virtualEP9JavaValue6HandleP5KlassP6SymbolS6_P6Thread
   # 36 _ZL12thread_entryP10JavaThreadP6Thread
   # 37 _ZN10JavaThread17thread_main_innerEv
   # 38 _ZN6Thread8call_runEv
   # 39 _ZL19thread_native_entryP6Thread
   # 40 start_thread
   # 41 clone
   ```
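
To make the failing check in the log easier to read, here is a toy Python model of the accounting it reports. This is only a sketch, not velox4j or Velox code: `ToyPool`, `remove_pool`, the 1 MB reservation quantum, and the 128-byte allocation size are all assumptions chosen so the numbers match the log (4 allocs, 0 frees, used 512B, reserved 1.00MB). It shows why removing a pool that still holds a reservation trips a check of the form `pool->reservedBytes() == 0 (1048576 vs. 0)`.

```python
MB = 1 << 20


class ToyPool:
    """Toy stand-in for a memory pool (hypothetical, not the Velox API)."""

    def __init__(self, quantum=MB):
        self.quantum = quantum  # assumed reservation granularity
        self.used = 0
        self.reserved = 0

    def allocate(self, size):
        self.used += size
        # Grow the reservation in whole quanta until it covers the usage.
        while self.reserved < self.used:
            self.reserved += self.quantum

    def free(self, size):
        self.used -= size
        # Release the whole reservation once nothing is in use.
        if self.used == 0:
            self.reserved = 0


def remove_pool(pool):
    """Mimics the shape of the failing check: nothing may stay reserved."""
    if pool.reserved != 0:
        raise RuntimeError(f"reserved_bytes == 0 ({pool.reserved} vs. 0)")


# Leaky case: four small allocations that are never freed.
leaky = ToyPool()
for _ in range(4):
    leaky.allocate(128)

try:
    remove_pool(leaky)
    message = "ok"
except RuntimeError as err:
    message = str(err)

# Clean case: every allocation is freed before the pool is removed.
clean = ToyPool()
clean.allocate(128)
clean.free(128)
remove_pool(clean)  # does not raise

print(message)
```

In this model the leaky pool ends with 512 bytes used and 1048576 bytes reserved, the same figures as in the log. That is consistent with some object backed by the "Decoding Memory Pool" still being alive when the `MemoryManager` destructor runs, though the toy model cannot confirm the actual cause.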
   
   ### Gluten version
   
   _No response_
   
   ### Spark version
   
   None
   
   ### Spark configurations
   
   _No response_
   
   ### System information
   
   _No response_
   
   ### Relevant logs
   
   ```bash
   
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

