[ https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16964554#comment-16964554 ]

Li Cheng edited comment on HDDS-2356 at 11/1/19 3:45 AM:
---------------------------------------------------------

I also saw a core dump in RocksDB during last night's testing. Please check the attachment (hs_err_pid9340.log) for the entire log.

 

At first glance, it looks like an STL memory error occurs during memory movement while RocksDB is iterating the write_batch to insert entries into the memtable. It may not be related to Ozone, but it does cause the RocksDB failure.

 

Created https://issues.apache.org/jira/browse/HDDS-2396 to track the core dump 
in OM rocksdb.
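For reference, the crashing JNI path in the stack below is the plain org.rocksdb write-batch API that OzoneManagerDoubleBuffer.flushTransactions() ultimately drives. Here is a minimal, self-contained sketch of that API, not Ozone's actual flush code; the DB path and key/value bytes are made up:

// Minimal org.rocksdb write-batch sketch, mirroring the native frames below:
// WriteBatch -> DBImpl::Write -> WriteBatchInternal::InsertInto ->
// MemTableInserter::PutCF -> memtable insert (where the memmove crashed).
// Not Ozone's actual flush code; DB path and keys/values are invented.
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;
import org.rocksdb.WriteBatch;
import org.rocksdb.WriteOptions;

public class WriteBatchSketch {
  public static void main(String[] args) throws RocksDBException {
    RocksDB.loadLibrary();
    try (Options opts = new Options().setCreateIfMissing(true);
         RocksDB db = RocksDB.open(opts, "/tmp/om-writebatch-sketch");
         WriteBatch batch = new WriteBatch();
         WriteOptions writeOpts = new WriteOptions()) {
      // Roughly what the OM double buffer does under the hood: many table
      // updates are collected into one batch (Ozone goes through its own
      // table/batch abstractions rather than calling org.rocksdb directly)
      // and committed with a single write() call.
      batch.put("key-1".getBytes(), "value-1".getBytes());
      batch.put("key-2".getBytes(), "value-2".getBytes());
      db.write(writeOpts, batch);  // enters native DBImpl::Write via write0
    }
  }
}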

Below is the relevant part of the stack:

C  [libc.so.6+0x151d60]  __memmove_ssse3_back+0x1ae0
C  [librocksdbjni3192271038586903156.so+0x358fec]  rocksdb::MemTableInserter::PutCFImpl(unsigned int, rocksdb::Slice const&, rocksdb::Slice const&, rocksdb::ValueType)+0x51c
C  [librocksdbjni3192271038586903156.so+0x359d17]  rocksdb::MemTableInserter::PutCF(unsigned int, rocksdb::Slice const&, rocksdb::Slice const&)+0x17
C  [librocksdbjni3192271038586903156.so+0x3513bc]  rocksdb::WriteBatch::Iterate(rocksdb::WriteBatch::Handler*) const+0x45c
C  [librocksdbjni3192271038586903156.so+0x354df9]  rocksdb::WriteBatchInternal::InsertInto(rocksdb::WriteThread::WriteGroup&, unsigned long, rocksdb::ColumnFamilyMemTables*, rocksdb::FlushScheduler*, bool, unsigned long, rocksdb::DB*, bool, bool, bool)+0x1f9
C  [librocksdbjni3192271038586903156.so+0x29fd79]  rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&, rocksdb::WriteBatch*, rocksdb::WriteCallback*, unsigned long*, unsigned long, bool, unsigned long*, unsigned long, rocksdb::PreReleaseCallback*)+0x24b9
C  [librocksdbjni3192271038586903156.so+0x2a0431]  rocksdb::DBImpl::Write(rocksdb::WriteOptions const&, rocksdb::WriteBatch*)+0x21
C  [librocksdbjni3192271038586903156.so+0x1a064c]  Java_org_rocksdb_RocksDB_write0+0xcc
J 7899  org.rocksdb.RocksDB.write0(JJJ)V (0 bytes) @ 0x00007f58f1872dbe [0x00007f58f1872d00+0xbe]
J 10093% C1  org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions()V (400 bytes) @ 0x00007f58f2308b0c [0x00007f58f2307a40+0x10cc]
j  org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer$$Lambda$29.run()V+4



> Multipart upload report errors while writing to ozone Ratis pipeline
> --------------------------------------------------------------------
>
>                 Key: HDDS-2356
>                 URL: https://issues.apache.org/jira/browse/HDDS-2356
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>          Components: Ozone Manager
>    Affects Versions: 0.4.1
>         Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM 
> on a separate VM
>            Reporter: Li Cheng
>            Assignee: Bharat Viswanadham
>            Priority: Blocker
>             Fix For: 0.5.0
>
>         Attachments: hs_err_pid9340.log, image-2019-10-31-18-56-56-177.png
>
>
> Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, call it VM0.
> I use goofys as a FUSE client, with the Ozone S3 gateway enabled, to mount Ozone to a path on VM0, then read data from VM0's local disk and write it to the mount path. The dataset contains roughly 50,000 files of various sizes, from 0 bytes up to GB-level.
> The writing is slow (about 1 GB per ~10 minutes) and stops after around 4 GB. Looking at the hadoop-root-om-VM_50_210_centos.out log, I see OM throwing errors related to multipart upload. These errors eventually cause the writing to terminate and OM to shut down (a client-side sketch of the multipart flow involved follows after this quoted description).
>  
> 2019-10-28 11:44:34,079 [qtp1383524016-70] ERROR - Error in Complete Multipart Upload Request for bucket: ozone-test, key: 20191012/plc_1570863541668_9278
> MISMATCH_MULTIPART_LIST org.apache.hadoop.ozone.om.exceptions.OMException: Complete Multipart Upload Failed: volume: s3c89e813c80ffcea9543004d57b2a1239bucket: ozone-testkey: 20191012/plc_1570863541668_9278
> at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.handleError(OzoneManagerProtocolClientSideTranslatorPB.java:732)
> at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.completeMultipartUpload(OzoneManagerProtocolClientSideTranslatorPB.java:1104)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at org.apache.hadoop.hdds.tracing.TraceAllMethod.invoke(TraceAllMethod.java:66)
> at com.sun.proxy.$Proxy82.completeMultipartUpload(Unknown Source)
> at org.apache.hadoop.ozone.client.rpc.RpcClient.completeMultipartUpload(RpcClient.java:883)
> at org.apache.hadoop.ozone.client.OzoneBucket.completeMultipartUpload(OzoneBucket.java:445)
> at org.apache.hadoop.ozone.s3.endpoint.ObjectEndpoint.completeMultipartUpload(ObjectEndpoint.java:498)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:76)
> at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:148)
> at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:191)
> at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$ResponseOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:200)
> at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:103)
> at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:493)
>  
> The following error has already been resolved in https://issues.apache.org/jira/browse/HDDS-2322 (a minimal illustration of this failure mode also follows after this quoted description):
> 2019-10-24 16:01:59,527 [OMDoubleBufferFlushThread] ERROR - Terminating with exit status 2: OMDoubleBuffer flush threadOMDoubleBufferFlushThreadencountered Throwable error
> java.util.ConcurrentModificationException
> at java.util.TreeMap.forEach(TreeMap.java:1004)
> at org.apache.hadoop.ozone.om.helpers.OmMultipartKeyInfo.getProto(OmMultipartKeyInfo.java:111)
> at org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:38)
> at org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:31)
> at org.apache.hadoop.hdds.utils.db.CodecRegistry.asRawData(CodecRegistry.java:68)
> at org.apache.hadoop.hdds.utils.db.TypedTable.putWithBatch(TypedTable.java:125)
> at org.apache.hadoop.ozone.om.response.s3.multipart.S3MultipartUploadCommitPartResponse.addToDBBatch(S3MultipartUploadCommitPartResponse.java:112)
> at org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.lambda$flushTransactions$0(OzoneManagerDoubleBuffer.java:137)
> at java.util.Iterator.forEachRemaining(Iterator.java:116)
> at org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:135)
> at java.lang.Thread.run(Thread.java:745)
> 2019-10-24 16:01:59,629 [shutdown-hook-0] INFO - SHUTDOWN_MSG:
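For anyone reproducing the first error, below is a minimal, self-contained sketch of the client-side multipart flow that ends in OzoneBucket.completeMultipartUpload(), the call that fails above with MISMATCH_MULTIPART_LIST. This is illustrative only, not Ozone's actual S3 gateway code; the volume/bucket/key names and the part size are made up, and the method names follow the 0.4.x client API as I recall it, so exact signatures may differ slightly.

// Hypothetical multipart-upload sketch against the Ozone client API.
// Names marked below are invented for illustration.
import java.util.Map;
import java.util.TreeMap;

import org.apache.hadoop.hdds.client.ReplicationFactor;
import org.apache.hadoop.hdds.client.ReplicationType;
import org.apache.hadoop.hdds.conf.OzoneConfiguration;
import org.apache.hadoop.ozone.client.OzoneBucket;
import org.apache.hadoop.ozone.client.OzoneClient;
import org.apache.hadoop.ozone.client.OzoneClientFactory;
import org.apache.hadoop.ozone.client.io.OzoneOutputStream;
import org.apache.hadoop.ozone.om.helpers.OmMultipartInfo;

public class MultipartSketch {
  public static void main(String[] args) throws Exception {
    OzoneClient client = OzoneClientFactory.getRpcClient(new OzoneConfiguration());
    OzoneBucket bucket = client.getObjectStore()
        .getVolume("s3v")          // hypothetical volume name
        .getBucket("ozone-test");  // bucket name from the report

    String key = "20191012/sample-key";  // hypothetical key
    OmMultipartInfo info = bucket.initiateMultipartUpload(
        key, ReplicationType.RATIS, ReplicationFactor.THREE);

    // Upload one part and remember the part name OM returns for it.
    byte[] part = new byte[5 * 1024 * 1024];
    Map<Integer, String> parts = new TreeMap<>();
    OzoneOutputStream out =
        bucket.createMultipartKey(key, part.length, 1, info.getUploadID());
    out.write(part);
    out.close();
    parts.put(1, out.getCommitUploadPartInfo().getPartName());

    // As I understand it, if this part map does not line up with the parts
    // OM has recorded for the upload ID (missing parts, wrong part names or
    // numbers), OM rejects the request with MISMATCH_MULTIPART_LIST, which
    // is what the log above shows.
    bucket.completeMultipartUpload(key, info.getUploadID(), parts);
    client.close();
  }
}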

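And for the second, already-fixed trace, a JDK-only sketch of the underlying failure mode: TreeMap.forEach() throws ConcurrentModificationException when another thread mutates the map during iteration, which matches what the stack shows, the double buffer serializing OmMultipartKeyInfo's part map while a commit-part response was still adding to it. The class below is purely illustrative and usually, though not deterministically, reproduces the exception.

// Standalone JDK-only illustration of the ConcurrentModificationException
// failure mode resolved by HDDS-2322. No Ozone dependencies.
import java.util.ConcurrentModificationException;
import java.util.TreeMap;

public class TreeMapCmeSketch {
  public static void main(String[] args) throws Exception {
    TreeMap<Integer, String> partMap = new TreeMap<>();
    for (int i = 0; i < 100_000; i++) {
      partMap.put(i, "part-" + i);
    }

    // Writer thread keeps adding parts, as the commit-part path did.
    Thread writer = new Thread(() -> {
      for (int i = 100_000; i < 200_000; i++) {
        partMap.put(i, "part-" + i);
      }
    });
    writer.start();

    try {
      // Reader iterates, as getProto() did while building the persisted form.
      partMap.forEach((k, v) -> { /* pretend to serialize the entry */ });
    } catch (ConcurrentModificationException e) {
      System.out.println("Reproduced: " + e);
    }
    writer.join();
  }
}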

