[jira] [Commented] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline

Bharat Viswanadham (Jira) Tue, 29 Oct 2019 11:45:06 -0700


    [ 
https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16962313#comment-16962313
 ]


Bharat Viswanadham commented on HDDS-2356:
------------------------------------------

Hi [~timmylicheng]
{quote}The long printing logs happen only in your branch tho. The full log is 
too huge to put here. You can think of the same logs as attached over and over 
again for multi megabyte size. 
{quote}
As mentioned, the log-printing change was made in HDDS-2286. The exclude list
is a list, so every time we add the same pipeline to it, that pipeline is
printed again. A recent change also plays a role: when the client gets an
error on a pipeline, we exclude that pipeline. Previously the request failed
if no other pipelines were left, but now, if there are no other pipelines in
the system, SCM ignores the exclusion and reissues the same pipeline. In your
case, the same pipeline may be getting excluded repeatedly because of some
error, and because this is a list, it is appended each time. That could
explain the huge log statement. To avoid this, we can add a pipeline ID to the
exclude list only if it is not already there. I will open a Jira and fix this.
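
A minimal sketch of the "add only if absent" idea, for illustration only. The
class and method names (ExcludeListSketch, addPipeline, getPipelineIds) are
hypothetical and are not the actual Ozone ExcludeList API; pipeline IDs are
modelled as plain strings.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for a client-side exclude list; not the Ozone class. */
final class ExcludeListSketch {
    // A set keeps each pipeline ID at most once; LinkedHashSet also keeps
    // insertion order, so the printed form stays stable and short.
    private final Set<String> pipelineIds = new LinkedHashSet<>();

    /** Adds the ID only if it is absent; returns true when newly added. */
    boolean addPipeline(String pipelineId) {
        return pipelineIds.add(pipelineId);
    }

    /** Returns a copy so callers cannot modify the internal set. */
    List<String> getPipelineIds() {
        return new ArrayList<>(pipelineIds);
    }

    @Override
    public String toString() {
        return "ExcludeList" + pipelineIds;
    }

    public static void main(String[] args) {
        ExcludeListSketch excludeList = new ExcludeListSketch();
        // Simulate the same pipeline failing and being excluded repeatedly.
        for (int attempt = 0; attempt < 3; attempt++) {
            excludeList.addPipeline("pipeline-1");
        }
        // The log line no longer grows with every retry.
        System.out.println(excludeList);
    }
}
```

With a plain list, the three retries above would leave three identical
entries; with the set-backed version only one remains.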

 
{quote}Also I'm using S3 gateway to connect to ozone and mount local file path 
by fuse (goofys). Have you tested s3 gateway? Most unit tests are going thru 
RPC.
{quote}
S3 Gateway also uses RpcClient internally to communicate with the cluster. If
possible, could you run goofys in debug mode and capture the logs at the point
where completeMPU happens (what parameter list it is passing, and why
MISMATCH_PART_LIST occurs)? With that we can find the root cause. I will also
try to set up a cluster and see whether I hit the same problem.

 

> Multipart upload report errors while writing to ozone Ratis pipeline
> --------------------------------------------------------------------
>
>                 Key: HDDS-2356
>                 URL: https://issues.apache.org/jira/browse/HDDS-2356
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>          Components: Ozone Manager
>    Affects Versions: 0.4.1
>         Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM 
> on a separate VM
>            Reporter: Li Cheng
>            Assignee: Bharat Viswanadham
>            Priority: Blocker
>             Fix For: 0.5.0
>
>
> Env: 4 VMs in total: 3 Datanodes on 3 VMs, and 1 OM & 1 SCM on a separate
> VM, say VM0.
> I use goofys as a FUSE layer and enable the Ozone S3 gateway to mount Ozone
> at a path on VM0, then read data from VM0's local disk and write it to the
> mount path. The dataset contains ~50,000 files, ranging in size from 0 bytes
> to the GB level.
> The writing is slow (~10 mins per 1 GB) and it stops after around 4 GB.
> Looking at the hadoop-root-om-VM_50_210_centos.out log, I see OM throwing
> errors related to multipart upload. This error eventually causes the write
> to terminate and OM to shut down.
>  
> 2019-10-24 16:01:59,527 [OMDoubleBufferFlushThread] ERROR - Terminating with exit status 2: OMDoubleBuffer flush threadOMDoubleBufferFlushThreadencountered Throwable error
> java.util.ConcurrentModificationException
>  at java.util.TreeMap.forEach(TreeMap.java:1004)
>  at org.apache.hadoop.ozone.om.helpers.OmMultipartKeyInfo.getProto(OmMultipartKeyInfo.java:111)
>  at org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:38)
>  at org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:31)
>  at org.apache.hadoop.hdds.utils.db.CodecRegistry.asRawData(CodecRegistry.java:68)
>  at org.apache.hadoop.hdds.utils.db.TypedTable.putWithBatch(TypedTable.java:125)
>  at org.apache.hadoop.ozone.om.response.s3.multipart.S3MultipartUploadCommitPartResponse.addToDBBatch(S3MultipartUploadCommitPartResponse.java:112)
>  at org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.lambda$flushTransactions$0(OzoneManagerDoubleBuffer.java:137)
>  at java.util.Iterator.forEachRemaining(Iterator.java:116)
>  at org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:135)
>  at java.lang.Thread.run(Thread.java:745)
> 2019-10-24 16:01:59,629 [shutdown-hook-0] INFO - SHUTDOWN_MSG:
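
For reference, the ConcurrentModificationException at the top of the quoted
stack trace is the standard fail-fast behaviour of java.util.TreeMap.forEach
when the map is structurally modified while it is being iterated. The sketch
below reproduces that behaviour in a single thread and shows one possible way
around it (iterating over a snapshot). The class name and the part-map contents
are hypothetical; this is not the Ozone code and not necessarily the fix
applied for this issue. In a real multi-threaded setting, taking the snapshot
would itself need to be synchronized with the writer.

```java
import java.util.ConcurrentModificationException;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration of TreeMap's fail-fast forEach; not Ozone code. */
final class PartMapIterationSketch {

    /** Returns true if forEach throws when the map gains a key mid-iteration. */
    static boolean forEachFailsOnModification() {
        TreeMap<Integer, String> parts = new TreeMap<>();
        parts.put(1, "part-1");
        parts.put(2, "part-2");
        try {
            // Inserting a new key is a structural modification, which
            // TreeMap.forEach detects after the callback returns.
            parts.forEach((partNumber, name) ->
                parts.put(partNumber + 100, "late-part"));
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    /** Iterates over a copy, so changes to the original map are harmless. */
    static int iterateOverSnapshot() {
        TreeMap<Integer, String> parts = new TreeMap<>();
        parts.put(1, "part-1");
        parts.put(2, "part-2");
        Map<Integer, String> snapshot = new TreeMap<>(parts);
        snapshot.forEach((partNumber, name) ->
            parts.put(partNumber + 100, "late-part"));
        return parts.size();
    }

    public static void main(String[] args) {
        System.out.println("forEach failed: " + forEachFailsOnModification());
        System.out.println("size after snapshot iteration: " + iterateOverSnapshot());
    }
}
```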



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
