[jira] [Commented] (HDDS-3852) Failed to import replicated container

2020-06-29 Thread Marton Elek (Jira)


[ https://issues.apache.org/jira/browse/HDDS-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17147984#comment-17147984 ]

Marton Elek commented on HDDS-3852:
---

We discussed it during the Community Meeting. The problem seems hard to
reproduce, so we moved it out of 0.7.0. Feel free to move it back if you think
it's important to fix (especially since you, as the release manager, have the
final decision).

Personally, I think we need more tests with long-running Ozone clusters. The
upgrade tests introduced by Attila might also help.

If you have any more logs or other information, please share them and we can
investigate.

> Failed to import replicated container
> -
>
> Key: HDDS-3852
> URL: https://issues.apache.org/jira/browse/HDDS-3852
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Sammi Chen
>Priority: Major
>
> Found several container replication failures in the logs after upgrading the
> Ozone cluster to the June 12th master branch. The tar file is deleted after the import failure.
>  
> {code}
> 2020-06-23 14:11:19,662 [ContainerReplicationThread-0] INFO org.apache.hadoop.ozone.container.replication.DownloadAndImportReplicator: Starting replication of container 206 from [33b49c34-caa2-4b4f-894e-dce7db4f97b9{ip: 9.180.20.222, host: host-9-180-20-222, networkLocation: /rack1, certSerialId: null}, f8d9ccf6-20c6-4dfa-8a49-012f43a1b27e{ip: 9.179.142.251, host: host251, networkLocation: /rack3, certSerialId: null}, db854037-4846-4093-89de-e492e0f14239{ip: 9.179.142.198, host: host198, networkLocation: /rack3, certSerialId: null}]
> 2020-06-23 14:11:20,504 [grpc-default-executor-111] INFO org.apache.hadoop.ozone.container.replication.GrpcReplicationClient: Container 206 is downloaded to /tmp/container-copy/container-206.tar.gz
> 2020-06-23 14:11:20,505 [ContainerReplicationThread-0] INFO org.apache.hadoop.ozone.container.replication.DownloadAndImportReplicator: Container 206 is downloaded, starting to import.
> 2020-06-23 14:11:20,616 [ContainerReplicationThread-0] ERROR org.apache.hadoop.ozone.container.replication.DownloadAndImportReplicator: Can't import the downloaded container data id=206
> java.io.IOException: Container descriptor is missing from the container archive.
>     at org.apache.hadoop.ozone.container.keyvalue.TarContainerPacker.unpackContainerDescriptor(TarContainerPacker.java:190)
>     at org.apache.hadoop.ozone.container.replication.DownloadAndImportReplicator.importContainer(DownloadAndImportReplicator.java:74)
>     at org.apache.hadoop.ozone.container.replication.DownloadAndImportReplicator.replicate(DownloadAndImportReplicator.java:121)
>     at org.apache.hadoop.ozone.container.replication.ReplicationSupervisor$TaskRunner.run(ReplicationSupervisor.java:129)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> 2020-06-23 14:11:20,616 [ContainerReplicationThread-0] INFO org.apache.hadoop.ozone.container.replication.DownloadAndImportReplicator: Container 206 is replicated successfully
> 2020-06-23 14:11:20,616 [ContainerReplicationThread-0] INFO org.apache.hadoop.ozone.container.replication.ReplicationSupervisor: Container 206 is replicated.
> {code}
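The error above is raised when the import step cannot find the container descriptor inside the downloaded archive. Since the download is a gzip-compressed tar file (container-206.tar.gz in the log), a retained copy could be checked by listing its entries. The following is a minimal sketch using only the JDK, with a hand-rolled ustar header parser (no GNU long-name or base-256 size support). The descriptor entry name `container.yaml` is an assumption about the TarContainerPacker archive layout and should be verified against the source; `ContainerArchiveInspector` is a hypothetical helper, not an Ozone class.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;

/** Hypothetical diagnostic helper: lists entries of a .tar.gz and checks for a descriptor. */
class ContainerArchiveInspector {

    // ASSUMPTION: descriptor entry name used by TarContainerPacker; verify before relying on it.
    static final String DESCRIPTOR_NAME = "container.yaml";

    /** Lists entry names of a gzip-compressed tar stream (minimal ustar parser). */
    static List<String> listEntries(InputStream gz) throws IOException {
        List<String> names = new ArrayList<>();
        try (InputStream in = new GZIPInputStream(gz)) {
            byte[] header = new byte[512];
            while (readFully(in, header)) {
                if (isZeroBlock(header)) {
                    break; // an all-zero block marks the end of the archive
                }
                String name = cString(header, 0, 100);
                // Only POSIX ustar headers ("ustar\0" magic) carry a path prefix at offset 345.
                if (cString(header, 257, 6).equals("ustar")) {
                    String prefix = cString(header, 345, 155);
                    if (!prefix.isEmpty()) {
                        name = prefix + "/" + name;
                    }
                }
                // Entry size is stored as octal ASCII at offset 124.
                long size = Long.parseLong(cString(header, 124, 12).trim(), 8);
                names.add(name);
                skipFully(in, (size + 511) / 512 * 512); // data is padded to 512-byte blocks
            }
        }
        return names;
    }

    static boolean hasDescriptor(List<String> entries) {
        return entries.stream().anyMatch(
            n -> n.equals(DESCRIPTOR_NAME) || n.equals("./" + DESCRIPTOR_NAME));
    }

    /** Returns false on clean EOF before any byte; throws on a truncated block. */
    private static boolean readFully(InputStream in, byte[] buf) throws IOException {
        int off = 0;
        while (off < buf.length) {
            int n = in.read(buf, off, buf.length - off);
            if (n < 0) {
                if (off == 0) {
                    return false;
                }
                throw new IOException("Truncated tar header");
            }
            off += n;
        }
        return true;
    }

    private static void skipFully(InputStream in, long n) throws IOException {
        byte[] scratch = new byte[8192];
        while (n > 0) {
            int r = in.read(scratch, 0, (int) Math.min(scratch.length, n));
            if (r < 0) {
                throw new IOException("Truncated tar entry");
            }
            n -= r;
        }
    }

    private static boolean isZeroBlock(byte[] b) {
        for (byte x : b) {
            if (x != 0) {
                return false;
            }
        }
        return true;
    }

    /** Reads a NUL-terminated ASCII field of at most len bytes. */
    private static String cString(byte[] b, int off, int len) {
        int end = off;
        while (end < off + len && b[end] != 0) {
            end++;
        }
        return new String(b, off, end - off, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws IOException {
        Path archive = Paths.get(args[0]);
        try (InputStream in = Files.newInputStream(archive)) {
            List<String> entries = listEntries(in);
            entries.forEach(System.out::println);
            System.out.println(hasDescriptor(entries) ? "descriptor present" : "descriptor MISSING");
        }
    }
}
```

Usage would be `java ContainerArchiveInspector /path/to/container-206.tar.gz`, which only helps if the archive is still on disk after the failure.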



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-3852) Failed to import replicated container

2020-07-09 Thread Sammi Chen (Jira)


[ https://issues.apache.org/jira/browse/HDDS-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17154442#comment-17154442 ]

Sammi Chen commented on HDDS-3852:
--

I don't have more information, since the downloaded container archive (the
.tar.gz file) is deleted on failure. As a first step, I think we should keep
the archive for debugging purposes; then we can quickly find the root cause.
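A minimal sketch of this suggestion, assuming a hypothetical `ContainerImporter` callback that stands in for the real import step: the downloaded archive is deleted only after a successful import, and on failure it is moved into a quarantine directory for later inspection. `ArchiveRetention`, `importKeepingFailures` and the quarantine directory are illustrative names, not part of the Ozone code base.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper: delete the downloaded archive only after a successful import. */
final class ArchiveRetention {

    /** Hypothetical stand-in for the real import step (e.g. unpacking the tar.gz). */
    @FunctionalInterface
    interface ContainerImporter {
        void importFrom(Path archive) throws IOException;
    }

    /**
     * Runs the import. On success the archive is deleted; on an IOException it is
     * moved into quarantineDir (created if needed) so it can be inspected later.
     * Returns true only if the import succeeded.
     */
    static boolean importKeepingFailures(Path archive, ContainerImporter importer,
                                         Path quarantineDir) throws IOException {
        try {
            importer.importFrom(archive);
        } catch (IOException e) {
            Files.createDirectories(quarantineDir);
            Path kept = quarantineDir.resolve(archive.getFileName());
            Files.move(archive, kept, StandardCopyOption.REPLACE_EXISTING);
            System.err.println("Import failed, kept archive at " + kept + ": " + e.getMessage());
            return false;
        }
        Files.deleteIfExists(archive);
        return true;
    }
}
```

Returning a boolean lets the caller report the outcome explicitly rather than unconditionally. A real version would also need a cleanup policy so quarantined archives do not fill the disk.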
