[ https://issues.apache.org/jira/browse/AMBARI-9990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alejandro Fernandez updated AMBARI-9990:
----------------------------------------
    Attachment:     (was: AMBARI-9990.patch)

> CopyFromLocal failed to copy Tez tarball to HDFS failed because multiple
> processes tried to copy to the same destination simultaneously
> ------------------------------------------------------------------------
>
>                 Key: AMBARI-9990
>                 URL: https://issues.apache.org/jira/browse/AMBARI-9990
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-server
>    Affects Versions: 2.0.0
>            Reporter: Alejandro Fernandez
>            Assignee: Alejandro Fernandez
>             Fix For: 2.0.0
>
>         Attachments: AMBARI-9990.patch,
> hadoop-hdfs-datanode-c6408.ambari.apache.org.log,
> hadoop-hdfs-datanode-c6410.ambari.apache.org.log,
> hadoop-hdfs-namenode-c6408.ambari.apache.org.log, hdfs-audit.log
>
>
> Pig Service Check and Hive Server 2 START ran on 2 different machines during
> the stack installation and failed to copy the tez tarball to HDFS.
> I was able to reproduce this locally by calling CopyFromLocal from two
> clients simultaneously. See the HDFS audit log, datanode logs on c6408 &
> c6410, and namenode log on c6410.
> The copyFromLocal command's behavior is:
> * Try to create a temporary file <filename>._COPYING_ and write the real data
> there
> * If it hits any exception, delete the file with the name <filename>._COPYING_
> Thus we have the following race condition in this test:
> Process P1 created file "tez.tar.gz._COPYING_" and wrote data to it
> Process P2 fired the same copyFromLocal command and hit an exception because
> it could not get the lease
> P2 then deleted the file "tez.tar.gz._COPYING_"
> P1 could not close the file "tez.tar.gz._COPYING_" since it had been deleted
> by P2. The exception would say "could not find lease for file..."
> In general we do not have the correct synchronization guarantee for the
> "copyFromLocal" command.
> One solution is for each process to copy to a unique destination file name
> and then mv it to the final name. Because the mv command is synchronized by
> the namenode, at least one of them will succeed in naming the file.
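
The proposed fix (a unique destination name followed by a rename) can be sketched roughly as below. This is only an illustration under stated assumptions, not the contents of AMBARI-9990.patch: it uses the local filesystem as a stand-in for HDFS, with os.replace playing the role of the namenode-synchronized mv, and the helper names copy_with_unique_temp and run_concurrent_copies are hypothetical.

```python
import os
import shutil
import tempfile
import threading
import uuid


def copy_with_unique_temp(src, dest):
    """Copy src to dest through a temp name that is unique to this caller.

    Assumption: the local filesystem stands in for HDFS, and os.replace
    stands in for the namenode-synchronized mv.
    """
    # A per-caller suffix means no two callers share a "._COPYING_" file.
    tmp = "%s.%s._COPYING_" % (dest, uuid.uuid4().hex)
    try:
        shutil.copyfile(src, tmp)
        os.replace(tmp, dest)  # atomic rename; the last caller wins
    finally:
        # On failure, remove only our own temp file, never another caller's.
        if os.path.exists(tmp):
            os.remove(tmp)


def run_concurrent_copies(workers=4):
    """Run several copies to the same destination at once and report back."""
    workdir = tempfile.mkdtemp()
    src = os.path.join(workdir, "tez.tar.gz.local")
    dest = os.path.join(workdir, "tez.tar.gz")
    payload = b"x" * (64 * 1024)
    with open(src, "wb") as f:
        f.write(payload)

    errors = []

    def worker():
        try:
            copy_with_unique_temp(src, dest)
        except Exception as exc:  # collected so the caller can inspect them
            errors.append(exc)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    with open(dest, "rb") as f:
        intact = f.read() == payload
    leftovers = [n for n in os.listdir(workdir) if n.endswith("._COPYING_")]
    shutil.rmtree(workdir)
    return errors, intact, leftovers


if __name__ == "__main__":
    print(run_concurrent_copies())
```

Because each caller only ever deletes its own temporary file, a failing P2 can no longer remove the file P1 is still writing. On real HDFS the rename semantics differ (for example, mv to an existing destination fails rather than overwriting), so the final step would need to tolerate "destination already exists" as a success.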