-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31878/#review75924
-----------------------------------------------------------
Ship it!

I agree that we can let the NameNode do the locking here. I don't care if both agents do the same work and the last one in wins.

ambari-common/src/main/python/resource_management/libraries/functions/dynamic_variable_interpretation.py <https://reviews.apache.org/r/31878/#comment123252>

    That's a lot of code to do something as simple as

    ```
    unique_string = str(uuid.uuid4())[:8]
    ```

    I know we don't need UUID power here, but it's concise and makes the code cleaner.

- Jonathan Hurley


On March 9, 2015, 9:41 p.m., Alejandro Fernandez wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31878/
> -----------------------------------------------------------
> 
> (Updated March 9, 2015, 9:41 p.m.)
> 
> 
> Review request for Ambari, Andrew Onischuk, Jonathan Hurley, Nate Cole, and Sid Wagle.
> 
> 
> Bugs: AMBARI-9990
>     https://issues.apache.org/jira/browse/AMBARI-9990
> 
> 
> Repository: ambari
> 
> 
> Description
> -------
> 
> Pig Service Check and Hive Server 2 START ran on two different machines during the stack installation and failed to copy the Tez tarball to HDFS.
> 
> I was able to reproduce this locally by calling copyFromLocal from two clients simultaneously. See the HDFS audit log, the datanode logs on c6408 & c6410, and the namenode log on c6410.
> 
> The copyFromLocal command's behavior is:
> * Try to create a temporary file <filename>._COPYING_ and write the real data there
> * If any exception is hit, delete the file named <filename>._COPYING_
> 
> Thus we have the following race condition in this test:
> * Process P1 created the file "tez.tar.gz._COPYING_" and wrote data to it
> * Process P2 fired the same copyFromLocal command and hit an exception because it could not acquire the lease
> * P2 then deleted the file "tez.tar.gz._COPYING_"
> * P1 could not close the file "tez.tar.gz._COPYING_" since it had been deleted by P2.
>   The exception would say "could not find lease for file..."
> 
> In general, the "copyFromLocal" command does not provide the synchronization guarantee we need.
> 
> One solution is to make the destination file name unique. Because the mv command is synchronized by the namenode, at least one of the clients will succeed in renaming the file.
> 
> 
> Diffs
> -----
> 
>   ambari-common/src/main/python/resource_management/libraries/functions/dynamic_variable_interpretation.py 00b8d70 
> 
> Diff: https://reviews.apache.org/r/31878/diff/
> 
> 
> Testing
> -------
> 
> Unit tests on builds.apache.org passed:
> https://builds.apache.org/job/Ambari-trunk-test-patch/1977/
> 
> I also deployed a cluster and verified that it was able to copy the tarballs to HDFS when installing YARN, Hive, and Pig.
> 
> [root@c6408 ~]# su - hdfs -c 'hadoop fs -ls -R /hdp/apps/2.2.2.0-2538/'
> dr-xr-xr-x   - hdfs hdfs           0 2015-03-10 00:55 /hdp/apps/2.2.2.0-2538/hive
> -r--r--r--   3 hdfs hadoop  82982575 2015-03-10 00:55 /hdp/apps/2.2.2.0-2538/hive/hive.tar.gz
> dr-xr-xr-x   - hdfs hdfs           0 2015-03-10 00:57 /hdp/apps/2.2.2.0-2538/mapreduce
> -r--r--r--   3 hdfs hadoop    105000 2015-03-10 00:57 /hdp/apps/2.2.2.0-2538/mapreduce/hadoop-streaming.jar
> -r--r--r--   3 hdfs hadoop 192699956 2015-03-09 18:15 /hdp/apps/2.2.2.0-2538/mapreduce/mapreduce.tar.gz
> dr-xr-xr-x   - hdfs hdfs           0 2015-03-10 00:56 /hdp/apps/2.2.2.0-2538/pig
> -r--r--r--   3 hdfs hadoop  97542246 2015-03-10 00:56 /hdp/apps/2.2.2.0-2538/pig/pig.tar.gz
> dr-xr-xr-x   - hdfs hdfs           0 2015-03-09 18:15 /hdp/apps/2.2.2.0-2538/tez
> -r--r--r--   3 hdfs hadoop  40656789 2015-03-09 18:15 /hdp/apps/2.2.2.0-2538/tez/tez.tar.gz
> 
> 
> Thanks,
> 
> Alejandro Fernandez
> 
>
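
The unique-temporary-name fix discussed above can be sketched roughly as follows. This is a minimal illustration, not the actual Ambari patch: the helper name `copy_with_unique_temp` is hypothetical, and a local-filesystem `os.rename` stands in for the HDFS `mv`, whose rename the namenode serializes so that at least one concurrent client wins.

```python
# Sketch (not the actual Ambari patch): avoid the copyFromLocal race by
# writing to a per-process unique temporary name, then renaming into place.
# With unique temp names, a failing process can only ever delete its OWN
# temp file, never another process's in-flight copy.
import os
import shutil
import uuid

def copy_with_unique_temp(src, dest):
    """Copy src to dest via a unique temporary name so that concurrent
    copiers cannot delete each other's in-flight temporary files."""
    # Short unique suffix, as suggested in the review.
    unique_string = str(uuid.uuid4())[:8]
    tmp = "%s.%s._COPYING_" % (dest, unique_string)
    shutil.copyfile(src, tmp)  # each process writes its own temp file
    os.rename(tmp, dest)       # atomic rename into place; last one in wins
    return dest
```

If two processes run this concurrently, both renames succeed and the later one overwrites the earlier, which matches the "last one in wins" behavior accepted in the review.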
