> On March 10, 2015, 6:42 p.m., Andrew Onischuk wrote:
> > Alejandro, this will make our deploys much longer, since those fs commands 
> > take a lot of time on a loaded cluster.
> > 
> > Can we instead set x retries with a delay of y for CopyFromLocal? That way 
> > we won't increase the time of a successful deployment.

Meaning, this will roughly double the time of every CopyFromLocal call, and we 
make a lot of those calls, which already seem to take a long time.
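
To make the retry suggestion concrete, here is a minimal sketch in Python. The 
helper name run_with_retries and its parameters (tries, delay_seconds, sleep) 
are hypothetical and are not part of Ambari's resource_management library. The 
first attempt runs immediately, so a copy that succeeds pays no extra time, and 
the delay is only spent after a failure.

```python
import time


def run_with_retries(action, tries=3, delay_seconds=5.0, sleep=time.sleep):
    """Call action() until it succeeds or the attempts are exhausted.

    Hypothetical helper: 'tries' is the x and 'delay_seconds' is the y from
    the suggestion above. 'sleep' is injectable so the delay can be tested.
    """
    if tries < 1:
        raise ValueError("tries must be >= 1")
    for attempt in range(1, tries + 1):
        try:
            # Success on any attempt returns right away, with no added delay.
            return action()
        except Exception:
            if attempt == tries:
                raise  # out of attempts: surface the last failure
            sleep(delay_seconds)
```

Here action would wrap whatever call performs the CopyFromLocal. Which 
exceptions are worth retrying is left open in this sketch.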


- Andrew


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31878/#review75927
-----------------------------------------------------------


On March 10, 2015, 1:41 a.m., Alejandro Fernandez wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31878/
> -----------------------------------------------------------
> 
> (Updated March 10, 2015, 1:41 a.m.)
> 
> 
> Review request for Ambari, Andrew Onischuk, Jonathan Hurley, Nate Cole, and 
> Sid Wagle.
> 
> 
> Bugs: AMBARI-9990
>     https://issues.apache.org/jira/browse/AMBARI-9990
> 
> 
> Repository: ambari
> 
> 
> Description
> -------
> 
> Pig Service Check and Hive Server 2 START ran on 2 different machines during 
> the stack installation and failed to copy the tez tarball to HDFS.
> 
> I was able to reproduce this locally by calling CopyFromLocal from two 
> clients simultaneously. See the HDFS audit log, datanode logs on c6408 & 
> c6410, and namenode log on c6410.
> 
> The copyFromLocal command's behavior is:
> * Try to create a temporary file <filename>._COPYING_ and write the real data 
> there
> * If it hits any exception, delete the file named <filename>._COPYING_
> 
> Thus we have the following race condition in this test:
> * Process P1 created the file "tez.tar.gz._COPYING_" and wrote data to it.
> * Process P2 fired the same copyFromLocal command and hit an exception 
> because it could not get the lease.
> * P2 then deleted the file "tez.tar.gz._COPYING_".
> * P1 could not close the file "tez.tar.gz._COPYING_" since it had been 
> deleted by P2. The exception would say "could not find lease for file...".
> 
> In general, the "copyFromLocal" command does not provide the correct 
> synchronization guarantee.
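
The interleaving above can be replayed deterministically on a local directory. 
This is only an illustrative sketch, not HDFS code. The function name 
replay_copying_race is hypothetical, P2's lease failure is simulated rather 
than triggered, and P1's failure shows up at the rename step here, whereas on 
HDFS it occurs when closing the file.

```python
import os
import tempfile


def replay_copying_race(directory, name="tez.tar.gz"):
    """Replay the P1/P2 interleaving locally and return the event log."""
    events = []
    tmp = os.path.join(directory, name + "._COPYING_")
    final = os.path.join(directory, name)

    # P1 creates the shared temporary file and writes data to it.
    with open(tmp, "wb") as f:
        f.write(b"tarball bytes")
    events.append("P1 wrote temp")

    # P2 fails (on HDFS: it cannot get the lease) and runs its cleanup,
    # which deletes the temporary file that P1 still depends on.
    events.append("P2 failed")
    os.remove(tmp)
    events.append("P2 deleted temp")

    # P1 tries to finish by moving the temp file to its final name.
    try:
        os.rename(tmp, final)
        events.append("P1 finished")
    except OSError:
        events.append("P1 failed: temp file is gone")
    return events


if __name__ == "__main__":
    for event in replay_copying_race(tempfile.mkdtemp()):
        print(event)
```

Both processes end up failing, and the final file is never created.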
> 
> One solution is for each client to copy to a unique destination file name 
> and then mv that file to the final name. Because the mv command is 
> synchronized by the namenode, at least one of them will succeed in naming 
> the file.
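
A minimal sketch of that idea follows; it is an assumption about the shape of 
the approach, not the code in the diff. The function name unique_copy_commands 
and the uuid-based suffix are hypothetical. It only builds the argument lists 
for the "hadoop fs -copyFromLocal" and "hadoop fs -mv" commands and leaves 
running them to the caller.

```python
import uuid


def unique_copy_commands(local_path, hdfs_dest, suffix=None):
    """Return (copy_cmd, move_cmd, temp_path) for a copy via a unique name.

    Each caller uploads to its own temporary path, so no process can delete
    another's in-flight file. Only the final mv targets the shared name.
    """
    # Hypothetical naming scheme: append a random hex suffix per caller.
    suffix = suffix or uuid.uuid4().hex
    temp_path = "%s.%s" % (hdfs_dest, suffix)
    copy_cmd = ["hadoop", "fs", "-copyFromLocal", local_path, temp_path]
    move_cmd = ["hadoop", "fs", "-mv", temp_path, hdfs_dest]
    return copy_cmd, move_cmd, temp_path
```

A caller whose mv does not succeed because another client already placed the 
file would still need to remove its own temporary file. That cleanup is not 
shown here.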
> 
> 
> Diffs
> -----
> 
>   ambari-common/src/main/python/resource_management/libraries/functions/dynamic_variable_interpretation.py 00b8d70 
> 
> Diff: https://reviews.apache.org/r/31878/diff/
> 
> 
> Testing
> -------
> 
> Unit tests on builds.apache.org passed:
> https://builds.apache.org/job/Ambari-trunk-test-patch/1977/
> 
> I also deployed a cluster and verified that it was able to copy the tarballs 
> to HDFS when installing YARN, Hive, and Pig.
> 
> [root@c6408 ~]# su - hdfs -c 'hadoop fs -ls -R /hdp/apps/2.2.2.0-2538/'
> dr-xr-xr-x   - hdfs hdfs            0 2015-03-10 00:55 /hdp/apps/2.2.2.0-2538/hive
> -r--r--r--   3 hdfs hadoop   82982575 2015-03-10 00:55 /hdp/apps/2.2.2.0-2538/hive/hive.tar.gz
> dr-xr-xr-x   - hdfs hdfs            0 2015-03-10 00:57 /hdp/apps/2.2.2.0-2538/mapreduce
> -r--r--r--   3 hdfs hadoop     105000 2015-03-10 00:57 /hdp/apps/2.2.2.0-2538/mapreduce/hadoop-streaming.jar
> -r--r--r--   3 hdfs hadoop  192699956 2015-03-09 18:15 /hdp/apps/2.2.2.0-2538/mapreduce/mapreduce.tar.gz
> dr-xr-xr-x   - hdfs hdfs            0 2015-03-10 00:56 /hdp/apps/2.2.2.0-2538/pig
> -r--r--r--   3 hdfs hadoop   97542246 2015-03-10 00:56 /hdp/apps/2.2.2.0-2538/pig/pig.tar.gz
> dr-xr-xr-x   - hdfs hdfs            0 2015-03-09 18:15 /hdp/apps/2.2.2.0-2538/tez
> -r--r--r--   3 hdfs hadoop   40656789 2015-03-09 18:15 /hdp/apps/2.2.2.0-2538/tez/tez.tar.gz
> 
> 
> Thanks,
> 
> Alejandro Fernandez
> 
>
