See http://memect.co/call-java-from-python-so
You can also use Py4J.

On Thu, May 19, 2016 at 3:20 PM, ayan guha <guha.a...@gmail.com> wrote:
> Hi
>
> Thanks for the input. Would it be possible to write it in Python? I think I
> can use FileUtil.untar from the Hadoop jar. But can I do it from Python?
>
> On 19 May 2016 16:57, "Sun Rui" <sunrise_...@163.com> wrote:
>
>> 1. Create a temp dir on HDFS, say "/tmp".
>> 2. Write a script that creates, in the temp dir, one file for each tar
>>    file. Each file has only one line:
>>    <absolute path of the tar file>
>> 3. Write a Spark application along these lines:
>>
>>    val rdd = sc.textFile(<HDFS path of the temp dir>)
>>    rdd.foreach { line =>
>>      // construct an untar command using the path information in "line"
>>      // and launch the command (foreach, not map, so it actually runs)
>>    }
>>
>> > On May 19, 2016, at 14:42, ayan guha <guha.a...@gmail.com> wrote:
>> >
>> > Hi
>> >
>> > I have a few tar files in HDFS in a single folder. Each tar contains
>> > multiple files:
>> >
>> > tar1:
>> >  - f1.txt
>> >  - f2.txt
>> > tar2:
>> >  - f1.txt
>> >  - f2.txt
>> >
>> > (each tar file will have the exact same number of files, with the same
>> > names)
>> >
>> > I am trying to find a way (Spark or Pig) to extract them to their own
>> > folders:
>> >
>> > f1:
>> >  - tar1_f1.txt
>> >  - tar2_f1.txt
>> > f2:
>> >  - tar1_f2.txt
>> >  - tar2_f2.txt
>> >
>> > Any help?
>> >
>> > --
>> > Best Regards,
>> > Ayan Guha
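To illustrate the Python side of the question: the regrouping itself (member f1.txt from every tar goes into folder f1/, renamed <tarname>_f1.txt) needs nothing beyond the standard-library tarfile module. A minimal sketch, assuming the tars have already been pulled to a local directory (e.g. via `hdfs dfs -get`) rather than read from HDFS directly; the function name and directory layout here are hypothetical, not from the thread:

```python
import os
import tarfile

def explode_tars(tar_dir, out_dir):
    """For each *.tar in tar_dir, extract every regular-file member into a
    folder named after the member (minus extension), renaming the file to
    <tarname>_<member> so same-named members from different tars coexist."""
    for tar_name in sorted(os.listdir(tar_dir)):
        if not tar_name.endswith(".tar"):
            continue
        stem = tar_name[:-len(".tar")]  # e.g. "tar1"
        with tarfile.open(os.path.join(tar_dir, tar_name)) as tf:
            for member in tf.getmembers():
                if not member.isfile():
                    continue
                base = os.path.basename(member.name)  # e.g. "f1.txt"
                # target folder: "f1" for member "f1.txt"
                folder = os.path.join(out_dir, os.path.splitext(base)[0])
                os.makedirs(folder, exist_ok=True)
                src = tf.extractfile(member)
                with open(os.path.join(folder, stem + "_" + base), "wb") as dst:
                    dst.write(src.read())
```

Running this over a directory containing tar1.tar and tar2.tar (each with f1.txt and f2.txt) produces f1/tar1_f1.txt, f1/tar2_f1.txt, f2/tar1_f2.txt, f2/tar2_f2.txt, matching the layout asked for above. To keep it distributed, the same function body could be the work done inside the `rdd.foreach` step, or FileUtil.untar could be invoked from PySpark through the JVM gateway (Py4J), as suggested.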