Sure. You can try PySpark, which is the Python API of Spark.
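A rough sketch of what that could look like, assuming the list of tar paths has been written to an HDFS file (here /tmp/tar-list, an illustrative name) and that the hdfs and tar commands are available on the executors:

import subprocess
from pyspark import SparkContext

sc = SparkContext(appName="untar-tars")

def untar(path):
    # pull the tar to the executor's local disk, unpack it, push the result back
    name = path.rstrip("/").split("/")[-1]                       # e.g. tar1.tar
    out_dir = name[:-len(".tar")] if name.endswith(".tar") else name + ".out"
    subprocess.check_call(["hdfs", "dfs", "-get", path, name])
    subprocess.check_call(["mkdir", "-p", out_dir])
    subprocess.check_call(["tar", "-xf", name, "-C", out_dir])
    subprocess.check_call(["hdfs", "dfs", "-put", out_dir, "/data/untarred/" + out_dir])
    return path

sc.textFile("/tmp/tar-list").map(untar).collect()

The output location /data/untarred is just a placeholder; the point is only that each record of the RDD is one tar path and the shell does the actual extraction.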
> On May 20, 2016, at 06:20, ayan guha wrote:
>
> Hi
>
> Thanks for the input. Would it be possible to write it in Python? I think I can
> use FileUtil.unTar from the hdfs jar. But can I do it from Python?
>
> On 19
See http://memect.co/call-java-from-python-so
You can also use Py4J
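For the FileUtil.unTar idea specifically, PySpark already ships with a Py4J gateway into the driver JVM, so the Hadoop classes can be reached without extra plumbing. A minimal sketch, assuming the tar is first copied to the driver's local disk (note that sc._jvm is an internal handle, FileUtil.unTar works on local files rather than HDFS paths, and the file names below are illustrative):

import subprocess
from pyspark import SparkContext

sc = SparkContext(appName="untar-via-py4j")

# FileUtil.unTar expects local files, so copy the tar out of HDFS first
subprocess.check_call(["hdfs", "dfs", "-get", "/data/tars/tar1.tar", "/tmp/tar1.tar"])

jvm = sc._jvm                                   # Py4J gateway to the driver JVM
in_file = jvm.java.io.File("/tmp/tar1.tar")
out_dir = jvm.java.io.File("/tmp/tar1")
jvm.org.apache.hadoop.fs.FileUtil.unTar(in_file, out_dir)

This only runs on the driver; to untar many files in parallel you would still distribute the paths as an RDD and shell out on the executors.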
On Thu, May 19, 2016 at 3:20 PM, ayan guha wrote:
> Hi
>
> Thanks for the input. Would it be possible to write it in Python? I think I
> can use FileUtil.unTar from the hdfs jar. But can I do it from Python?
> On
Hi
Thanks for the input. Would it be possible to write it in Python? I think I
can use FileUtil.unTar from the hdfs jar. But can I do it from Python?
On 19 May 2016 16:57, "Sun Rui" wrote:
> 1. create a temp dir on HDFS, say “/tmp”
> 2. write a script to create in the temp dir one
1. Create a temp dir on HDFS, say “/tmp”.
2. Write a script to create, in the temp dir, one file for each tar file. Each
file has only one line: the HDFS path of that tar file (see the sketch after step 3).
3. Write a Spark application. It is something like:

import scala.sys.process._
val rdd = sc.textFile("/tmp/tar-list")   // the per-tar files created in step 2
rdd.map { line =>
  // construct an untar command using the path information in the line, then run it
  Seq("bash", "-c", s"hdfs dfs -get $line . && tar -xf ${line.split('/').last}").!
}.count()
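For step 2, a small driver-side script is enough. A sketch, assuming the tars sit under /data/tars and the temp dir is /tmp/tar-list (both names illustrative):

import subprocess

# list the tar files and write one single-line file per tar into the temp dir,
# so that sc.textFile("/tmp/tar-list") later yields one record per tar
paths = subprocess.check_output(["hdfs", "dfs", "-ls", "-C", "/data/tars"]).decode()
for i, path in enumerate(p for p in paths.splitlines() if p.endswith(".tar")):
    subprocess.run(["hdfs", "dfs", "-put", "-f", "-", "/tmp/tar-list/part-%05d" % i],
                   input=(path + "\n").encode(), check=True)

Writing one tiny file per tar (rather than one file with many lines) guarantees that Spark gives each tar its own task, so the untar commands run in parallel.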
Hi
I have a few tar files in HDFS, all in a single folder. Each tar has multiple
files in it.
tar1:
- f1.txt
- f2.txt
tar2:
- f1.txt
- f2.txt
(each tar file will have exactly the same number of files, with the same names)
I am trying to find a way (Spark or Pig) to extract them into their own
folders.