Sumedh,

How would this work? The only server we have is the Oozie server, which has no 
resources to run anything except Oozie, and we have no sudo permissions. If we 
run the mount command using the shell action, which YARN can schedule on any 
node of the cluster, then the Spark job will not be able to see the mount 
because it could end up on any arbitrary node. If we instead run the mount 
command using shell commands from within Spark, is it possible to guarantee 
that the mount will exist on the same node as the executor reading the file?

Thanks,
Ben 

> On Mar 3, 2016, at 10:29 AM, Sumedh Wale <sw...@snappydata.io> wrote:
> 
> (-user)
> 
> On Thursday 03 March 2016 10:09 PM, Benjamin Kim wrote:
>> I forgot to mention that we will be scheduling this job using Oozie, so we 
>> will not be able to know which worker node is going to be running it. If we 
>> try to do anything local, it would get lost. This is why I’m looking for 
>> something that does not deal with the local file system.
> 
> Can't you mount using sshfs locally as part of the job at the start, then 
> unmount at the end? This is assuming that the platform being used is Linux.
> 
>>> On Mar 2, 2016, at 11:17 AM, Benjamin Kim <bbuil...@gmail.com> wrote:
>>> 
>>> I wonder if anyone has opened an SFTP connection to read a remote gzipped 
>>> CSV file? I am able to download the file locally first using the SFTP client 
>>> in the spark-sftp package. Then, I load the file into a DataFrame using the 
>>> spark-csv package, which automatically decompresses it. I just want to 
>>> remove the "download to local" step and have the remote file decompressed, 
>>> read, and loaded directly. Can someone give me any hints?
>>> 
>>> Thanks,
>>> Ben
> 
> thanks
> 
> -- 
> Sumedh Wale
> SnappyData (http://www.snappydata.io)
> 
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org