From a quick look through the README and code for spark-sftp, it seems that this connector works by downloading the file locally on the driver program, and that behavior is not configurable, so you would probably need to find a different connector (and you probably shouldn't use spark-sftp for large files anyway). It also seems that it might not work in a cluster environment, which the project's README warns about as well. You might have better luck using FUSE + sftp, although you will still want your remote gzip CSV file split into multiple files, since gzip isn't a splittable compression format.
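For what it's worth, the non-splittable part is easy to demonstrate outside of Spark. This is just an illustrative sketch in plain Python (the data is made up), showing why a worker can't start decompressing a gzip file from the middle:

```python
import gzip
import zlib

# Compress some multi-line "CSV" data the way a .csv.gz file would be stored.
rows = "\n".join("col1,col2,%d" % i for i in range(10000)).encode()
blob = gzip.compress(rows)

# Reading from the start of the stream works fine.
assert gzip.decompress(blob) == rows

# But a worker handed only the second half of the file cannot decompress
# its split: gzip has no sync markers, so decompression must start at
# byte 0. This is why Spark reads a whole .csv.gz in a single task.
half = blob[len(blob) // 2:]
try:
    zlib.decompressobj(wbits=zlib.MAX_WBITS | 16).decompress(half)
    splittable = True
except zlib.error:
    splittable = False
print(splittable)  # False: you can't start mid-stream
```

Splitting the source into many smaller .gz files (or using a splittable codec like bzip2) lets Spark assign one file per task and get some parallelism back.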
On Wed, Mar 2, 2016 at 11:17 AM, Benjamin Kim <bbuil...@gmail.com> wrote:
> I wonder if anyone has opened a SFTP connection to open a remote GZIP CSV
> file? I am able to download the file first locally using the SFTP Client in
> the spark-sftp package. Then, I load the file into a dataframe using the
> spark-csv package, which automatically decompresses the file. I just want
> to remove the "downloading file to local" step and directly have the remote
> file decompressed, read, and loaded. Can someone give me any hints?
>
> Thanks,
> Ben
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org

--
Cell : 425-233-8271
Twitter: https://twitter.com/holdenkarau