Hi
I'm currently using Spark to process a file containing a million
rows (EDGAR quarterly filings files).
Each row contains some information plus the location of a remote file, which I need
to retrieve using FTP and then process its contents.
I want to do all three operations (process the filings file, fetch the remote files, and
process them) in one go.
I want to avoid doing the first step (processing the million-row file) in
Spark and the rest (fetching via FTP and processing the files) offline.
Does Spark have anything that can help with the FTP fetch?
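
For concreteness, the "one go" pipeline could be shaped roughly like the Python sketch below. It is only an illustration under several assumptions that are not stated above: each row is pipe-delimited with the remote path in the last field, the FTP server accepts anonymous login, and "processing" is reduced to measuring the file size. The host name and the helper names (`connect_ftp`, `remote_path`, `process_partition`) are hypothetical.

```python
from ftplib import FTP
from io import BytesIO

FTP_HOST = "ftp.example.com"  # hypothetical host, replace with the real server


def connect_ftp():
    """Open one anonymous FTP connection (assumes the server allows it)."""
    conn = FTP(FTP_HOST)
    conn.login()
    return conn


def remote_path(row):
    """Assumption: the remote file location is the last pipe-delimited field."""
    return row.strip().split("|")[-1]


def process_partition(rows, connect=connect_ftp):
    """Fetch and process every remote file named in one partition of rows.

    The connection is opened lazily and reused for the whole partition,
    so an empty partition never touches the network.
    """
    conn = None
    try:
        for row in rows:
            if conn is None:
                conn = connect()
            path = remote_path(row)
            buf = BytesIO()
            conn.retrbinary("RETR " + path, buf.write)
            # Placeholder for the real processing step: just report the size.
            yield path, len(buf.getvalue())
    finally:
        if conn is not None:
            conn.quit()
```

With PySpark, a function of this shape could be passed to `mapPartitions` on an RDD of lines, for example `sc.textFile("filings.idx").mapPartitions(process_partition)`, where the input path is again hypothetical. Opening one connection per partition rather than per row keeps the number of FTP logins proportional to the number of partitions.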

Thanks in advance and regards,
Marco
