Hi, I'm currently using Spark to process a file containing a million rows (EDGAR quarterly filings files). Each row contains some information plus the location of a remote file. I need to retrieve that file using FTP and then process its content. I want to do all three operations (process the filings file, fetch the remote files, and process them) in one go. I want to avoid doing the first step (processing the million-row file) in Spark and the rest (fetching over FTP and processing the files) offline. Does Spark have anything that can help with the FTP fetch?
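
To make the question concrete, here is a rough sketch in Python of the kind of thing I have in mind, in case Spark has nothing built in. It is only an illustration under assumptions: I am assuming each row is pipe-delimited with the remote path as the last field, `FTP_HOST` is a placeholder, and `parse_row` and `fetch_partition` are names I made up. The idea is to open one FTP connection per partition (via `mapPartitions`) rather than one per row.

```python
from ftplib import FTP
from io import BytesIO

FTP_HOST = "ftp.example.com"  # placeholder host (assumption, replace with the real server)


def parse_row(line):
    """Split one pipe-delimited row into (info_fields, remote_path).

    Assumes the remote file location is the last field. Returns None for
    lines that have no delimiter or an empty path.
    """
    parts = [p.strip() for p in line.split("|")]
    if len(parts) < 2 or not parts[-1]:
        return None
    return parts[:-1], parts[-1]


def fetch_partition(lines, host=FTP_HOST):
    """Yield (info_fields, file_bytes) for every valid row in one partition.

    The FTP connection is opened lazily on the first valid row and reused
    for the rest of the partition, then closed when the partition is done.
    """
    ftp = None
    try:
        for line in lines:
            row = parse_row(line)
            if row is None:
                continue
            if ftp is None:
                ftp = FTP(host)
                ftp.login()  # anonymous login (assumption)
            info, path = row
            buf = BytesIO()
            ftp.retrbinary("RETR " + path, buf.write)
            yield info, buf.getvalue()
    finally:
        if ftp is not None:
            ftp.close()


# Intended use from PySpark (not executed here; `sc` is an existing
# SparkContext and `process` is whatever parses a downloaded filing):
#   results = (sc.textFile("path/to/index/file")
#                .mapPartitions(fetch_partition)
#                .map(lambda pair: process(pair[0], pair[1])))
```

This keeps all three steps inside a single Spark job, but it does no retrying or rate limiting, and any header lines in the index file would need to be filtered out first.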
Thanks in advance and regards,
Marco