Hi,

Let's say I have millions of files in a custom binary format, and an existing Java (or Python) library that reads and parses them.
The API looks roughly like:

    import foo
    f = foo.open(filename)
    header = f.get_header()

plus some other methods. My first thought was to write a Hadoop InputFormat to read and parse these files, but since I'm a newbie with Spark, I was wondering whether I can open the files directly and use the existing library to process them. I don't necessarily need to keep the files in HDFS; NFS would work too. Is there a way to use Spark without having to rewrite the parser? Thanks
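To make the idea concrete, here is a minimal sketch of the pattern being asked about: distribute the file *paths* across the cluster and call the existing parser inside a `map`, so the library itself is never rewritten. The `foo` library isn't available here, so a stand-in `parse_file()` reads a hypothetical 8-byte header (a 4-byte magic plus a record count) with `struct`; in real use you would call `foo.open()` / `get_header()` in the same spot. The Spark wiring is shown in comments because it needs a running cluster; everything else runs locally as-is.

```python
import os
import struct
import tempfile

def parse_file(path):
    # Stand-in for the existing library. In real use this body would be:
    #     f = foo.open(path)
    #     return f.get_header()
    # Here we read a hypothetical 8-byte header: 4-byte magic + uint32 count.
    with open(path, "rb") as fh:
        magic, count = struct.unpack("<4sI", fh.read(8))
    return {"path": path, "magic": magic, "count": count}

# Create a few sample files on shared storage (NFS or local for the demo).
tmpdir = tempfile.mkdtemp()
paths = []
for i in range(3):
    p = os.path.join(tmpdir, f"sample{i}.bin")
    with open(p, "wb") as fh:
        fh.write(struct.pack("<4sI", b"FOO!", i * 10))
    paths.append(p)

# With Spark, ship only the path strings and parse on the executors
# (each executor must see the files, e.g. via an NFS mount, and have
# the parser library installed):
#
#     headers = sc.parallelize(paths).map(parse_file).collect()
#
# Locally, the same function applies unchanged:
headers = list(map(parse_file, paths))
print([h["count"] for h in headers])  # → [0, 10, 20]
```

The key point of the sketch is that only paths and the returned headers cross the network; the binary files are opened where the task runs, so any storage every worker can reach (NFS included) is enough.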