For security reasons I am required to use a different S3 library (one provided to me) to access S3 data. If I write an adapter against the NativeFileSystemStore interface that accesses S3 through my own library, do I still get the same benefits I would get from the default implementation, i.e. the jets3t native file system store?

My motivation is to exploit Hadoop's ability to compute file splits, so that I can parallelize work across different mappers for a single S3 file. I believe this is quite different from the norm: splits are generally used on HDFS, which supports larger files (here the max is 5GB), and most approaches I've heard of require uploading the data from S3 to HDFS before processing. I am currently reading and writing straight to S3, similar to EMR.

What I've just described may be completely infeasible - I have looked through parts of the Hadoop library but haven't completely grasped how file splits would interact with an S3 input stream. There are two questions here that may be totally unrelated, but thanks for reading.
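To make the first question concrete, here is a minimal sketch of the adapter I have in mind. MyS3Client, MyObjectInfo and MyListing are placeholders for whatever the provided library actually exposes, not a real API, and the interface is package-private in the Hadoop versions I've read, so the class would have to live in org.apache.hadoop.fs.s3native:

    // Sketch only: MyS3Client, MyObjectInfo and MyListing are hypothetical
    // stand-ins for the provided library, not a real API.
    package org.apache.hadoop.fs.s3native;

    import java.io.File;
    import java.io.IOException;
    import java.io.InputStream;
    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;

    public class MyNativeFileSystemStore implements NativeFileSystemStore {

      private MyS3Client client; // hypothetical client from the provided library
      private String bucket;

      public void initialize(URI uri, Configuration conf) throws IOException {
        this.bucket = uri.getHost();
        this.client = new MyS3Client(conf); // hypothetical
      }

      public void storeFile(String key, File file, byte[] md5Hash)
          throws IOException {
        client.putObject(bucket, key, file, md5Hash); // hypothetical
      }

      public void storeEmptyFile(String key) throws IOException {
        client.putEmptyObject(bucket, key); // hypothetical
      }

      public FileMetadata retrieveMetadata(String key) throws IOException {
        MyObjectInfo info = client.headObject(bucket, key); // hypothetical
        if (info == null) {
          return null; // contract: null signals "no such key"
        }
        return new FileMetadata(key, info.getLength(), info.getLastModified());
      }

      public InputStream retrieve(String key) throws IOException {
        return client.getObject(bucket, key); // hypothetical
      }

      // The range-read variant is the one that matters for splits: seek() on
      // the s3n input stream re-opens the object at an offset, which should
      // become an HTTP Range GET in the underlying library.
      public InputStream retrieve(String key, long byteRangeStart)
          throws IOException {
        return client.getObject(bucket, key, byteRangeStart); // hypothetical
      }

      public PartialListing list(String prefix, int maxListingLength)
          throws IOException {
        return list(prefix, maxListingLength, null, false);
      }

      public PartialListing list(String prefix, int maxListingLength,
          String priorLastKey, boolean recursive) throws IOException {
        // Non-recursive listings use "/" as a delimiter to get
        // directory-style results, mirroring the jets3t store.
        MyListing l = client.listObjects(bucket, prefix, maxListingLength,
            priorLastKey, recursive ? null : "/"); // hypothetical
        FileMetadata[] files = new FileMetadata[l.getObjects().size()];
        for (int i = 0; i < files.length; i++) {
          MyObjectInfo o = l.getObjects().get(i);
          files[i] = new FileMetadata(o.getKey(), o.getLength(),
              o.getLastModified());
        }
        return new PartialListing(l.getPriorLastKey(), files,
            l.getCommonPrefixes().toArray(new String[0]));
      }

      public void delete(String key) throws IOException {
        client.deleteObject(bucket, key); // hypothetical
      }

      public void copy(String srcKey, String dstKey) throws IOException {
        client.copyObject(bucket, srcKey, bucket, dstKey); // hypothetical
      }

      public void purge(String prefix) throws IOException {
        for (FileMetadata f : list(prefix, Integer.MAX_VALUE).getFiles()) {
          delete(f.getKey());
        }
      }

      public void dump() throws IOException {
        // debugging hook; no-op in this sketch
      }
    }

If I read NativeS3FileSystem correctly, there is no config knob for the store (createDefaultStore() hardcodes the jets3t implementation), so I would pass my store to the NativeS3FileSystem(NativeFileSystemStore) constructor from a small subclass and register that subclass under its own scheme via fs.<scheme>.impl - please correct me if there is a cleaner hook.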
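On the second question, my current understanding is that FileInputFormat computes splits purely from FileStatus - file length and block size - and that for s3n the "block size" is just the notional fs.s3n.block.size value (64MB by default), since S3 has no real blocks. Each mapper then seeks to its split's start offset, which the s3n stream implements by re-opening the object at that offset. Here is a small driver I used to sanity-check that reasoning (the path and sizes are made up, and the min/max split defaults are my reading of the new-API FileInputFormat):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class SplitCheck {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // For s3n the block size is purely notional; FileInputFormat uses it
        // as the default split size. 32MB here should give ~32 splits for a
        // 1GB object.
        conf.setLong("fs.s3n.block.size", 32 * 1024 * 1024);

        Path p = new Path("s3n://my-bucket/big-input.log"); // hypothetical
        FileSystem fs = p.getFileSystem(conf);
        FileStatus st = fs.getFileStatus(p);

        long blockSize = st.getBlockSize();
        long minSize = 1L;              // min split size default, as I read it
        long maxSize = Long.MAX_VALUE;  // max split size default, as I read it
        long splitSize = Math.max(minSize, Math.min(maxSize, blockSize));
        System.out.println("file length = " + st.getLen()
            + ", split size = " + splitSize
            + ", splits ~ " + ((st.getLen() + splitSize - 1) / splitSize));
      }
    }

If that holds, splits should work against S3 as long as the input format itself is splittable (e.g. plain, uncompressed text) and my store's ranged retrieve() behaves.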
Clarence