For security reasons I am required to conform to, and use, a different S3 library that I have been provided for accessing S3 data. If I write an adapter against the native file system store class so that it talks to S3 through my own library, do I still get the same benefits that I would get from the default file system store, i.e. the jets3t-based native file system store?
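
Concretely, here is the shape of the adapter I have in mind. This is only a sketch: InHouseS3Client and its methods are placeholders for the library I am required to use, and the NativeFileSystemStore method signatures are from the Hadoop source I have been reading, which may differ across versions (in the version I looked at the interface appears to be package-private, so the adapter would have to live in the org.apache.hadoop.fs.s3native package):

    package org.apache.hadoop.fs.s3native;

    import java.io.File;
    import java.io.IOException;
    import java.io.InputStream;
    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;

    /** Placeholder for the security-mandated S3 library (hypothetical API). */
    interface InHouseS3Client {
      InputStream get(String bucket, String key) throws IOException;
      InputStream getRange(String bucket, String key, long start) throws IOException;
      void put(String bucket, String key, File file) throws IOException;
    }

    /** Adapter; a real one would declare "implements NativeFileSystemStore". */
    public class InHouseNativeFileSystemStore {
      private InHouseS3Client client; // the mandated library, wired in below
      private String bucket;

      public void initialize(URI uri, Configuration conf) throws IOException {
        this.bucket = uri.getHost();
        this.client = createClient(conf); // however the library is constructed
      }

      // Whole-object read, used when a stream is opened at offset 0.
      public InputStream retrieve(String key) throws IOException {
        return client.get(bucket, key);
      }

      // Ranged read: as far as I can tell this is what seek() bottoms out in,
      // and what would make mid-file splits possible (an HTTP Range GET).
      public InputStream retrieve(String key, long byteRangeStart)
          throws IOException {
        return client.getRange(bucket, key, byteRangeStart);
      }

      public void storeFile(String key, File file, byte[] md5Hash)
          throws IOException {
        client.put(bucket, key, file);
      }

      // retrieveMetadata, list, delete, copy, purge, etc. would delegate
      // to the library in the same way.

      private InHouseS3Client createClient(Configuration conf) {
        throw new UnsupportedOperationException("wire in the mandated library");
      }
    }

If the adapter reproduces those semantics, especially the ranged retrieve, I would expect NativeS3FileSystem to behave the same as it does with the jets3t store, but I would like confirmation.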
My motivation here is to exploit Hadoop's capability to compute and generate file splits, so that I can parallelize the work across different mappers for a single S3 file. I believe this is quite different from the norm: splits are generally used against HDFS, which supports much larger files (an S3 object here maxes out at 5 GB), and most approaches I have heard of require uploading the data from S3 to HDFS prior to processing. I am currently reading and writing straight to S3, similar to EMR.

What I have just pointed out may be completely infeasible; I have looked through parts of the Hadoop library but have not completely grasped how a file split would interact with an S3 input stream. There are two questions here that may be totally unrelated, but thanks for reading.
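
For what it's worth, here is my current reading of the split arithmetic, mimicking (not quoting) FileInputFormat. Since S3 has no real blocks, the native S3 file system seems to report a configured block size (fs.s3n.block.size in the source I have, defaulting to 64 MB), which is what the split size falls back to:

    import java.util.ArrayList;
    import java.util.List;

    public class SplitMath {
      // {start, length} pairs, mimicking FileInputFormat's split loop.
      static List<long[]> splits(long fileLen, long splitSize) {
        List<long[]> out = new ArrayList<long[]>();
        double SPLIT_SLOP = 1.1; // Hadoop tolerates a ~10% oversized tail split
        long remaining = fileLen;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
          out.add(new long[] { fileLen - remaining, splitSize });
          remaining -= splitSize;
        }
        if (remaining != 0) {
          out.add(new long[] { fileLen - remaining, remaining });
        }
        return out;
      }

      public static void main(String[] args) {
        // A 5 GB S3 object with a 64 MB "block size" -> 80 splits, so 80
        // mappers could each open the object and seek to their own offset.
        for (long[] s : splits(5L << 30, 64L << 20)) {
          System.out.println("start=" + s[0] + " length=" + s[1]);
        }
      }
    }

If I understand the record-reader side correctly, each mapper then opens the file and seek()s to its split's start, which for the native S3 stream appears to close and reopen the object at that offset via retrieve(key, byteRangeStart), i.e. a ranged GET, and something like LineRecordReader skips ahead to the first record boundary past the offset. If that reading is right, splitting a single S3 file without staging it in HDFS should work; please correct me if I have misread the code.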

Clarence
