Say there are some logs:
s3://log-collections/sys1/20141212/nginx.gz
s3://log-collections/sys1/20141213/nginx-part-1.gz
s3://log-collections/sys1/20141213/nginx-part-2.gz
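The paths above are laid out by day, and a single day may be split into several part files. The sketch below is a minimal, stdlib-only illustration under that assumption (`s3://<bucket>/<system>/<YYYYMMDD>/<file>.gz`). It groups the files by their date directory and then hands each group to its own thread. `group_by_day`, `parse_day` and `parse_all` are hypothetical names. `parse_day` is only a stub standing in for the real parsing function. In an actual Spark driver it would trigger one job, for example an action on an RDD, and Spark can run jobs submitted from separate threads of the same application concurrently.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlparse

# The example S3 paths from above. The layout is assumed to be
# s3://<bucket>/<system>/<YYYYMMDD>/<file>.gz
PATHS = [
    "s3://log-collections/sys1/20141212/nginx.gz",
    "s3://log-collections/sys1/20141213/nginx-part-1.gz",
    "s3://log-collections/sys1/20141213/nginx-part-2.gz",
]


def group_by_day(paths):
    """Map each YYYYMMDD directory to the list of files under it."""
    groups = defaultdict(list)
    for path in paths:
        # urlparse(...).path looks like "/sys1/20141213/nginx-part-1.gz"
        _system, day, _filename = urlparse(path).path.lstrip("/").split("/", 2)
        groups[day].append(path)
    return dict(groups)


def parse_day(day, files):
    # Hypothetical stand-in for the real parser. In Spark this would
    # build an RDD over `files` and call an action on it (one job per day).
    return day, len(files)


def parse_all(paths, max_workers=4):
    groups = group_by_day(paths)
    # Each group is submitted from its own thread, so independent jobs
    # can run at the same time instead of strictly one after another.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(parse_day, day, files)
                   for day, files in sorted(groups.items())]
        return dict(f.result() for f in futures)


if __name__ == "__main__":
    print(parse_all(PATHS))  # {'20141212': 1, '20141213': 2}
```

Grouping by day keeps the part files of one day together. The thread pool is only one way to get driver-side concurrency. With Spark, a scheduler setting such as the FAIR scheduler affects how those concurrent jobs share the cluster.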
I have a function that parses the logs for later analysis.
I want to parse all the files, so I do this:
logs =
Maybe you are saying you already do this, but it's perfectly possible
to process as many RDDs as you like in parallel from the driver, for
example by triggering each RDD's action from its own thread. That
may allow your current approach to use as much of the cluster's
parallelism as you like. I'm not sure if that's what you are describing
with "submit multi applications", but you