Any way to make an RDD preserve the input directory structure?

2015-01-15 Thread 逸君曹
Say there are some logs:

s3://log-collections/sys1/20141212/nginx.gz
s3://log-collections/sys1/20141213/nginx-part-1.gz
s3://log-collections/sys1/20141213/nginx-part-2.gz

I have a function that parses the logs for later analysis. I want to parse all the files. So I do this: logs =
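One way to keep track of which directory each log came from is to group the input paths by their parent directory before building any RDDs. The sketch below assumes the S3 layout shown above (one directory per day) and uses only the standard library; `group_by_directory` is a hypothetical helper, not part of Spark.

```python
from collections import defaultdict
from posixpath import dirname


def group_by_directory(paths):
    """Map each parent directory (e.g. s3://.../sys1/20141213) to its sorted file list.

    Hypothetical helper: it only inspects path strings and never touches S3.
    """
    groups = defaultdict(list)
    for path in paths:
        # posixpath.dirname strips everything after the last '/'
        groups[dirname(path)].append(path)
    # Sort directories and files so the result is deterministic
    return {d: sorted(files) for d, files in sorted(groups.items())}
```

Each group's file list could then be joined with commas and passed to `sc.textFile`, which accepts comma-separated paths, giving one RDD per directory whose origin is known.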

Re: Any way to make an RDD preserve the input directory structure?

2015-01-15 Thread Sean Owen
Maybe you are saying you already do this, but it's perfectly possible to process as many RDDs as you like in parallel on the driver. That may allow your current approach to eat up as much parallelism as you like. I'm not sure if that's what you are describing with submitting multiple applications, but you
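Processing several RDDs in parallel on the driver usually means triggering their actions from separate driver threads, since Spark's scheduler accepts jobs submitted concurrently from multiple threads within one application. The sketch below shows only the threading pattern with the standard library; `process_in_parallel` is a hypothetical helper, and it assumes the inputs are already grouped into a mapping from directory to file list. The `process` callable is a stand-in for whatever Spark work you run per directory.

```python
from concurrent.futures import ThreadPoolExecutor


def process_in_parallel(groups, process, max_workers=4):
    """Run process(directory, files) for each group on a pool of driver threads.

    Returns {directory: result}. In a Spark driver, `process` would build an RDD
    for the group and call an action; each action becomes its own job, and jobs
    submitted from different threads may run concurrently.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit one task per directory; results stay keyed by directory
        futures = {d: pool.submit(process, d, files) for d, files in groups.items()}
        # .result() blocks until that task finishes and re-raises its exception
        return {d: f.result() for d, f in futures.items()}


# Hypothetical Spark usage (not run here; `sc` and `parse_log` are assumptions):
# process_in_parallel(groups, lambda d, files: sc.textFile(",".join(files)).map(parse_log).count())
```

Keeping results keyed by directory is what preserves the input structure: each output is tied to the directory it was computed from.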