Re: Spark & S3 - Introducing random values into key names

2018-03-08 Thread Subhash Sriram
Thanks, Vadim! That helps and makes sense. I don't think we have a number of keys so large that we have to worry about it. If we do, I think I would go with an approach similar to what you suggested.

Thanks again,
Subhash

Re: Spark & S3 - Introducing random values into key names

2018-03-08 Thread Vadim Semenov
You need to put the randomness at the beginning of the key; if you put it anywhere other than the beginning, good performance isn't guaranteed. The way we achieved this was by writing to HDFS first, and then having a custom DistCp, implemented using Spark, that copies the Parquet files.
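A minimal sketch of the idea above: derive a short prefix from the key itself and place it at the *beginning* of the key, so sequential keys scatter across S3 partitions. The `prefix_key` helper and the 4-character prefix width are illustrative assumptions, not something from this thread.

```python
import hashlib

def prefix_key(key: str, width: int = 4) -> str:
    """Place a short, stable hash prefix at the BEGINNING of the key.

    Keys that would otherwise sort together (e.g. date-prefixed part
    files) get different leading characters, so S3 can spread them
    across index partitions.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return f"{digest[:width]}/{key}"

# Sequential part files end up under scattered prefixes:
print(prefix_key("2018/03/08/part-00000.parquet"))
print(prefix_key("2018/03/08/part-00001.parquet"))
```

Using a hash of the key (rather than a random number) keeps the prefix deterministic, so the full key can be reconstructed from the original name when reading the data back.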

Spark & S3 - Introducing random values into key names

2018-03-08 Thread Subhash Sriram
Hey Spark user community,

I am writing Parquet files from Spark to S3 using S3A. I was reading this article about improving S3 bucket performance, specifically about how it can help to introduce randomness into your key names so that data is written to different partitions.
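One way the approach in the question can look from a Spark job: write each run's output under a randomized leading key component. The path layout, the `randomized_output_path` helper, and the use of `uuid4` are assumptions for illustration; `df.write.parquet(path)` is the standard DataFrame writer call.

```python
import uuid

def randomized_output_path(bucket: str, job_name: str) -> str:
    """Build an s3a:// path whose first key component is random,
    so each run's output lands under a different S3 key prefix."""
    salt = uuid.uuid4().hex[:8]  # random leading component
    return f"s3a://{bucket}/{salt}/{job_name}"

path = randomized_output_path("my-bucket", "daily-load")
# In a Spark job this path would then be passed to the writer:
#   df.write.parquet(path)
print(path)
```

Note the trade-off: a random leading component means readers must track or list the generated prefixes to find the data, which is part of why the reply above prefers a deterministic scheme.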