Hi, I'm afraid there is currently no API to define a RangePartitioner on DataFrames.
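One common workaround in Spark 1.6 is to drop down to the underlying RDD, apply a `RangePartitioner` there, and rebuild the DataFrame from the repartitioned rows. A minimal sketch (the key column name `"geoId"` and the partition count are hypothetical, not from the original mail):

```scala
import org.apache.spark.RangePartitioner
import org.apache.spark.sql.Row

// Key the rows by a sortable column; "geoId" is a placeholder for
// whatever column you want range-partitioned on.
val keyed = df.rdd.map(row => (row.getAs[String]("geoId"), row))

// Build a RangePartitioner over the keyed RDD (it samples the keys
// to compute range bounds) and repartition by it.
val partitioner = new RangePartitioner(200, keyed)
val repartitioned = keyed.partitionBy(partitioner).values

// Re-attach the original schema to get a DataFrame back.
val dfOut = sqlContext.createDataFrame(repartitioned, df.schema)
```

Note this loses any DataFrame-level optimizations for the repartitioning step itself, and the extra serialization round-trip has a cost on a 200GB batch.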
// maropu

On Tue, Jun 14, 2016 at 5:04 AM, Peter Halliday <pjh...@cornell.edu> wrote:
> I have two questions.
>
> First, I have a failure when I write Parquet from Spark 1.6.1 on Amazon EMR
> to S3. This is a full batch, which is over 200GB of source data. The
> partitioning is based on a geographic identifier we use, and also the date
> we got the data. However, because of geographic density, we could certainly
> be hitting tiles that are too dense. I'm trying to figure out how to
> determine the size of the file it's trying to write out.
>
> Second, we used to use RDDs and a RangePartitioner for task partitioning.
> However, I don't see this available in DataFrames. How does one achieve
> this now?
>
> Peter Halliday

--
---
Takeshi Yamamuro