Re: dataframes and numPartitions

2015-10-18 Thread Jorge Sánchez
> You may find the spark.sql.shuffle.partitions property useful. The default value is 200.
>
> Mohammed
>
> *From:* Alex Nastetsky [mailto:alex.nastet...@vervemobile.com]
> *Sent:* Wednesday, October 14, 2015 8:14 PM
> *To:* user
> *Subject:* dataframes and numPartitions
>
> A lot of RDD methods take a numPartitions ...

RE: dataframes and numPartitions

2015-10-15 Thread Mohammed Guller
You may find the spark.sql.shuffle.partitions property useful. The default value is 200.

Mohammed

From: Alex Nastetsky [mailto:alex.nastet...@vervemobile.com]
Sent: Wednesday, October 14, 2015 8:14 PM
To: user
Subject: dataframes and numPartitions

A lot of RDD methods take a numPartitions ...
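As a hedged sketch of how that property could be set from code (assuming a Spark 1.5-era PySpark session with an existing `SQLContext` named `sqlContext`; the value 50 is just an illustrative choice):

```python
# Sketch: lower the number of shuffle partitions used by DataFrame
# aggregations/joins from the default of 200 to 50.
# Assumes `sqlContext` is an existing pyspark.sql.SQLContext.
sqlContext.setConf("spark.sql.shuffle.partitions", "50")

# Any subsequent DataFrame shuffle (e.g. df.groupBy(...).count())
# will now produce 50 partitions instead of 200.
```

The property can also be supplied at submit time (e.g. `--conf spark.sql.shuffle.partitions=50`) rather than in code.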

dataframes and numPartitions

2015-10-14 Thread Alex Nastetsky
A lot of RDD methods take a numPartitions parameter that lets you specify the number of partitions in the result. For example, groupByKey. The DataFrame counterparts don't have a numPartitions parameter, e.g. groupBy only takes a bunch of Columns as params. I understand that the DataFrame API is
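For readers unfamiliar with what the `numPartitions` argument controls: Spark's default `HashPartitioner` routes each key to partition `nonNegative(hash(key)) % numPartitions`, so the argument fixes exactly how many partitions the shuffled result has. The snippet below is a plain-Python illustration of that idea (no Spark required; `hash_partition` is a hypothetical helper, not a Spark API):

```python
def hash_partition(keys, num_partitions):
    """Bucket keys the way a hash partitioner would: hash(key) mod N."""
    partitions = [[] for _ in range(num_partitions)]
    for k in keys:
        # Python's % on a possibly-negative hash already yields a
        # non-negative index, mirroring Spark's nonNegativeMod.
        partitions[hash(k) % num_partitions].append(k)
    return partitions

parts = hash_partition(["a", "b", "c", "d", "a"], num_partitions=3)
print(len(parts))  # 3 -- the output always has exactly num_partitions buckets
```

Note that identical keys always land in the same bucket, which is what makes per-key operations like `groupByKey` correct after the shuffle.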