Re: Dataset doesn't have partitioner after a repartition on one of the columns

2016-09-28 Thread Igor Berman

Michael, can you please explain why bucketBy is supported when writing to parquet with saveAsTable() but not with parquet()? Is that the only difference between the table API and the DataFrame/Dataset API, or are there others? org.apache.spark.sql.AnalysisException: 'save' does not support bucketing right now; at
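
The contrast described in this message can be reproduced with a small sketch. It assumes a local SparkSession; the sample data, the table name docs_bucketed, and the output path are hypothetical and not taken from the thread. The table-based write accepts bucketBy, while the path-based write is expected to fail with the AnalysisException quoted above.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}
import scala.util.Try

object BucketByDemo {
  // Performs a bucketed table write, then attempts a bucketed path write
  // and returns the outcome of that second attempt.
  def run(spark: SparkSession, outDir: String): Try[Unit] = {
    import spark.implicits._

    // Hypothetical sample data standing in for the real dataset.
    val df = Seq(("d1", "a"), ("d2", "b"), ("d1", "c")).toDF("docId", "text")

    // Table-based write: bucketBy is accepted here.
    df.write
      .format("parquet")
      .mode(SaveMode.Overwrite)
      .bucketBy(4, "docId")
      .saveAsTable("docs_bucketed") // hypothetical table name

    // Path-based write: expected to be rejected, as reported in the message.
    Try {
      df.write
        .mode(SaveMode.Overwrite)
        .bucketBy(4, "docId")
        .parquet(outDir)
    }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("bucketBy-demo")
      .getOrCreate()
    val outcome = run(spark, "/tmp/docs_bucketed_path") // hypothetical path
    outcome.failed.foreach(e => println(s"Path-based write rejected: ${e.getMessage}"))
    spark.stop()
  }
}
```

The exact wording of the exception message may differ between Spark versions, so the sketch prints it rather than matching on it.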

Re: Dataset doesn't have partitioner after a repartition on one of the columns

2016-09-28 Thread Michael Armbrust

Hi Darin, in SQL we have finer-grained information about partitioning, so we don't use the RDD Partitioner. Here's a notebook that walks
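
As a rough illustration of the point above (not the notebook mentioned in the message), the partitioning that Spark SQL tracks can be read from the physical plan instead of from the RDD. The sketch assumes a local SparkSession and hypothetical sample data. Note that queryExecution is a developer-facing API, and what outputPartitioning reports may vary with the Spark version and with settings such as caching or adaptive execution.

```scala
import org.apache.spark.Partitioner
import org.apache.spark.sql.{Dataset, SparkSession}
import org.apache.spark.sql.catalyst.plans.physical.Partitioning

object PartitioningCheck {
  // Returns the RDD-level partitioner and the partitioning reported
  // by the Spark SQL physical plan for the same Dataset.
  def inspect(ds: Dataset[_]): (Option[Partitioner], Partitioning) =
    (ds.rdd.partitioner, ds.queryExecution.executedPlan.outputPartitioning)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("partitioning-check")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data standing in for the parquet-backed dataset.
    val ds = Seq(("d1", "a"), ("d2", "b"), ("d1", "c")).toDF("docId", "text")
    val om = ds.repartition($"docId")

    val (rddPartitioner, sqlPartitioning) = inspect(om)
    // The RDD view carries no Partitioner, which is what the thread subject reports.
    println(s"rdd.partitioner    = $rddPartitioner")
    // The physical plan is where Spark SQL keeps its own partitioning information.
    println(s"outputPartitioning = $sqlPartitioning")

    spark.stop()
  }
}
```

Calling om.explain() prints the same physical plan in readable form, which is often enough to confirm that an exchange on docId is present.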

Dataset doesn't have partitioner after a repartition on one of the columns

2016-09-20 Thread McBeath, Darin W (ELS-STL)

I'm using Spark 2.0. I created a dataset from a parquet file, repartitioned it on one of the columns (docId), and persisted the repartitioned dataset: val om = ds.repartition($"docId").persist(StorageLevel.MEMORY_AND_DISK) When I try to confirm the partitioner with om.rdd.partitioner, I get