[ https://issues.apache.org/jira/browse/SPARK-31162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17060408#comment-17060408 ]
Felix Kizhakkel Jose commented on SPARK-31162:
----------------------------------------------

I have seen the following in the API documentation:

  /**
   * Buckets the output by the given columns. If specified, the output is laid out on the file
   * system similar to Hive's bucketing scheme.
   *
   * This is applicable for all file-based data sources (e.g. Parquet, JSON) starting with Spark
   * 2.1.0.
   *
   * @since 2.0
   */
  @scala.annotation.varargs
  def bucketBy(numBuckets: Int, colName: String, colNames: String*): DataFrameWriter[T] = {
    this.numBuckets = Option(numBuckets)
    this.bucketColumnNames = Option(colName +: colNames)
    this
  }

How can we specify that the output should follow Hive's bucketing scheme?

> Provide Configuration Parameter to select/enforce the Hive Hash for Bucketing
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-31162
>                 URL: https://issues.apache.org/jira/browse/SPARK-31162
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Spark Core, SQL
>    Affects Versions: 2.4.5
>            Reporter: Felix Kizhakkel Jose
>            Priority: Major
>
> I couldn't find a configuration parameter to choose Hive hashing instead of
> Spark's default Murmur hash when performing a Spark bucketBy operation.
> Following the discussion with [~maropu] and [~hyukjin.kwon], it was suggested
> to open a new JIRA for this.
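For context, the bucketBy signature quoted above takes only a bucket count and column names; neither the writer API nor, as far as I can tell, any configuration key selects the hash function used to assign rows to buckets. Below is a minimal sketch of a typical bucketed write against Spark 2.4.x. The table name "bucketed_users" and column "user_id" are purely illustrative, and the comments reflect the behavior described in this ticket (Murmur hash by default, no switch to Hive's hash).

  import org.apache.spark.sql.SparkSession

  object BucketByExample {
    def main(args: Array[String]): Unit = {
      val spark = SparkSession.builder()
        .appName("BucketByExample")
        .master("local[*]")
        .getOrCreate()

      // Toy data; "user_id" is an illustrative column name.
      val df = spark.range(0, 1000).withColumnRenamed("id", "user_id")

      // Rows are assigned to buckets using Spark's default Murmur hash of user_id.
      // bucketBy exposes only the bucket count and the bucketing columns; nothing
      // here (and no known spark.conf key as of 2.4.5) switches bucketing to
      // Hive's hash, which is what this ticket asks to make configurable.
      df.write
        .bucketBy(8, "user_id")
        .sortBy("user_id")
        .format("parquet")
        .saveAsTable("bucketed_users")

      spark.stop()
    }
  }

If the hash function were selectable, the same write path could produce a layout compatible with Hive's bucketing scheme; the ticket requests exactly that kind of switch.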