[ https://issues.apache.org/jira/browse/SPARK-31162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17060408#comment-17060408 ]
Felix Kizhakkel Jose commented on SPARK-31162:
----------------------------------------------

I have seen the following in the API documentation:

  /**
   * Buckets the output by the given columns. If specified, the output is laid out on the file
   * system similar to Hive's bucketing scheme.
   *
   * This is applicable for all file-based data sources (e.g. Parquet, JSON) starting with Spark
   * 2.1.0.
   *
   * @since 2.0
   */
  @scala.annotation.varargs
  def bucketBy(numBuckets: Int, colName: String, colNames: String*): DataFrameWriter[T] = {
    this.numBuckets = Option(numBuckets)
    this.bucketColumnNames = Option(colName +: colNames)
    this
  }

How can we specify that the output should follow Hive's bucketing scheme?

> Provide Configuration Parameter to select/enforce the Hive Hash for Bucketing
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-31162
>                 URL: https://issues.apache.org/jira/browse/SPARK-31162
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Spark Core, SQL
>    Affects Versions: 2.4.5
>            Reporter: Felix Kizhakkel Jose
>            Priority: Major
>
> I couldn't find a configuration parameter to choose Hive hashing instead of
> Spark's default Murmur hash when performing a Spark bucketBy operation.
> Following the discussion with [~maropu] and [~hyukjin.kwon], it was suggested
> to open a new JIRA for this.
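For context, the bucketBy signature quoted above takes only a bucket count and column names; neither the writer API nor, as far as I can tell, any configuration key selects the hash function used to assign rows to buckets. Below is a minimal sketch of a typical bucketed write against Spark 2.4.x. The table name "bucketed_users" and column "user_id" are purely illustrative, and the comments reflect the behavior described in this ticket (Murmur hash by default, no switch to Hive's hash).

  import org.apache.spark.sql.SparkSession

  object BucketByExample {
    def main(args: Array[String]): Unit = {
      val spark = SparkSession.builder()
        .appName("BucketByExample")
        .master("local[*]")
        .getOrCreate()

      // Toy data; "user_id" is an illustrative column name.
      val df = spark.range(0, 1000).withColumnRenamed("id", "user_id")

      // Rows are assigned to buckets using Spark's default Murmur hash of user_id.
      // bucketBy exposes only the bucket count and the bucketing columns; nothing
      // here (and no known spark.conf key as of 2.4.5) switches bucketing to
      // Hive's hash, which is what this ticket asks to make configurable.
      df.write
        .bucketBy(8, "user_id")
        .sortBy("user_id")
        .format("parquet")
        .saveAsTable("bucketed_users")

      spark.stop()
    }
  }

If the hash function were selectable, the same write path could produce a layout compatible with Hive's bucketing scheme; the ticket requests exactly that kind of switch.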