[jira] [Commented] (SPARK-16275) Implement all the Hive fallback functions
[ https://issues.apache.org/jira/browse/SPARK-16275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15400942#comment-15400942 ]

Xiao Li commented on SPARK-16275:
---------------------------------

It sounds like both of you are fine with removing Hive's hash UDF. I will submit a PR to resolve it.

> Implement all the Hive fallback functions
> -----------------------------------------
>
>                 Key: SPARK-16275
>                 URL: https://issues.apache.org/jira/browse/SPARK-16275
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>            Reporter: Reynold Xin
>
> As of Spark 2.0, Spark falls back to Hive for only the following built-in
> functions:
> {code}
> "elt", "hash", "java_method", "histogram_numeric",
> "map_keys", "map_values",
> "parse_url", "percentile", "percentile_approx", "reflect", "sentences",
> "stack", "str_to_map",
> "xpath", "xpath_boolean", "xpath_double", "xpath_float", "xpath_int",
> "xpath_long",
> "xpath_number", "xpath_short", "xpath_string",
> // table generating function
> "inline", "posexplode"
> {code}
> The goal of the ticket is to implement all of these in Spark so we don't need
> to fall back into Hive's UDFs.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16275) Implement all the Hive fallback functions
[ https://issues.apache.org/jira/browse/SPARK-16275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15400924#comment-15400924 ]

Wenchen Fan commented on SPARK-16275:
-------------------------------------

It would be weird to have two hash implementations: one for Hive compatibility and one for internal usage (shuffle, bucketing, etc.). I'd like to update those values to the results of our own hash function.
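For context on the "internal usage" hash mentioned above: Spark's own hash is Murmur3-based with a default seed of 42. Below is a rough Python sketch of the textbook Murmur3 x86 32-bit algorithm applied to a single 4-byte int; it is hand-ported from the published algorithm as an illustration, not Spark's actual implementation, and the seed value is the only Spark-specific assumption.

```python
MASK32 = 0xFFFFFFFF

def _rotl32(x, n):
    """Rotate a 32-bit value left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK32

def murmur3_hash_int(value, seed=42):
    """Murmur3 x86 32-bit hash of a single 4-byte int (illustrative sketch).

    Follows the standard Murmur3 steps for a 4-byte input: mix the key,
    mix it into the seed, then run the finalization avalanche.
    """
    c1, c2 = 0xCC9E2D51, 0x1B873593
    # mix the 4-byte key
    k1 = (value & MASK32) * c1 & MASK32
    k1 = _rotl32(k1, 15)
    k1 = k1 * c2 & MASK32
    # mix the key into the running hash (the seed, for a single block)
    h1 = (seed ^ k1) & MASK32
    h1 = _rotl32(h1, 13)
    h1 = (h1 * 5 + 0xE6546B64) & MASK32
    # finalization: xor in the byte length, then avalanche
    h1 ^= 4
    h1 ^= h1 >> 16
    h1 = h1 * 0x85EBCA6B & MASK32
    h1 ^= h1 >> 13
    h1 = h1 * 0xC2B2AE35 & MASK32
    h1 ^= h1 >> 16
    # interpret as a signed Java int
    return h1 - 0x100000000 if h1 >= 0x80000000 else h1
```

The multiply-rotate-avalanche structure bears no resemblance to Hive's Java-hashCode-style scheme, which is why the two functions produce unrelated values for the same input.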
[jira] [Commented] (SPARK-16275) Implement all the Hive fallback functions
[ https://issues.apache.org/jira/browse/SPARK-16275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15400854#comment-15400854 ]

Xiao Li commented on SPARK-16275:
---------------------------------

https://github.com/apache/hive/blob/15bdce43db4624a63be1f648e46d1f2baa1c67de/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorUtils.java#L638-L748

This is Hive's hash function. The implementation looks straightforward, but I might need to check it with [~cloud_fan]. Not all of Hive's data types (e.g. Union) are supported in Spark, and the hash result depends heavily on the data type. I am not exactly sure whether each data type has the same value range in both systems, so verifying that the two implementations always generate the same result could require a lot of test cases.
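For readers comparing the two schemes: the linked Hive code boils down to Java hashCode semantics applied per type. The Python sketch below is my own illustrative port covering only a few types; the list-handling rule is an assumption based on a reading of the linked code, and types such as Union, decimal, and timestamp are omitted.

```python
def _to_i32(x):
    """Wrap a Python int to a signed 32-bit value, like Java int overflow."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x >= 0x80000000 else x

def hive_style_hash(v):
    """Illustrative re-implementation of Hive's hash UDF semantics.

    A sketch based on ObjectInspectorUtils.hashCode, not Hive's actual
    code; only a handful of types are covered.
    """
    if isinstance(v, bool):
        # booleans hash to 1 (true) or 0 (false)
        return 1 if v else 0
    if isinstance(v, int):
        if -2**31 <= v < 2**31:
            return v  # a 32-bit int hashes to itself
        # bigint: fold the high word into the low, like Java Long.hashCode
        u = v & 0xFFFFFFFFFFFFFFFF
        return _to_i32(u ^ (u >> 32))
    if isinstance(v, str):
        # 31-based polynomial over the UTF-8 bytes; for ASCII input this
        # matches Java String.hashCode, but non-ASCII strings can differ
        # because Hive hashes bytes, not chars
        h = 0
        for b in v.encode("utf-8"):
            h = _to_i32(31 * h + b)
        return h
    if isinstance(v, list):
        # assumption: arrays combine element hashes with the same 31*h + e rule
        h = 0
        for e in v:
            h = _to_i32(31 * h + hive_style_hash(e))
        return h
    raise TypeError(f"unsupported type: {type(v).__name__}")
```

Because every type needs its own rule, matching Hive bit-for-bit across all types and value ranges is exactly the per-type verification burden described above.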
[jira] [Commented] (SPARK-16275) Implement all the Hive fallback functions
[ https://issues.apache.org/jira/browse/SPARK-16275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15400849#comment-15400849 ]

Xiao Li commented on SPARK-16275:
---------------------------------

Let me check it. Thanks!
[jira] [Commented] (SPARK-16275) Implement all the Hive fallback functions
[ https://issues.apache.org/jira/browse/SPARK-16275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15400846#comment-15400846 ]

Reynold Xin commented on SPARK-16275:
-------------------------------------

How difficult would it be to provide a native hash implementation that returns the same result? If it is difficult, I'm fine with us updating all of those to the values returned by our own native hash function.
[jira] [Commented] (SPARK-16275) Implement all the Hive fallback functions
[ https://issues.apache.org/jira/browse/SPARK-16275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15400844#comment-15400844 ]

Xiao Li commented on SPARK-16275:
---------------------------------

Yeah, many queries are using it. Below is the list:

auto_join_nulls
auto_join0
auto_join1
auto_join2
auto_join3
auto_join4
auto_join5
auto_join6
auto_join7
auto_join8
auto_join9
auto_join10
auto_join11
auto_join12
auto_join13
auto_join14
auto_join14_hadoop20
auto_join15
auto_join17
auto_join18
auto_join19
auto_join20
auto_join22
auto_join25
auto_join30
auto_join31
correlationoptimizer1
correlationoptimizer2
correlationoptimizer3
correlationoptimizer4
multiMapJoin1
orc_dictionary_threshold
udf_hash
[jira] [Commented] (SPARK-16275) Implement all the Hive fallback functions
[ https://issues.apache.org/jira/browse/SPARK-16275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15400783#comment-15400783 ]

Reynold Xin commented on SPARK-16275:
-------------------------------------

What do we use Hive's hash function for? Are there queries in the Hive compatibility suite that are using it?
[jira] [Commented] (SPARK-16275) Implement all the Hive fallback functions
[ https://issues.apache.org/jira/browse/SPARK-16275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15400756#comment-15400756 ]

Xiao Li commented on SPARK-16275:
---------------------------------

[~rxin] What is the plan for {{hash}}? If we use our version, it breaks a lot of test cases in {{HiveCompatibilitySuite}}. To resolve the failing test cases, we can migrate them into a separate test suite based on our hash function. This is just labor-intensive work. Do you think this is OK? Thanks!
[jira] [Commented] (SPARK-16275) Implement all the Hive fallback functions
[ https://issues.apache.org/jira/browse/SPARK-16275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371199#comment-15371199 ]

Reynold Xin commented on SPARK-16275:
-------------------------------------

Note that some of these might result in slight behavioral changes, so if possible it'd be great for these to land in 2.0 rather than 2.1, so we don't break compatibility. The good thing is that these functions are all very isolated, so they don't impact the rest of the code base.
[jira] [Commented] (SPARK-16275) Implement all the Hive fallback functions
[ https://issues.apache.org/jira/browse/SPARK-16275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15354617#comment-15354617 ]

Dongjoon Hyun commented on SPARK-16275:
---------------------------------------

Sure. It's my pleasure!
[jira] [Commented] (SPARK-16275) Implement all the Hive fallback functions
[ https://issues.apache.org/jira/browse/SPARK-16275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15354614#comment-15354614 ]

Reynold Xin commented on SPARK-16275:
-------------------------------------

cc [~dongjoon] maybe you can help with some of these?