[jira] [Commented] (SPARK-16275) Implement all the Hive fallback functions

2016-07-30 Thread Xiao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15400942#comment-15400942
 ] 

Xiao Li commented on SPARK-16275:
---------------------------------

It sounds like both of you are fine with removing Hive's hash UDF. I will 
submit a PR to resolve it. 

> Implement all the Hive fallback functions
> -----------------------------------------
>
> Key: SPARK-16275
> URL: https://issues.apache.org/jira/browse/SPARK-16275
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Reynold Xin
>
> As of Spark 2.0, Spark falls back to Hive for only the following built-in 
> functions:
> {code}
> "elt", "hash", "java_method", "histogram_numeric",
> "map_keys", "map_values",
> "parse_url", "percentile", "percentile_approx", "reflect", "sentences", 
> "stack", "str_to_map",
> "xpath", "xpath_boolean", "xpath_double", "xpath_float", "xpath_int", 
> "xpath_long",
> "xpath_number", "xpath_short", "xpath_string",
> // table generating function
> "inline", "posexplode"
> {code}
> The goal of this ticket is to implement all of these in Spark so we don't 
> need to fall back to Hive's UDFs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16275) Implement all the Hive fallback functions

2016-07-30 Thread Wenchen Fan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15400924#comment-15400924
 ] 

Wenchen Fan commented on SPARK-16275:
-------------------------------------

It would be odd to maintain two hash implementations: one for Hive 
compatibility and one for internal usage (shuffle, bucketing, etc.). I'd rather 
update those test values to match our own hash function.




[jira] [Commented] (SPARK-16275) Implement all the Hive fallback functions

2016-07-30 Thread Xiao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15400854#comment-15400854
 ] 

Xiao Li commented on SPARK-16275:
---------------------------------

https://github.com/apache/hive/blob/15bdce43db4624a63be1f648e46d1f2baa1c67de/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorUtils.java#L638-L748

This is Hive's hash function. The implementation looks reasonable, but I may 
need to check it with [~cloud_fan]. Not all data types (e.g., Union) are 
supported, and the result depends heavily on the data type. I am not sure 
whether we have the same value ranges for each data type, so verifying that 
the two implementations always generate the same result could require a lot 
of test cases. 
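For context on why matching the results is hard: to my understanding, Hive's 
ObjectInspectorUtils hashes strings following Java's String.hashCode contract, 
while Spark's native {{hash}} expression uses Murmur3 with a fixed seed, so the 
two disagree on essentially every input. A minimal sketch of the Java-style 
string hash (the function name is mine, for illustration only):

```python
# Illustrative sketch, not Spark or Hive source code.
# Java's String.hashCode: h = 31*h + c over the chars, with
# 32-bit signed overflow. Hive's string hashing follows this
# contract; Spark's native hash uses Murmur3 with a fixed seed,
# so the two produce different values for the same input.

def java_string_hash(s: str) -> int:
    """Java String.hashCode semantics with 32-bit signed wraparound."""
    h = 0
    for c in s:
        h = (31 * h + ord(c)) & 0xFFFFFFFF  # wrap to 32 bits
    # reinterpret as a signed 32-bit int, as Java would
    return h - 0x100000000 if h >= 0x80000000 else h

print(java_string_hash("Spark"))
```

Spark's Murmur3-based hash of the same string would be a different value, which 
is why every golden result in the compatibility tests would need updating.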




[jira] [Commented] (SPARK-16275) Implement all the Hive fallback functions

2016-07-30 Thread Xiao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15400849#comment-15400849
 ] 

Xiao Li commented on SPARK-16275:
---------------------------------

Let me check it. Thanks!




[jira] [Commented] (SPARK-16275) Implement all the Hive fallback functions

2016-07-30 Thread Reynold Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15400846#comment-15400846
 ] 

Reynold Xin commented on SPARK-16275:
-------------------------------------

How difficult would it be to provide a native hash implementation that
returns the same result?

If it is difficult, I'm fine with updating all of those to the values
returned by our own native hash function.







[jira] [Commented] (SPARK-16275) Implement all the Hive fallback functions

2016-07-30 Thread Xiao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15400844#comment-15400844
 ] 

Xiao Li commented on SPARK-16275:
---------------------------------

Yeah, many queries are using it. Below is the list:

auto_join_nulls
auto_join0
auto_join1
auto_join2
auto_join3
auto_join4
auto_join5
auto_join6
auto_join7
auto_join8
auto_join9
auto_join10
auto_join11
auto_join12
auto_join13
auto_join14
auto_join14_hadoop20
auto_join15
auto_join17
auto_join18
auto_join19
auto_join20
auto_join22
auto_join25
auto_join30
auto_join31
correlationoptimizer1
correlationoptimizer2
correlationoptimizer3
correlationoptimizer4
multiMapJoin1
orc_dictionary_threshold
udf_hash




[jira] [Commented] (SPARK-16275) Implement all the Hive fallback functions

2016-07-30 Thread Reynold Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15400783#comment-15400783
 ] 

Reynold Xin commented on SPARK-16275:
-------------------------------------

What do we use Hive's hash function for? Are there queries in the Hive 
compatibility suite that are using it?





[jira] [Commented] (SPARK-16275) Implement all the Hive fallback functions

2016-07-30 Thread Xiao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15400756#comment-15400756
 ] 

Xiao Li commented on SPARK-16275:
---------------------------------

[~rxin] What is the plan for {{hash}}? If we use our version, it breaks a lot 
of test cases in {{HiveCompatibilitySuite}}. To resolve those failures, we 
could migrate them into a separate test suite based on our own hash function. 
This is mostly mechanical work. Do you think this is OK?

Thanks!




[jira] [Commented] (SPARK-16275) Implement all the Hive fallback functions

2016-07-11 Thread Reynold Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371199#comment-15371199
 ] 

Reynold Xin commented on SPARK-16275:
-------------------------------------

Note that some of these might result in slightly different behavior, so if 
possible it would be great to get them into 2.0 rather than 2.1, so that we 
don't break compatibility later. The good news is that these functions are all 
very isolated, so implementing them doesn't impact the rest of the code base.





[jira] [Commented] (SPARK-16275) Implement all the Hive fallback functions

2016-06-29 Thread Dongjoon Hyun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15354617#comment-15354617
 ] 

Dongjoon Hyun commented on SPARK-16275:
---------------------------------------

Sure. It's my pleasure!




[jira] [Commented] (SPARK-16275) Implement all the Hive fallback functions

2016-06-29 Thread Reynold Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15354614#comment-15354614
 ] 

Reynold Xin commented on SPARK-16275:
-------------------------------------

cc [~dongjoon] maybe you can help with some of these?

