[jira] [Commented] (SPARK-8255) string function: regexp_extract

2015-06-09 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579927#comment-14579927 ] Cheng Hao commented on SPARK-8255: -- I'll take this one. string function: regexp_extract

[jira] [Commented] (SPARK-8262) string function: split

2015-06-09 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579935#comment-14579935 ] Cheng Hao commented on SPARK-8262: -- I'll take this one. string function: split

[jira] [Commented] (SPARK-8270) string function: levenshtein

2015-06-09 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579941#comment-14579941 ] Cheng Hao commented on SPARK-8270: -- I'll take this one. string function: levenshtein
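The ticket only names the function. For reference, here is a plain-Scala sketch of the standard Levenshtein edit distance, where insertions, deletions and substitutions each cost 1. This is generally what a `levenshtein(str1, str2)` SQL function is expected to return. It is an illustration only, not Spark's implementation, and the object name is hypothetical.

```scala
object LevenshteinSketch {
  // Dynamic-programming edit distance that keeps only two rows of the table.
  // After processing row i, prev(j) is the distance between the first i
  // characters of `a` and the first j characters of `b`.
  def levenshtein(a: String, b: String): Int = {
    var prev: Array[Int] = (0 to b.length).toArray // distance from "" to b.take(j)
    var curr: Array[Int] = new Array[Int](b.length + 1)
    for (i <- 1 to a.length) {
      curr(0) = i // distance from a.take(i) to the empty string
      for (j <- 1 to b.length) {
        val cost = if (a.charAt(i - 1) == b.charAt(j - 1)) 0 else 1
        val deletion = prev(j) + 1
        val insertion = curr(j - 1) + 1
        val substitution = prev(j - 1) + cost
        curr(j) = math.min(math.min(deletion, insertion), substitution)
      }
      val tmp = prev; prev = curr; curr = tmp // reuse the two row buffers
    }
    prev(b.length)
  }
}
```

For example, `LevenshteinSketch.levenshtein("kitten", "sitting")` returns 3.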

[jira] [Commented] (SPARK-8266) string function: translate

2015-06-09 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579938#comment-14579938 ] Cheng Hao commented on SPARK-8266: -- I'll take this one. string function: translate
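The ticket only names the function. The plain-Scala sketch below assumes Hive-style `translate(input, from, to)` semantics. Under that assumption, each character of `from` is replaced by the character at the same position in `to`, characters of `from` with no counterpart in `to` are deleted, and a character repeated in `from` keeps its first mapping. It is not Spark's implementation, and the object name is hypothetical.

```scala
object TranslateSketch {
  // Assumed Hive-style translate(): build a char -> Option[char] mapping,
  // where None means "delete this character".
  def translate(input: String, from: String, to: String): String = {
    val mapping: Map[Char, Option[Char]] =
      from.indices.foldLeft(Map.empty[Char, Option[Char]]) { (m, i) =>
        if (m.contains(from(i))) m // first occurrence in `from` wins
        else m.updated(from(i), if (i < to.length) Some(to(i)) else None)
      }
    // Characters not in `from` are kept unchanged.
    input.flatMap(c => mapping.getOrElse(c, Some(c)).toList).mkString
  }
}
```

For example, `TranslateSketch.translate("abcdef", "adc", "19")` maps `a` to `1`, `d` to `9`, and deletes `c`.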

[jira] [Commented] (SPARK-8229) conditional function: isnotnull

2015-06-09 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579905#comment-14579905 ] Cheng Hao commented on SPARK-8229: -- I'll take this one. conditional function: isnotnull

[jira] [Commented] (SPARK-8240) string function: concat

2015-06-09 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579912#comment-14579912 ] Cheng Hao commented on SPARK-8240: -- I'll take this one. string function: concat

[jira] [Commented] (SPARK-8231) complex function: array_contains

2015-06-09 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579908#comment-14579908 ] Cheng Hao commented on SPARK-8231: -- I'll take this one. complex function

[jira] [Commented] (SPARK-8238) string function: ascii

2015-06-09 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579910#comment-14579910 ] Cheng Hao commented on SPARK-8238: -- I'll take this one. string function: ascii

[jira] [Commented] (SPARK-8230) complex function: size

2015-06-09 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579906#comment-14579906 ] Cheng Hao commented on SPARK-8230: -- I'll take this one. complex function: size

[jira] [Commented] (SPARK-8239) string function: base64

2015-06-09 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579911#comment-14579911 ] Cheng Hao commented on SPARK-8239: -- I'll take this one. string function: base64

[jira] [Commented] (SPARK-8232) complex function: sort_array

2015-06-09 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579909#comment-14579909 ] Cheng Hao commented on SPARK-8232: -- I'll take this one. complex function: sort_array

[jira] [Commented] (SPARK-8243) string function: encode

2015-06-09 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579915#comment-14579915 ] Cheng Hao commented on SPARK-8243: -- I'll take this one. string function: encode

[jira] [Commented] (SPARK-8245) string function: format_number

2015-06-09 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579917#comment-14579917 ] Cheng Hao commented on SPARK-8245: -- I'll take this one. string function: format_number

[jira] [Commented] (SPARK-8241) string function: concat_ws

2015-06-09 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579913#comment-14579913 ] Cheng Hao commented on SPARK-8241: -- I'll take this one. string function: concat_ws

[jira] [Commented] (SPARK-8249) string function: locate

2015-06-09 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579920#comment-14579920 ] Cheng Hao commented on SPARK-8249: -- I'll take this one. string function: locate

[jira] [Commented] (SPARK-8250) string function: alias lower/lcase

2015-06-09 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579921#comment-14579921 ] Cheng Hao commented on SPARK-8250: -- I'll take this one. string function: alias lower

[jira] [Commented] (SPARK-8247) string function: instr

2015-06-09 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579919#comment-14579919 ] Cheng Hao commented on SPARK-8247: -- I'll take this one. string function: instr

[jira] [Commented] (SPARK-8254) string function: printf

2015-06-09 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579926#comment-14579926 ] Cheng Hao commented on SPARK-8254: -- I'll take this one. string function: printf

[jira] [Commented] (SPARK-8252) string function: lpad

2015-06-09 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579923#comment-14579923 ] Cheng Hao commented on SPARK-8252: -- I'll take this one. string function: lpad

[jira] [Commented] (SPARK-8253) string function: ltrim

2015-06-09 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579925#comment-14579925 ] Cheng Hao commented on SPARK-8253: -- I'll take this one. string function: ltrim

[jira] [Commented] (SPARK-8271) string function: soundex

2015-06-09 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579942#comment-14579942 ] Cheng Hao commented on SPARK-8271: -- I'll take this one. string function: soundex

[jira] [Commented] (SPARK-8258) string function: reverse

2015-06-09 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579930#comment-14579930 ] Cheng Hao commented on SPARK-8258: -- I'll take this one. string function: reverse

[jira] [Commented] (SPARK-8263) string function: substr/substring should also support binary type

2015-06-09 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579936#comment-14579936 ] Cheng Hao commented on SPARK-8263: -- I'll take this one. string function: substr

[jira] [Commented] (SPARK-8257) string function: repeat

2015-06-09 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579929#comment-14579929 ] Cheng Hao commented on SPARK-8257: -- I'll take this one. string function: repeat

[jira] [Commented] (SPARK-8260) string function: rtrim

2015-06-09 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579932#comment-14579932 ] Cheng Hao commented on SPARK-8260: -- I'll take this one. string function: rtrim

[jira] [Commented] (SPARK-8256) string function: regexp_replace

2015-06-09 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579928#comment-14579928 ] Cheng Hao commented on SPARK-8256: -- I'll take this one. string function: regexp_replace

[jira] [Commented] (SPARK-8264) string function: substring_index

2015-06-09 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579937#comment-14579937 ] Cheng Hao commented on SPARK-8264: -- I'll take this one. string function
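The ticket only names the function. The plain-Scala sketch below assumes the MySQL/Hive semantics of `substring_index(str, delim, count)`:

- It returns the part of `str` before the `count`-th occurrence of `delim`.
- A negative `count` counts from the right and returns the part after that occurrence.
- If there are fewer occurrences than requested, the whole string is returned.
- A `count` of 0 yields an empty string.

Returning an empty string for an empty delimiter is an additional assumption. The object name is hypothetical, and this is not Spark's implementation.

```scala
object SubstringIndexSketch {
  def substringIndex(str: String, delim: String, count: Int): String = {
    if (delim.isEmpty || count == 0) return "" // assumed edge-case behavior
    if (count > 0) {
      // Scan left to right, skipping past each match so matches never overlap.
      var from = 0
      var idx = -1
      var remaining = count
      while (remaining > 0) {
        idx = str.indexOf(delim, from)
        if (idx < 0) return str // fewer than `count` delimiters: whole string
        from = idx + delim.length
        remaining -= 1
      }
      str.substring(0, idx)
    } else {
      // Scan right to left for a negative count.
      var from = str.length
      var idx = -1
      var remaining = -count
      while (remaining > 0) {
        idx = str.lastIndexOf(delim, from)
        if (idx < 0) return str
        from = idx - delim.length
        remaining -= 1
      }
      str.substring(idx + delim.length)
    }
  }
}
```

For example, `substringIndex("www.apache.org", ".", 2)` keeps everything before the second dot.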

[jira] [Commented] (SPARK-8268) string function: unbase64

2015-06-09 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579939#comment-14579939 ] Cheng Hao commented on SPARK-8268: -- I'll take this one. string function: unbase64

[jira] [Commented] (SPARK-8269) string function: initcap

2015-06-09 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579940#comment-14579940 ] Cheng Hao commented on SPARK-8269: -- I'll take this one. string function: initcap

RE: [SparkSQL ] What is Exchange in physical plan for ?

2015-06-08 Thread Cheng, Hao
It means data shuffling; its arguments also show the partitioning strategy. -Original Message- From: invkrh [mailto:inv...@gmail.com] Sent: Monday, June 8, 2015 9:34 PM To: dev@spark.apache.org Subject: [SparkSQL ] What is Exchange in physical plan for ? Hi,
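As a rough illustration of what a hash-partitioned Exchange does, the sketch below routes (key, value) rows into a fixed number of partitions. It uses a non-negative modulo of the key's hash code, with null keys going to partition 0, which is similar in spirit to Spark's `HashPartitioner`. It is a single-JVM, plain-Scala sketch with hypothetical names: it shows only the routing rule, not the network transfer, and it is not Spark's code.

```scala
object ExchangeSketch {
  // Non-negative modulo, so negative hash codes still land in [0, numPartitions).
  def partitionFor(key: Any, numPartitions: Int): Int =
    if (key == null) 0
    else {
      val raw = key.hashCode % numPartitions
      if (raw < 0) raw + numPartitions else raw
    }

  // Redistribute rows so that all rows sharing a key end up in the same partition,
  // which is what a downstream aggregation or join on that key relies on.
  def hashExchange[K, V](rows: Seq[(K, V)], numPartitions: Int): Vector[Seq[(K, V)]] = {
    require(numPartitions > 0, "numPartitions must be positive")
    val grouped = rows.groupBy { case (k, _) => partitionFor(k, numPartitions) }
    Vector.tabulate(numPartitions)(p => grouped.getOrElse(p, Seq.empty))
  }
}
```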

RE: SparkSQL : using Hive UDF returning Map throws error: scala.MatchError: interface java.util.Map (of class java.lang.Class) (state=,code=0)

2015-06-05 Thread Cheng, Hao
Confirmed: with the latest master, we don't support complex data types for simple Hive UDFs. Would you mind filing an issue in JIRA? -Original Message- From: Cheng, Hao [mailto:hao.ch...@intel.com] Sent: Friday, June 5, 2015 12:35 PM To: ogoh; user@spark.apache.org Subject: RE: SparkSQL : using

[jira] [Commented] (SPARK-8071) Run PySpark dataframe.rollup/cube test failed

2015-06-04 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573952#comment-14573952 ] Cheng Hao commented on SPARK-8071: -- I couldn't reproduce that with the Scala API, and also

[jira] [Commented] (SPARK-8071) Run PySpark dataframe.rollup/cube test failed

2015-06-04 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573935#comment-14573935 ] Cheng Hao commented on SPARK-8071: -- Can you try `df.cube('name', 'age').count().show

RE: SparkSQL : using Hive UDF returning Map throws error: scala.MatchError: interface java.util.Map (of class java.lang.Class) (state=,code=0)

2015-06-04 Thread Cheng, Hao
Which version of the Hive jar are you using? Hive 0.13.1 or Hive 0.12.0? -Original Message- From: ogoh [mailto:oke...@gmail.com] Sent: Friday, June 5, 2015 10:10 AM To: user@spark.apache.org Subject: SparkSQL : using Hive UDF returning Map throws error: scala.MatchError: interface

[jira] [Created] (SPARK-7915) Support specifying the column list for target table in CTAS

2015-05-28 Thread Cheng Hao (JIRA)
Cheng Hao created SPARK-7915: Summary: Support specifying the column list for target table in CTAS Key: SPARK-7915 URL: https://issues.apache.org/jira/browse/SPARK-7915 Project: Spark Issue

RE: Pointing SparkSQL to existing Hive Metadata with data file locations in HDFS

2015-05-27 Thread Cheng, Hao
Yes, but make sure you put hive-site.xml on your classpath. Did you run into any problems? Cheng Hao From: Sanjay Subramanian [mailto:sanjaysubraman...@yahoo.com.INVALID] Sent: Thursday, May 28, 2015 8:53 AM To: user Subject: Pointing SparkSQL to existing Hive Metadata with data file locations

[jira] [Commented] (SPARK-7550) Support setting the right schema serde when writing to Hive metastore

2015-05-26 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14560450#comment-14560450 ] Cheng Hao commented on SPARK-7550: -- Similar issue to SPARK-6923 Support setting

[jira] [Comment Edited] (SPARK-7550) Support setting the right schema serde when writing to Hive metastore

2015-05-26 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14560450#comment-14560450 ] Cheng Hao edited comment on SPARK-7550 at 5/27/15 5:48 AM

[jira] [Created] (SPARK-7871) Improve the outputPartitioning for HashOuterJoin(full outer join)

2015-05-26 Thread Cheng Hao (JIRA)
Cheng Hao created SPARK-7871: Summary: Improve the outputPartitioning for HashOuterJoin(full outer join) Key: SPARK-7871 URL: https://issues.apache.org/jira/browse/SPARK-7871 Project: Spark

RE: [VOTE] Release Apache Spark 1.4.0 (RC2)

2015-05-25 Thread Cheng, Hao
Adding another blocker issue that I just created; it seems to be a regression. https://issues.apache.org/jira/browse/SPARK-7853 -Original Message- From: Sean Owen [mailto:so...@cloudera.com] Sent: Monday, May 25, 2015 3:37 PM To: Patrick Wendell Cc: dev@spark.apache.org Subject: Re: [VOTE] Release

[jira] [Created] (SPARK-7853) ClassNotFoundException for SparkSQL

2015-05-25 Thread Cheng Hao (JIRA)
Cheng Hao created SPARK-7853: Summary: ClassNotFoundException for SparkSQL Key: SPARK-7853 URL: https://issues.apache.org/jira/browse/SPARK-7853 Project: Spark Issue Type: Bug

[jira] [Commented] (SPARK-7853) ClassNotFoundException for SparkSQL

2015-05-25 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14558321#comment-14558321 ] Cheng Hao commented on SPARK-7853: -- ClassNotFound is actually what I got after investigation

[jira] [Commented] (SPARK-7853) ClassNotFoundException for SparkSQL

2015-05-25 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14558332#comment-14558332 ] Cheng Hao commented on SPARK-7853: -- And it seems the bug was introduced

[jira] [Commented] (SPARK-7853) ClassNotFoundException for SparkSQL

2015-05-25 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14558363#comment-14558363 ] Cheng Hao commented on SPARK-7853: -- Updated the description; it seems 'TestSerDe

[jira] [Updated] (SPARK-7853) ClassNotFoundException for SparkSQL

2015-05-25 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Hao updated SPARK-7853: - Description: Reproduce steps: {code} bin/spark-sql --jars ./sql/hive/src/test/resources/hive-hcatalog

[jira] [Commented] (SPARK-7727) Avoid inner classes in RuleExecutor

2015-05-25 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14558697#comment-14558697 ] Cheng Hao commented on SPARK-7727: -- As we probably don't want to change

[jira] [Created] (SPARK-7859) Collect_SET behaves different under different version of JDK

2015-05-25 Thread Cheng Hao (JIRA)
Cheng Hao created SPARK-7859: Summary: Collect_SET behaves different under different version of JDK Key: SPARK-7859 URL: https://issues.apache.org/jira/browse/SPARK-7859 Project: Spark Issue

[jira] [Updated] (SPARK-7859) Collect_SET behaves different under different version of JDK

2015-05-25 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Hao updated SPARK-7859: - Description: To reproduce {code} JAVA_HOME=/home/hcheng/Java/jdk1.8.0_45 build/sbt -Phadoop-2.3 -Phive

RE: SparkSQL errors in 1.4 rc when using with Hive 0.12 metastore

2015-05-24 Thread Cheng, Hao
Thanks for reporting this. We intend to support multiple metastore versions in a single build (hive-0.13.1) by introducing the IsolatedClientLoader, but you're probably hitting a bug; please file a JIRA issue for this. I will keep investigating it as well. Hao From: Mark Hamstra

[jira] [Commented] (SPARK-4233) Simplify the Aggregation Function implementation

2015-05-22 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14557060#comment-14557060 ] Cheng Hao commented on SPARK-4233: -- The interface changes are for scalability

[jira] [Commented] (SPARK-7320) Add rollup and cube support to DataFrame DSL

2015-05-19 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549916#comment-14549916 ] Cheng Hao commented on SPARK-7320: -- Sorry, it no longer depends on SPARK-7235. Add

[jira] [Updated] (SPARK-7268) [Spark SQL] Throw 'Shutdown hooks cannot be modified during shutdown' on YARN

2015-05-18 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Hao updated SPARK-7268: - Target Version/s: 1.4.0 [Spark SQL] Throw 'Shutdown hooks cannot be modified during shutdown' on YARN

RE: InferredSchema Example in Spark-SQL

2015-05-17 Thread Cheng, Hao
Did you forget to import the implicit functions/classes? import sqlContext.implicits._ From: Rajdeep Dua [mailto:rajdeep@gmail.com] Sent: Monday, May 18, 2015 8:08 AM To: user@spark.apache.org Subject: InferredSchema Example in Spark-SQL Hi All, Was trying the Inferred Schema Spark example

RE: InferredSchema Example in Spark-SQL

2015-05-17 Thread Cheng, Hao
Typo? Should be .toDF(), not .toRD() From: Ram Sriharsha [mailto:sriharsha@gmail.com] Sent: Monday, May 18, 2015 8:31 AM To: Rajdeep Dua Cc: user Subject: Re: InferredSchema Example in Spark-SQL you mean toDF() ? (toDF converts the RDD to a DataFrame, in this case inferring schema from the
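`.toDF()` is not a method of `RDD` itself. In Spark 1.3 it is added through an implicit conversion that `import sqlContext.implicits._` brings into scope, which is why the call fails to compile without that import. The plain-Scala sketch below reproduces the same extension-method pattern without Spark. All names in it (`FakeSqlContext`, `MiniFrame`, `ToMiniFrame`) are hypothetical.

```scala
object FakeSqlContext {
  // Hypothetical stand-in for a DataFrame: column names plus row values.
  final case class MiniFrame(columns: Seq[String], rows: Seq[Seq[Any]])

  object implicits {
    // This extension method is only visible after `import FakeSqlContext.implicits._`,
    // mirroring how `.toDF()` needs `import sqlContext.implicits._`.
    implicit class ToMiniFrame[P <: Product](val data: Seq[P]) {
      // Each case-class instance becomes one row of its field values.
      def toDF(columns: String*): MiniFrame =
        MiniFrame(columns, data.map(_.productIterator.toSeq))
    }
  }
}
```

Without the import, `Seq(...).toDF(...)` would be rejected by the compiler in the same way.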

RE: What's the advantage features of Spark SQL(JDBC)

2015-05-15 Thread Cheng, Hao
Spark SQL just treats JDBC as another data source, the same way it supports loading data from .csv or .json files. From: Yi Zhang [mailto:zhangy...@yahoo.com.INVALID] Sent: Friday, May 15, 2015 2:30 PM To: User Subject: What's the advantage features of Spark SQL(JDBC) Hi All, Comparing

RE: Does Spark SQL (JDBC) support nest select with current version

2015-05-15 Thread Cheng, Hao
Spark SQL just loads the query result as a new source (via JDBC), so do not confuse it with Spark SQL tables; they are completely independent database systems. From: Yi Zhang [mailto:zhangy...@yahoo.com.INVALID] Sent: Friday, May 15, 2015 1:59 PM To: Cheng, Hao; Dev Subject: Re: Does Spark SQL

RE: question about sparksql caching

2015-05-15 Thread Cheng, Hao
You can probably try something like:
val df = sqlContext.sql("select c1, sum(c2) from T1, T2 where T1.key=T2.key group by c1")
df.cache() // Cache the result, but it's a lazy execution.
df.registerTempTable("my_result")
sqlContext.sql("select * from my_result where c1=1").collect // the cache
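In the snippet above, `df.cache()` is lazy. The result is materialized by the first action, and later queries against the registered table reuse it instead of re-running the join. The toy class below is plain Scala with hypothetical names, not Spark, and it mimics only that evaluation order.

```scala
object CacheSketch {
  // Toy model of a lazily evaluated query result. cache() only marks the
  // result as cacheable; the first collect() computes and stores the rows,
  // and later collect() calls reuse them. Without cache(), every collect()
  // recomputes the result.
  final class LazyResult[A](compute: () => Seq[A]) {
    private var runs = 0
    private var cacheEnabled = false
    private var stored: Option[Seq[A]] = None

    def cache(): this.type = { cacheEnabled = true; this }

    def collect(): Seq[A] = stored match {
      case Some(rows) => rows
      case None =>
        runs += 1
        val rows = compute()
        if (cacheEnabled) stored = Some(rows)
        rows
    }

    def timesComputed: Int = runs
  }
}
```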

[jira] [Created] (SPARK-7662) Exception of multi-attribute generator analysis in projection

2015-05-15 Thread Cheng Hao (JIRA)
Cheng Hao created SPARK-7662: Summary: Exception of multi-attribute generator analysis in projection Key: SPARK-7662 URL: https://issues.apache.org/jira/browse/SPARK-7662 Project: Spark Issue

RE: What's the advantage features of Spark SQL(JDBC)

2015-05-15 Thread Cheng, Hao
Yes. From: Yi Zhang [mailto:zhangy...@yahoo.com] Sent: Friday, May 15, 2015 2:51 PM To: Cheng, Hao; User Subject: Re: What's the advantage features of Spark SQL(JDBC) @Hao, As you said, there is no advantage feature for JDBC, it just provides unified api to support different data sources

RE: Does Spark SQL (JDBC) support nest select with current version

2015-05-14 Thread Cheng, Hao
Did you register the DataFrame as a table first and then query it? Do you mean that also failed? From: Yi Zhang [mailto:zhangy...@yahoo.com.INVALID] Sent: Friday, May 15, 2015 1:10 PM To: Yi Zhang; Dev Subject: Re: Does Spark SQL (JDBC) support nest select with current version If

[jira] [Commented] (FLINK-1990) Uppercase AS keyword not allowed in select expression

2015-05-11 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-1990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537997#comment-14537997 ] Cheng Hao commented on FLINK-1990: -- What about aggregate functions? e.g. sum / SUM

[jira] [Comment Edited] (FLINK-1990) Uppercase AS keyword not allowed in select expression

2015-05-11 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-1990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537997#comment-14537997 ] Cheng Hao edited comment on FLINK-1990 at 5/11/15 2:50 PM: --- What

[jira] [Commented] (SPARK-7320) Add rollup and cube support to DataFrame DSL

2015-05-10 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537565#comment-14537565 ] Cheng Hao commented on SPARK-7320: -- The code is ready, but needs to be reviewed. See

[jira] [Commented] (SPARK-7320) Add rollup and cube support to DataFrame DSL

2015-05-10 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537498#comment-14537498 ] Cheng Hao commented on SPARK-7320: -- After investigation, we need to refactor

RE: Reply: Re: sparksql running slow while joining 2 tables.

2015-05-05 Thread Cheng, Hao
, Hao; Wang, Daoyuan; Olivier Girardot; user Subject: Reply: Re: sparksql running slow while joining 2 tables. Hi guys, attached are the pics of the physical plan and logs. Thanks. Thanks & Best regards! 罗辉 San.Luo - Original Message - From: Cheng, Hao hao.ch

RE: Reply: Re: sparksql running slow while joining 2 tables.

2015-05-04 Thread Cheng, Hao
Can you print out the physical plan? EXPLAIN SELECT xxx… From: luohui20...@sina.com [mailto:luohui20...@sina.com] Sent: Monday, May 4, 2015 9:08 PM To: Olivier Girardot; user Subject: Reply: Re: sparksql running slow while joining 2 tables. hi Olivier spark1.3.1, with java1.8.0.45 and add 2 pics

Re: sparksql running slow while joining 2 tables.

2015-05-04 Thread Cheng, Hao
I assume you're using the DataFrame API within your application; in that case, call sql("SELECT…").explain(true). From: Wang, Daoyuan Sent: Tuesday, May 5, 2015 10:16 AM To: luohui20...@sina.com; Cheng, Hao; Olivier Girardot; user Subject: RE: Reply: RE: Reply: Re: sparksql running slow while joining 2 tables. You can use

RE: Reply: Re: sparksql running slow while joining 2 tables.

2015-05-04 Thread Cheng, Hao
Or, have you tried a broadcast join? From: Cheng, Hao [mailto:hao.ch...@intel.com] Sent: Tuesday, May 5, 2015 8:33 AM To: luohui20...@sina.com; Olivier Girardot; user Subject: RE: Reply: Re: sparksql running slow while joining 2 tables. Can you print out the physical plan? EXPLAIN SELECT xxx
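A broadcast join avoids shuffling the large table. The small table is shipped to every executor as an in-memory hash table, and the large table is streamed through it. In Spark SQL 1.x, this plan is typically chosen automatically when a table's estimated size is below `spark.sql.autoBroadcastJoinThreshold`, which depends on size statistics being available. The sketch below is a single-JVM, plain-Scala illustration of the inner broadcast hash join itself, with hypothetical names; it is not Spark's operator.

```scala
object BroadcastJoinSketch {
  // Inner equi-join: build a hash table from the small side (the part Spark would
  // broadcast), then probe it with each row of the large side, preserving that side's order.
  def broadcastHashJoin[K, A, B](large: Seq[(K, A)], small: Seq[(K, B)]): Seq[(K, A, B)] = {
    val table: Map[K, Seq[B]] =
      small.groupBy(_._1).map { case (k, rows) => k -> rows.map(_._2) }
    for {
      (k, a) <- large
      b <- table.getOrElse(k, Seq.empty) // no match: the row is dropped (inner join)
    } yield (k, a, b)
  }
}
```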

[jira] [Commented] (SPARK-4629) Spark SQL uses Hadoop Configuration in a thread-unsafe manner when writing Parquet files

2015-04-30 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14521153#comment-14521153 ] Cheng Hao commented on SPARK-4629: -- Are they related issues? Spark SQL uses Hadoop

[jira] [Comment Edited] (SPARK-4629) Spark SQL uses Hadoop Configuration in a thread-unsafe manner when writing Parquet files

2015-04-30 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14521153#comment-14521153 ] Cheng Hao edited comment on SPARK-4629 at 4/30/15 8:54 AM

[jira] [Created] (SPARK-7229) SpecificMutableRow should take integer type as internal representation for DateType

2015-04-29 Thread Cheng Hao (JIRA)
Cheng Hao created SPARK-7229: Summary: SpecificMutableRow should take integer type as internal representation for DateType Key: SPARK-7229 URL: https://issues.apache.org/jira/browse/SPARK-7229 Project

[jira] [Resolved] (HIVE-10532) SpecificMutableRow doesn't handle Date Type correctly

2015-04-29 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/HIVE-10532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Hao resolved HIVE-10532. -- Resolution: Invalid Oops. SpecificMutableRow doesn't handle Date Type correctly

[jira] [Created] (HIVE-10532) SpecificMutableRow doesn't handle Date Type correctly

2015-04-29 Thread Cheng Hao (JIRA)
Cheng Hao created HIVE-10532: Summary: SpecificMutableRow doesn't handle Date Type correctly Key: HIVE-10532 URL: https://issues.apache.org/jira/browse/HIVE-10532 Project: Hive Issue Type: Bug

[jira] [Created] (SPARK-7235) Refactor the GroupingSet implementation

2015-04-29 Thread Cheng Hao (JIRA)
Cheng Hao created SPARK-7235: Summary: Refactor the GroupingSet implementation Key: SPARK-7235 URL: https://issues.apache.org/jira/browse/SPARK-7235 Project: Spark Issue Type: Improvement

[jira] [Created] (SPARK-7269) Incorrect aggregation analysis

2015-04-29 Thread Cheng Hao (JIRA)
Cheng Hao created SPARK-7269: Summary: Incorrect aggregation analysis Key: SPARK-7269 URL: https://issues.apache.org/jira/browse/SPARK-7269 Project: Spark Issue Type: Bug Components

[jira] [Commented] (SPARK-6923) Spark SQL CLI does not read Data Source schema correctly

2015-04-27 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14516226#comment-14516226 ] Cheng Hao commented on SPARK-6923: -- [~pin_zhang], I agree with [~marmbrus], you're hitting

[jira] [Commented] (SPARK-6923) Spark SQL CLI does not read Data Source schema correctly

2015-04-27 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14516410#comment-14516410 ] Cheng Hao commented on SPARK-6923: -- Sorry, after investigating, it's probably not a bug

[jira] [Created] (SPARK-7119) ScriptTransform doesn't consider the output data type

2015-04-24 Thread Cheng Hao (JIRA)
Cheng Hao created SPARK-7119: Summary: ScriptTransform doesn't consider the output data type Key: SPARK-7119 URL: https://issues.apache.org/jira/browse/SPARK-7119 Project: Spark Issue Type: Bug

[jira] [Updated] (SPARK-7044) [Spark SQL] query would hang when using scripts in SQL statement

2015-04-23 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Hao updated SPARK-7044: - Fix Version/s: 1.3.0 [Spark SQL] query would hang when using scripts in SQL statement

[jira] [Updated] (SPARK-7044) [Spark SQL] query would hang when using scripts in SQL statement

2015-04-23 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Hao updated SPARK-7044: - Affects Version/s: 1.3.0 [Spark SQL] query would hang when using scripts in SQL statement

[jira] [Reopened] (SPARK-7044) [Spark SQL] query would hang when using scripts in SQL statement

2015-04-23 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Hao reopened SPARK-7044: -- backport to Spark1.3 [Spark SQL] query would hang when using scripts in SQL statement

RE: Re: problem with spark thrift server

2015-04-23 Thread Cheng, Hao
Hi, can you describe how the ThriftServer crashed, or the steps to reproduce it? It's probably a bug in the ThriftServer. Thanks, From: guoqing0...@yahoo.com.hk [mailto:guoqing0...@yahoo.com.hk] Sent: Friday, April 24, 2015 9:55 AM To: Arush Kharbanda Cc: user Subject: Re: Re: problem

[jira] [Resolved] (SPARK-7044) [Spark SQL] query would hang when using scripts in SQL statement

2015-04-23 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Hao resolved SPARK-7044. -- Resolution: Fixed [Spark SQL] query would hang when using scripts in SQL statement

[jira] [Updated] (SPARK-7051) Support Compression write for Parquet

2015-04-22 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Hao updated SPARK-7051: - Description: The compression option doesn't take effect for Parquet when writing data

[jira] [Created] (SPARK-7051) Support Compression write for Parquet

2015-04-22 Thread Cheng Hao (JIRA)
Cheng Hao created SPARK-7051: Summary: Support Compression write for Parquet Key: SPARK-7051 URL: https://issues.apache.org/jira/browse/SPARK-7051 Project: Spark Issue Type: Bug

[jira] [Closed] (SPARK-4967) File name with comma will cause exception for SQLContext.parquetFile

2015-04-15 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Hao closed SPARK-4967. Resolution: Won't Fix File name with comma will cause exception for SQLContext.parquetFile

[jira] [Commented] (SPARK-4967) File name with comma will cause exception for SQLContext.parquetFile

2015-04-15 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496645#comment-14496645 ] Cheng Hao commented on SPARK-4967: -- Thanks for the explanation, I will close this issue

[jira] [Commented] (SPARK-6545) Minor changes for CompactBuffer

2015-04-12 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14491914#comment-14491914 ] Cheng Hao commented on SPARK-6545: -- Thank you [~srowen], we should close this for now, I

RE: Spark Average

2015-04-06 Thread Cheng, Hao
The DataFrame API should be very helpful in this case. https://spark.apache.org/docs/1.3.0/sql-programming-guide.html A code snippet would look like: val sqlContext = new org.apache.spark.sql.SQLContext(sc) // this is used to implicitly convert an RDD to a DataFrame. import
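A grouped average of the kind asked about is commonly computed in two phases. First, each partition keeps a partial (sum, count) buffer per key. Then the partial buffers are merged and the sum is divided by the count. The plain-Scala sketch below only illustrates that idea, with hypothetical names; it is not Spark's aggregation code, and with the DataFrame API the equivalent would simply be a `groupBy(...).avg(...)` call.

```scala
object AverageSketch {
  // Partial aggregation buffer for avg(): a running sum and row count.
  final case class SumCount(sum: Double, count: Long) {
    def add(v: Double): SumCount = SumCount(sum + v, count + 1)
    def merge(other: SumCount): SumCount = SumCount(sum + other.sum, count + other.count)
  }
  private val Zero = SumCount(0.0, 0L)

  // Phase 1 builds per-key buffers inside each partition; phase 2 merges them.
  def avgByKey[K](partitions: Seq[Seq[(K, Double)]]): Map[K, Double] = {
    val partials: Seq[Map[K, SumCount]] = partitions.map { rows =>
      rows.foldLeft(Map.empty[K, SumCount]) { case (acc, (k, v)) =>
        acc.updated(k, acc.getOrElse(k, Zero).add(v))
      }
    }
    val merged: Map[K, SumCount] = partials.foldLeft(Map.empty[K, SumCount]) { (acc, partial) =>
      partial.foldLeft(acc) { case (a, (k, sc)) =>
        a.updated(k, a.getOrElse(k, Zero).merge(sc))
      }
    }
    // Every key in `merged` has count >= 1, so the division is safe.
    merged.map { case (k, sc) => k -> sc.sum / sc.count }
  }
}
```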

[jira] [Created] (SPARK-6734) Support GenericUDTF.close for Generate

2015-04-06 Thread Cheng Hao (JIRA)
Cheng Hao created SPARK-6734: Summary: Support GenericUDTF.close for Generate Key: SPARK-6734 URL: https://issues.apache.org/jira/browse/SPARK-6734 Project: Spark Issue Type: Bug

[jira] [Updated] (SPARK-5941) Unit Test loads the table `src` twice for leftsemijoin.q

2015-04-05 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Hao updated SPARK-5941: - Summary: Unit Test loads the table `src` twice for leftsemijoin.q (was: `def table` is not using

[jira] [Issue Comment Deleted] (SPARK-5941) Unit Test loads the table `src` twice for leftsemijoin.q

2015-04-05 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Hao updated SPARK-5941: - Comment: was deleted (was: Eagerly resolving the table probably causes side effect in some scenarios

[jira] [Updated] (SPARK-5941) Unit Test loads the table `src` twice for leftsemijoin.q

2015-04-05 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Hao updated SPARK-5941: - Description: In leftsemijoin.q, there is a data loading command for table sales already, but in TestHive

RE: Spark SQL. Memory consumption

2015-04-02 Thread Cheng, Hao
, but that's still ongoing. Cheng Hao From: Masf [mailto:masfwo...@gmail.com] Sent: Thursday, April 2, 2015 11:47 PM To: user@spark.apache.org Subject: Spark SQL. Memory consumption Hi. I'm using Spark SQL 1.2. I have this query: CREATE TABLE test_MA STORED AS PARQUET AS SELECT

[jira] [Updated] (SPARK-6545) Minor changes for CompactBuffer

2015-03-25 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Hao updated SPARK-6545: - Issue Type: Improvement (was: Bug) Minor changes for CompactBuffer

[jira] [Created] (SPARK-6545) Minor changes for CompactBuffer

2015-03-25 Thread Cheng Hao (JIRA)
Cheng Hao created SPARK-6545: Summary: Minor changes for CompactBuffer Key: SPARK-6545 URL: https://issues.apache.org/jira/browse/SPARK-6545 Project: Spark Issue Type: Bug Components

[jira] [Commented] (SPARK-6483) Spark SQL udf(ScalaUdf) is very slow

2015-03-24 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14377494#comment-14377494 ] Cheng Hao commented on SPARK-6483: -- Can you re-run those 2 queries without GROUP

RE: Spark SQL udf(ScalaUdf) is very slow

2015-03-23 Thread Cheng, Hao
This is a very interesting issue. The root cause of the lower performance is probably that, for a Scala UDF, Spark SQL recursively converts the data from its internal representation to the Scala representation via Scala reflection. Can you create a JIRA issue to track this? I can start working on
