RE: Spark SQL Self join with aggregate

2015-03-19 Thread Cheng, Hao
Not so sure of your intention, but something like SELECT sum(val1), sum(val2) FROM table GROUP BY src, dest? -Original Message- From: Shailesh Birari [mailto:sbirar...@gmail.com] Sent: Friday, March 20, 2015 9:31 AM To: user@spark.apache.org Subject: Spark SQL Self join with aggregate
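The GROUP BY rewrite suggested in this reply can be sketched outside Spark with sqlite3; the table and column names (src, dest, val1, val2) come from the suggested query, while the sample rows are invented for illustration:

```python
import sqlite3

# In-memory stand-in for the table discussed in the thread.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (src TEXT, dest TEXT, val1 INT, val2 INT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)",
                 [("a", "b", 1, 10), ("a", "b", 2, 20), ("c", "d", 3, 30)])

# The aggregate proposed instead of a self-join:
rows = conn.execute(
    "SELECT src, dest, SUM(val1), SUM(val2) FROM t GROUP BY src, dest"
).fetchall()
```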

RE: Add Char support in SQL dataTypes

2015-03-19 Thread Cheng, Hao
Can you use Varchar or String instead? Currently, Spark SQL converts varchar into the string type internally (without a max length limitation). However, the char type is not supported yet. -Original Message- From: A.M.Chan [mailto:kaka_1...@163.com] Sent: Friday, March 20, 2015 9:56

RE: [SQL] Elasticsearch-hadoop, exception creating temporary table

2015-03-18 Thread Cheng, Hao
It seems the elasticsearch-hadoop project was built against an old version of Spark, and you then upgraded the Spark version in the execution env. As far as I know, the StructField definition changed in Spark 1.2; can you confirm the version problem first? From: Todd Nist [mailto:tsind...@gmail.com] Sent:

RE: Re: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient

2015-03-16 Thread Cheng, Hao
Or you need to specify the jars either in configuration or bin/spark-sql --jars mysql-connector-xx.jar From: fightf...@163.com [mailto:fightf...@163.com] Sent: Monday, March 16, 2015 2:04 PM To: sandeep vura; Ted Yu Cc: user Subject: Re: Re: Unable to instantiate

RE: Re: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient

2015-03-16 Thread Cheng, Hao
It doesn’t take effect if you just put jar files under the lib-managed/jars folder; you need to put them on the class path explicitly. From: sandeep vura [mailto:sandeepv...@gmail.com] Sent: Monday, March 16, 2015 2:21 PM To: Cheng, Hao Cc: fightf...@163.com; Ted Yu; user Subject: Re: Re: Unable

RE: Spark SQL using Hive metastore

2015-03-11 Thread Cheng, Hao
Check the configuration file $SPARK_HOME/conf/spark-xxx.conf? Cheng Hao From: Grandl Robert [mailto:rgra...@yahoo.com.INVALID] Sent: Thursday, March 12, 2015 5:07 AM To: user@spark.apache.org Subject: Spark SQL using Hive metastore Hi guys, I am a newbie in running Spark SQL / Spark. My goal

RE: Does any one know how to deploy a custom UDAF jar file in SparkSQL?

2015-03-10 Thread Cheng, Hao
You can add the additional jar when submitting your job, something like: ./bin/spark-submit --jars xx.jar … More options can be listed by just typing ./bin/spark-submit From: shahab [mailto:shahab.mok...@gmail.com] Sent: Tuesday, March 10, 2015 8:48 PM To: user@spark.apache.org Subject: Does
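The --jars submission pattern from the reply, spelled out as a command-line sketch; the jar and class names below are hypothetical placeholders:

```shell
# Ship an extra jar (e.g. one containing a custom UDAF) with the job;
# the jar name and main class here are illustrative only.
./bin/spark-submit --jars my-udaf.jar --class com.example.MyApp my-app.jar

# List all available options:
./bin/spark-submit --help
```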

RE: Registering custom UDAFs with HiveContext in SparkSQL, how?

2015-03-10 Thread Cheng, Hao
Currently, Spark SQL doesn’t provide an interface for developing custom UDTFs, but it works seamlessly with Hive UDTFs. I am working on the UDTF refactoring for Spark SQL; hopefully it will provide a Hive-independent UDTF interface soon after that. From: shahab [mailto:shahab.mok...@gmail.com] Sent:

RE: Registering custom UDAFs with HiveContext in SparkSQL, how?

2015-03-10 Thread Cheng, Hao
/pull/3247 From: shahab [mailto:shahab.mok...@gmail.com] Sent: Wednesday, March 11, 2015 1:44 AM To: Cheng, Hao Cc: user@spark.apache.org Subject: Re: Registering custom UDAFs with HiveContext in SparkSQL, how? Thanks Hao, but my question concerns UDAF (user defined aggregation function), not UDTF

RE: [SparkSQL] Reuse HiveContext to different Hive warehouse?

2015-03-10 Thread Cheng, Hao
I am not so sure whether Hive supports changing the metastore after it is initialized; I guess not. Spark SQL relies entirely on the Hive metastore in HiveContext, which is probably why it doesn't work as expected for Q1. BTW, in most cases, people configure the metastore settings in hive-site.xml, and will not

RE: SQL with Spark Streaming

2015-03-10 Thread Cheng, Hao
Intel has a prototype for doing this; SaiSai and Jason are the authors. Probably you can ask them for some materials. From: Mohit Anchlia [mailto:mohitanch...@gmail.com] Sent: Wednesday, March 11, 2015 8:12 AM To: user@spark.apache.org Subject: SQL with Spark Streaming Does Spark Streaming also

[jira] [Updated] (SPARK-5817) UDTF column names didn't set properly

2015-03-09 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Hao updated SPARK-5817: - Description: {code} createQueryTest(Specify the udtf output, select d from (select explode(array(1,1)) d

[jira] [Updated] (SPARK-5817) UDTF column names didn't set properly

2015-03-09 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Hao updated SPARK-5817: - Description: {code} createQueryTest(Specify the udtf output, select d from (select explode(array(key,1

[jira] [Updated] (SPARK-5817) UDTF column names didn't set properly

2015-03-09 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Hao updated SPARK-5817: - Description: createQueryTest(Specify the udtf output, select d from (select explode(array(key,1)) d

[jira] [Updated] (SPARK-5817) UDTF column names didn't set properly

2015-03-09 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Hao updated SPARK-5817: - Description: {code} createQueryTest(Specify the udtf output, select d from (select explode(array(1,1)) d

RE: Connection PHP application to Spark Sql thrift server

2015-03-05 Thread Cheng, Hao
Can you run the query against Hive? Let's first confirm whether it's a SparkSQL bug or an issue in your PHP code. -Original Message- From: fanooos [mailto:dev.fano...@gmail.com] Sent: Thursday, March 5, 2015 4:57 PM To: user@spark.apache.org Subject: Connection PHP application to Spark Sql thrift server We

RE: Does SparkSQL support ..... having count (fieldname) in SQL statement?

2015-03-04 Thread Cheng, Hao
I’ve tried with the latest code and it seems to work; which version are you using, Shahab? From: yana [mailto:yana.kadiy...@gmail.com] Sent: Wednesday, March 4, 2015 8:47 PM To: shahab; user@spark.apache.org Subject: RE: Does SparkSQL support . having count (fieldname) in SQL statement? I think the
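The HAVING COUNT construct under discussion, illustrated with sqlite3 (the table and data are invented for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT)")
conn.executemany("INSERT INTO orders VALUES (?)",
                 [("ann",), ("ann",), ("bob",)])

# HAVING filters groups after aggregation, which is what the
# "having count(fieldname)" question is about.
rows = conn.execute(
    "SELECT customer, COUNT(customer) FROM orders "
    "GROUP BY customer HAVING COUNT(customer) > 1"
).fetchall()
```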

[jira] [Comment Edited] (SPARK-5791) [Spark SQL] show poor performance when multiple table do join operation

2015-03-04 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14348293#comment-14348293 ] Cheng Hao edited comment on SPARK-5791 at 3/5/15 7:08 AM: -- I

[jira] [Comment Edited] (SPARK-5791) [Spark SQL] show poor performance when multiple table do join operation

2015-03-04 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14348293#comment-14348293 ] Cheng Hao edited comment on SPARK-5791 at 3/5/15 7:07 AM: -- I

[jira] [Commented] (SPARK-5791) [Spark SQL] show poor performance when multiple table do join operation

2015-03-04 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14348293#comment-14348293 ] Cheng Hao commented on SPARK-5791: -- I think this is a typical case that we need

RE: Supporting Hive features in Spark SQL Thrift JDBC server

2015-03-03 Thread Cheng, Hao
Can you provide the detailed failure call stack? From: shahab [mailto:shahab.mok...@gmail.com] Sent: Tuesday, March 3, 2015 3:52 PM To: user@spark.apache.org Subject: Supporting Hive features in Spark SQL Thrift JDBC server Hi, According to Spark SQL documentation, Spark SQL supports the

RE: SparkSQL, executing an OR

2015-03-03 Thread Cheng, Hao
Use where('age >= 10 || 'age <= 4) instead. -Original Message- From: Guillermo Ortiz [mailto:konstt2...@gmail.com] Sent: Tuesday, March 3, 2015 5:14 PM To: user Subject: SparkSQL, executing an OR I'm trying to execute a query with Spark. (Example from the Spark Documentation) val teenagers
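The thread is about expressing an OR predicate in the Scala DSL; the predicate's effect in plain SQL can be shown with sqlite3, using invented data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INT)")
conn.executemany("INSERT INTO people VALUES (?, ?)",
                 [("kid", 4), ("teen", 15), ("adult", 40)])

# Rows matching either branch of the disjunction are kept.
rows = conn.execute(
    "SELECT name FROM people WHERE age <= 4 OR (age >= 13 AND age <= 19)"
).fetchall()
```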

RE: Supporting Hive features in Spark SQL Thrift JDBC server

2015-03-03 Thread Cheng, Hao
Hive UDFs are only applicable to a HiveContext and its subclass instances; is CassandraAwareSQLContext a direct subclass of HiveContext or of SQLContext? From: shahab [mailto:shahab.mok...@gmail.com] Sent: Tuesday, March 3, 2015 5:10 PM To: Cheng, Hao Cc: user@spark.apache.org Subject: Re

RE: insert Hive table with RDD

2015-03-03 Thread Cheng, Hao
Use the SchemaRDD / DataFrame API via HiveContext. Assuming you're using the latest code, something like: val hc = new HiveContext(sc); import hc.implicits._; existedRdd.toDF().insertInto("hivetable") or existedRdd.toDF().registerTempTable("mydata"); hc.sql("insert into hivetable as select xxx

RE: java.lang.IncompatibleClassChangeError when using PrunedFilteredScan

2015-03-03 Thread Cheng, Hao
As the call stack shows, the MongoDB connector is not compatible with the Spark SQL Data Source interface. The Data Source API changed in 1.2, so you probably need to confirm which Spark version the MongoDB connector was built against. By the way, a well-formatted call stack will be more

RE: Spark SQL Thrift Server start exception : java.lang.ClassNotFoundException: org.datanucleus.api.jdo.JDOPersistenceManagerFactory

2015-03-03 Thread Cheng, Hao
” while starting the spark shell. From: Anusha Shamanur [mailto:anushas...@gmail.com] Sent: Wednesday, March 4, 2015 5:07 AM To: Cheng, Hao Subject: Re: Spark SQL Thrift Server start exception : java.lang.ClassNotFoundException: org.datanucleus.api.jdo.JDOPersistenceManagerFactory Hi, I am getting

RE: Is SQLContext thread-safe?

2015-03-02 Thread Cheng, Hao
instance. -Original Message- From: Haopu Wang [mailto:hw...@qilinsoft.com] Sent: Tuesday, March 3, 2015 7:56 AM To: Cheng, Hao; user Subject: RE: Is SQLContext thread-safe? Thanks for the response. Then I have another question: when will we want to create multiple SQLContext instances

RE: Spark SQL Thrift Server start exception : java.lang.ClassNotFoundException: org.datanucleus.api.jdo.JDOPersistenceManagerFactory

2015-03-02 Thread Cheng, Hao
Copy these jars into $SPARK_HOME/lib/: datanucleus-api-jdo-3.2.6.jar, datanucleus-core-3.2.10.jar, datanucleus-rdbms-3.2.9.jar. See https://github.com/apache/spark/blob/master/bin/compute-classpath.sh#L120 -Original Message- From: fanooos [mailto:dev.fano...@gmail.com] Sent: Tuesday,
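The copy step from the reply as a shell sketch; the source paths assume a source build where sbt places the jars under lib_managed/jars, which may differ in your layout:

```shell
cp lib_managed/jars/datanucleus-api-jdo-3.2.6.jar \
   lib_managed/jars/datanucleus-core-3.2.10.jar \
   lib_managed/jars/datanucleus-rdbms-3.2.9.jar \
   "$SPARK_HOME/lib/"
```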

RE: Executing hive query from Spark code

2015-03-02 Thread Cheng, Hao
I am not sure how Spark SQL is compiled in CDH, but if the -Phive and -Phive-thriftserver flags weren't specified during the build, it most likely will not work just by providing the Hive lib jars later on. For example, does the HiveContext class exist in the assembly jar? I am also quite

RE: Is SQLContext thread-safe?

2015-03-02 Thread Cheng, Hao
https://issues.apache.org/jira/browse/SPARK-2087 https://github.com/apache/spark/pull/4382 I am working on the prototype, but will be updated soon. -Original Message- From: Haopu Wang [mailto:hw...@qilinsoft.com] Sent: Tuesday, March 3, 2015 8:32 AM To: Cheng, Hao; user Subject: RE

RE: Performance tuning in Spark SQL.

2015-03-02 Thread Cheng, Hao
This is actually quite an open question. From my understanding, there are probably several ways to tune, e.g. SQL configurations such as spark.sql.autoBroadcastJoinThreshold (default 10 * 1024 * 1024) and spark.sql.defaultSizeInBytes (default 10 * 1024 * 1024 + 1)
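The two configuration keys mentioned, with their era defaults written out in bytes; treat the exact values as a sketch of that Spark 1.x code rather than current defaults:

```properties
# Broadcast-join threshold in bytes (10 * 1024 * 1024):
spark.sql.autoBroadcastJoinThreshold=10485760
# Assumed size of a relation with unknown statistics (threshold + 1,
# so unknown-size tables are never broadcast by accident):
spark.sql.defaultSizeInBytes=10485761
```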

RE: Is SQLContext thread-safe?

2015-03-02 Thread Cheng, Hao
Yes, it is thread-safe, at least it's supposed to be. -Original Message- From: Haopu Wang [mailto:hw...@qilinsoft.com] Sent: Monday, March 2, 2015 4:43 PM To: user Subject: Is SQLContext thread-safe? Hi, is it safe to use the same SQLContext to do Select operations in different threads

JLine hangs under Windows8

2015-02-27 Thread Cheng, Hao
$.main(SparkSQLCLIDriver.scala:202) at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala) Thanks, Cheng Hao

RE: JLine hangs under Windows8

2015-02-27 Thread Cheng, Hao
It works after adding the -Djline.terminal=jline.UnsupportedTerminal -Original Message- From: Cheng, Hao [mailto:hao.ch...@intel.com] Sent: Saturday, February 28, 2015 10:24 AM To: user@spark.apache.org Subject: JLine hangs under Windows8 Hi, All I was trying to run spark sql cli

RE: Spark-SQL 1.2.0 sort by results are not consistent with Hive

2015-02-25 Thread Cheng, Hao
How many reducers did you set for Hive? With a small data set, Hive will run in local mode, which always sets the reducer count to 1. From: Kannan Rajah [mailto:kra...@maprtech.com] Sent: Thursday, February 26, 2015 3:02 AM To: Cheng Lian Cc: user@spark.apache.org Subject: Re: Spark-SQL 1.2.0 sort

[jira] [Created] (SPARK-6034) DESCRIBE EXTENDED viewname is not supported for HiveContext

2015-02-25 Thread Cheng Hao (JIRA)
Cheng Hao created SPARK-6034: Summary: DESCRIBE EXTENDED viewname is not supported for HiveContext Key: SPARK-6034 URL: https://issues.apache.org/jira/browse/SPARK-6034 Project: Spark Issue

[jira] [Updated] (SPARK-5941) `def table` is not using the unresolved logical plan `UnresolvedRelation`

2015-02-21 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Hao updated SPARK-5941: - Summary: `def table` is not using the unresolved logical plan `UnresolvedRelation` (was: `def table

[jira] [Created] (SPARK-5941) `def table` is not using the unresolved logical plan

2015-02-21 Thread Cheng Hao (JIRA)
Cheng Hao created SPARK-5941: Summary: `def table` is not using the unresolved logical plan Key: SPARK-5941 URL: https://issues.apache.org/jira/browse/SPARK-5941 Project: Spark Issue Type: Bug

[jira] [Commented] (SPARK-5941) `def table` is not using the unresolved logical plan in DataFrameImpl

2015-02-21 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14332003#comment-14332003 ] Cheng Hao commented on SPARK-5941: -- Eagerly resolving the table probably causes side

[jira] [Updated] (SPARK-5941) `def table` is not using the unresolved logical plan in DataFrameImpl

2015-02-21 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Hao updated SPARK-5941: - Summary: `def table` is not using the unresolved logical plan in DataFrameImpl (was: `def table

RE: Extract hour from Timestamp in Spark SQL

2015-02-15 Thread Cheng, Hao
Are you using the SQLContext? I think the HiveContext is recommended. Cheng Hao From: Wush Wu [mailto:w...@bridgewell.com] Sent: Thursday, February 12, 2015 2:24 PM To: u...@spark.incubator.apache.org Subject: Extract hour from Timestamp in Spark SQL Dear all, I am new to Spark SQL and have

[jira] [Updated] (SPARK-5817) UDTF column names didn't set properly

2015-02-13 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Hao updated SPARK-5817: - Description: createQueryTest(insert table with generator with column name, CREATE TABLE

[jira] [Created] (SPARK-5817) UDTF column names didn't set properly

2015-02-13 Thread Cheng Hao (JIRA)
Cheng Hao created SPARK-5817: Summary: UDTF column names didn't set properly Key: SPARK-5817 URL: https://issues.apache.org/jira/browse/SPARK-5817 Project: Spark Issue Type: Bug

[jira] [Commented] (SPARK-5791) [Spark SQL] show poor performance when multiple table do join operation

2015-02-12 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14319400#comment-14319400 ] Cheng Hao commented on SPARK-5791: -- Can you also attach the performance comparison result

[jira] [Created] (SPARK-5706) Support inference schema from a single json string

2015-02-09 Thread Cheng Hao (JIRA)
Cheng Hao created SPARK-5706: Summary: Support inference schema from a single json string Key: SPARK-5706 URL: https://issues.apache.org/jira/browse/SPARK-5706 Project: Spark Issue Type

[jira] [Created] (SPARK-5709) Add EXPLAIN support for DataFrame API for debugging purpose

2015-02-09 Thread Cheng Hao (JIRA)
Cheng Hao created SPARK-5709: Summary: Add EXPLAIN support for DataFrame API for debugging purpose Key: SPARK-5709 URL: https://issues.apache.org/jira/browse/SPARK-5709 Project: Spark Issue

[jira] [Created] (SPARK-5683) Improve the json serialization for DataFrame API

2015-02-09 Thread Cheng Hao (JIRA)
Cheng Hao created SPARK-5683: Summary: Improve the json serialization for DataFrame API Key: SPARK-5683 URL: https://issues.apache.org/jira/browse/SPARK-5683 Project: Spark Issue Type

[jira] [Created] (SPARK-5550) Custom UDF is case sensitive for HiveContext

2015-02-02 Thread Cheng Hao (JIRA)
Cheng Hao created SPARK-5550: Summary: Custom UDF is case sensitive for HiveContext Key: SPARK-5550 URL: https://issues.apache.org/jira/browse/SPARK-5550 Project: Spark Issue Type: Bug

[jira] [Updated] (SPARK-5550) Custom UDF is case sensitive for HiveContext

2015-02-02 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Hao updated SPARK-5550: - Description: SQL in HiveContext should be case insensitive; however, the following query will fail

RE: [SQL] Self join with ArrayType columns problems

2015-01-27 Thread Cheng, Hao
The root cause for this is probably the identical “exprId” of the “AttributeReference” existing when doing a self-join with a “temp table” (temp table = resolved logical plan). I will do the bug fixing and JIRA creation. Cheng Hao From: Michael Armbrust [mailto:mich...@databricks.com] Sent

[jira] [Created] (SPARK-5404) Statistic of Logical Plan is too aggressive

2015-01-25 Thread Cheng Hao (JIRA)
Cheng Hao created SPARK-5404: Summary: Statistic of Logical Plan is too aggressive Key: SPARK-5404 URL: https://issues.apache.org/jira/browse/SPARK-5404 Project: Spark Issue Type: Bug

[jira] [Updated] (SPARK-5213) Pluggable SQL Parser Support

2015-01-22 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Hao updated SPARK-5213: - Summary: Pluggable SQL Parser Support (was: Support the SQL Parser Registry) Pluggable SQL Parser

[jira] [Updated] (SPARK-5213) Pluggable SQL Parser Support

2015-01-22 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Hao updated SPARK-5213: - Description: Currently, the SQL Parser dialect is hard code in SQLContext, which is not easy to extend

[jira] [Created] (SPARK-5364) HiveQL transform doesn't support the non output clause

2015-01-22 Thread Cheng Hao (JIRA)
Cheng Hao created SPARK-5364: Summary: HiveQL transform doesn't support the non output clause Key: SPARK-5364 URL: https://issues.apache.org/jira/browse/SPARK-5364 Project: Spark Issue Type: Bug

[jira] [Updated] (SPARK-5213) Pluggable SQL Parser Support

2015-01-22 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Hao updated SPARK-5213: - Description: Currently, the SQL Parser dialect is hard code in SQLContext, which is not easy to extend

RE: SparkSQL 1.2.0 sources API error

2015-01-18 Thread Cheng, Hao
It seems the netty jar is being called with an incompatible method signature. Can you check whether there are different versions of the netty jar in your classpath? From: Walrus theCat [mailto:walrusthe...@gmail.com] Sent: Sunday, January 18, 2015 3:37 PM To: user@spark.apache.org Subject: Re: SparkSQL 1.2.0 sources

RE: using hiveContext to select a nested Map-data-type from an AVROmodel+parquet file

2015-01-17 Thread Cheng, Hao
Wow, glad to know that it works well. And sorry, the JIRA is another issue, not the same case as here. From: Bagmeet Behera [mailto:bagme...@gmail.com] Sent: Saturday, January 17, 2015 12:47 AM To: Cheng, Hao Subject: Re: using hiveContext to select a nested Map-data-type from

RE: using hiveContext to select a nested Map-data-type from an AVROmodel+parquet file

2015-01-15 Thread Cheng, Hao
Hi, BB Ideally you can do the query like: select key, value.percent from mytable_data lateral view explode(audiences) f as key, value limit 3; But there is a bug in HiveContext: https://issues.apache.org/jira/browse/SPARK-5237 I am working on it now, hopefully make a patch soon. Cheng
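What the lateral view explode over a map column does can be modeled in plain Python; this is only a semantic sketch, not Spark code, with the audiences/percent names taken from the thread and the sample data invented:

```python
def explode_map(rows, map_col):
    """Yield one output row per (key, value) entry of a map column,
    mimicking Hive's `lateral view explode(map_col) f as key, value`."""
    for row in rows:
        for key, value in row[map_col].items():
            out = {k: v for k, v in row.items() if k != map_col}
            out["key"], out["value"] = key, value
            yield out

data = [{"id": 1, "audiences": {"sports": {"percent": 0.4},
                                "music": {"percent": 0.6}}}]
# Equivalent of: select key, value.percent from ... lateral view explode(audiences)
result = sorted((r["key"], r["value"]["percent"])
                for r in explode_map(data, "audiences"))
```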

RE: Join implementation in SparkSQL

2015-01-15 Thread Cheng, Hao
Not so sure about your question, but SparkStrategies.scala and Optimizer.scala are a good start if you want the details of the join implementation or optimization. -Original Message- From: Andrew Ash [mailto:and...@andrewash.com] Sent: Friday, January 16, 2015 4:52 AM To: Reynold

RE: Spark SQL Custom Predicate Pushdown

2015-01-15 Thread Cheng, Hao
The Data Source API probably works for this purpose. It supports column pruning and predicate push-down: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala Examples can also be found in the unit tests:
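A toy model of what a PrunedFilteredScan-style source does: the scan receives the required columns and the pushed-down predicates, and returns only matching rows projected to those columns. The names below are illustrative, not the real Spark interface:

```python
def pruned_filtered_scan(table, required_columns, filters):
    # `filters` models the predicates Spark pushes down to the source;
    # the source applies them before returning data, and projects
    # only the columns the query actually needs.
    return [tuple(row[c] for c in required_columns)
            for row in table
            if all(f(row) for f in filters)]

table = [{"id": 1, "name": "a", "score": 90},
         {"id": 2, "name": "b", "score": 50}]
rows = pruned_filtered_scan(table, ["name"], [lambda r: r["score"] > 60])
```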

RE: Issues with constants in Spark HiveQL queries

2015-01-14 Thread Cheng, Hao
The log showed it failed in parsing, so the typo shouldn't be the root cause. BUT I couldn't reproduce that with the master branch. I did the test as follows: sbt/sbt -Phadoop-2.3.0 -Phadoop-2.3 -Phive -Phive-0.13.1 hive/console scala> sql("SELECT user_id FROM actions where

[jira] [Created] (SPARK-5213) Support the SQL Parser Registry

2015-01-13 Thread Cheng Hao (JIRA)
Cheng Hao created SPARK-5213: Summary: Support the SQL Parser Registry Key: SPARK-5213 URL: https://issues.apache.org/jira/browse/SPARK-5213 Project: Spark Issue Type: New Feature

[jira] [Created] (SPARK-5202) HiveContext doesn't support the Variables Substitution

2015-01-11 Thread Cheng Hao (JIRA)
Cheng Hao created SPARK-5202: Summary: HiveContext doesn't support the Variables Substitution Key: SPARK-5202 URL: https://issues.apache.org/jira/browse/SPARK-5202 Project: Spark Issue Type: Bug

[jira] [Updated] (SPARK-5202) HiveContext doesn't support the Variables Substitution

2015-01-11 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Hao updated SPARK-5202: - Description: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+VariableSubstitution

[jira] [Resolved] (SPARK-4636) Cluster By Distribute By output different with Hive

2015-01-08 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Hao resolved SPARK-4636. -- Resolution: Not a Problem The answer with the highest score seems not correct; it might have been tested

[jira] [Comment Edited] (SPARK-4636) Cluster By Distribute By output different with Hive

2015-01-08 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14270466#comment-14270466 ] Cheng Hao edited comment on SPARK-4636 at 1/9/15 2:57 AM

[jira] [Comment Edited] (SPARK-4636) Cluster By Distribute By output different with Hive

2015-01-08 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14270466#comment-14270466 ] Cheng Hao edited comment on SPARK-4636 at 1/9/15 2:56 AM

[jira] [Commented] (SPARK-5117) Hive Generic UDFs don't cast correctly

2015-01-07 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268861#comment-14268861 ] Cheng Hao commented on SPARK-5117: -- Definitely we can do that then. Hive Generic UDFs

[jira] [Commented] (SPARK-4366) Aggregation Optimization

2015-01-06 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14266124#comment-14266124 ] Cheng Hao commented on SPARK-4366: -- [~marmbrus] I've uploaded a draft design doc

[jira] [Updated] (SPARK-4366) Aggregation Optimization

2015-01-06 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Hao updated SPARK-4366: - Attachment: aggregatefunction_v1.pdf Draft Design Doc. Aggregation Optimization

[jira] [Resolved] (SPARK-5117) Hive Generic UDFs don't cast correctly

2015-01-06 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Hao resolved SPARK-5117. -- Resolution: Won't Fix This IS NOT a bug of Spark SQL. Hive changed the LPAD implementation since Hive

RE: Implement customized Join for SparkSQL

2015-01-05 Thread Cheng, Hao
Can you paste the error log? From: Dai, Kevin [mailto:yun...@ebay.com] Sent: Monday, January 5, 2015 6:29 PM To: user@spark.apache.org Subject: Implement customized Join for SparkSQL Hi, All Suppose I want to join two tables A and B as follows: Select * from A join B on A.id = B.id A is a

[jira] [Created] (SPARK-4967) File name with comma will cause exception for SQLContext.parquetFile

2014-12-25 Thread Cheng Hao (JIRA)
Cheng Hao created SPARK-4967: Summary: File name with comma will cause exception for SQLContext.parquetFile Key: SPARK-4967 URL: https://issues.apache.org/jira/browse/SPARK-4967 Project: Spark

RE: Escape commas in file names

2014-12-25 Thread Cheng, Hao
multiple parquet files for the API sqlContext.parquetFile; we need to think about how to support multiple paths in some other way. Cheng Hao From: Michael Armbrust [mailto:mich...@databricks.com] Sent: Thursday, December 25, 2014 1:01 PM To: Daniel Siegmann Cc: user@spark.apache.org Subject: Re: Escape

RE: Question on saveAsTextFile with overwrite option

2014-12-24 Thread Cheng, Hao
I am wondering if we can provide a more user-friendly API, rather than a configuration, for this purpose. What do you think, Patrick? Cheng Hao -Original Message- From: Patrick Wendell [mailto:pwend...@gmail.com] Sent: Thursday, December 25, 2014 3:22 PM To: Shao, Saisai Cc: u...@spark.apache.org

[jira] [Created] (SPARK-4944) Table Not Found exception in Create Table Like registered RDD table

2014-12-23 Thread Cheng Hao (JIRA)
Cheng Hao created SPARK-4944: Summary: Table Not Found exception in Create Table Like registered RDD table Key: SPARK-4944 URL: https://issues.apache.org/jira/browse/SPARK-4944 Project: Spark

[jira] [Created] (SPARK-4945) Add overwrite option support for SchemaRDD.saveAsParquetFile

2014-12-23 Thread Cheng Hao (JIRA)
Cheng Hao created SPARK-4945: Summary: Add overwrite option support for SchemaRDD.saveAsParquetFile Key: SPARK-4945 URL: https://issues.apache.org/jira/browse/SPARK-4945 Project: Spark Issue

RE: SparkSQL: CREATE EXTERNAL TABLE with a SchemaRDD

2014-12-23 Thread Cheng, Hao
Hi Lam, I can confirm this is a bug with the latest master, and I filed a JIRA issue for it: https://issues.apache.org/jira/browse/SPARK-4944 Hopefully a solution will come soon. Cheng Hao From: Jerry Lam [mailto:chiling...@gmail.com] Sent: Wednesday, December 24, 2014 4:26 AM To: user

[jira] [Updated] (SPARK-4367) 2 Phase-shuffle to optimize the DISTINCT aggregation

2014-12-21 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Hao updated SPARK-4367: - Summary: 2 Phase-shuffle to optimize the DISTINCT aggregation (was: Process the distinct value before

[jira] [Updated] (SPARK-4367) Partial aggregation support the DISTINCT aggregation

2014-12-21 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Hao updated SPARK-4367: - Summary: Partial aggregation support the DISTINCT aggregation (was: 2 Phase-shuffle to optimize

[jira] [Created] (SPARK-4904) Remove the foldable checking in HiveGenericUdf.eval

2014-12-19 Thread Cheng Hao (JIRA)
Cheng Hao created SPARK-4904: Summary: Remove the foldable checking in HiveGenericUdf.eval Key: SPARK-4904 URL: https://issues.apache.org/jira/browse/SPARK-4904 Project: Spark Issue Type

[jira] [Commented] (SPARK-4367) Process the distinct value before shuffling for aggregation

2014-12-19 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14254454#comment-14254454 ] Cheng Hao commented on SPARK-4367: -- I am working on updating the Aggregation Function

[jira] [Updated] (HIVE-9004) Reset doesn't work for the default empty value entry

2014-12-17 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/HIVE-9004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Hao updated HIVE-9004: Attachment: (was: reset.patch) Reset doesn't work for the default empty value entry

[jira] [Updated] (HIVE-9004) Reset doesn't work for the default empty value entry

2014-12-17 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/HIVE-9004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Hao updated HIVE-9004: Attachment: HIVE-9004.patch Reset doesn't work for the default empty value entry

[jira] [Commented] (HIVE-9004) Reset doesn't work for the default empty value entry

2014-12-17 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/HIVE-9004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251138#comment-14251138 ] Cheng Hao commented on HIVE-9004: - Thank you [~szehon], updated. Reset doesn't work

[jira] [Updated] (SPARK-4856) Null empty string should not be considered as StringType at beginning in Json schema inferring

2014-12-16 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Hao updated SPARK-4856: - Description: We have data like: {quote} TestSQLContext.sparkContext.parallelize( {ip:27.31.100.29

[jira] [Updated] (SPARK-4856) Null empty string should not be considered as StringType at beginning in Json schema inferring

2014-12-16 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Hao updated SPARK-4856: - Description: We have data like: {panel} TestSQLContext.sparkContext.parallelize( {ip:27.31.100.29

[jira] [Updated] (SPARK-4856) Null empty string should not be considered as StringType at beginning in Json schema inferring

2014-12-16 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Hao updated SPARK-4856: - Description: We have data like: {noformat} TestSQLContext.sparkContext.parallelize( {ip:27.31.100.29

[jira] [Updated] (SPARK-4856) Null empty string should not be considered as StringType at beginning in Json schema inferring

2014-12-16 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Hao updated SPARK-4856: - Description: We have data like: {noformat} TestSQLContext.sparkContext.parallelize( {ip:27.31.100.29

[jira] [Created] (SPARK-4856) Null empty string should not be considered as StringType at beginning in Json schema inferring

2014-12-15 Thread Cheng Hao (JIRA)
Cheng Hao created SPARK-4856: Summary: Null empty string should not be considered as StringType at beginning in Json schema inferring Key: SPARK-4856 URL: https://issues.apache.org/jira/browse/SPARK-4856

[jira] [Updated] (SPARK-4856) Null empty string should not be considered as StringType at beginning in Json schema inferring

2014-12-15 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Hao updated SPARK-4856: - Description: We have data like: {panel} TestSQLContext.sparkContext.parallelize( {ip:27.31.100.29

[jira] [Updated] (SPARK-4856) Null empty string should not be considered as StringType at beginning in Json schema inferring

2014-12-15 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Hao updated SPARK-4856: - Description: We have data like: {code:java} TestSQLContext.sparkContext.parallelize( {ip:27.31.100.29
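The truncated descriptions of SPARK-4856 above all reference the same reproduction snippet. A hedged sketch of what such a repro looks like in the Spark 1.2 era (only the `{ip:27.31.100.29` fragment is visible in the entries; the null record, field layout, and use of `TestSQLContext` are assumptions for illustration):

```scala
// Illustrative only: if the first sampled record carries a null (or empty
// string) for a field, Spark 1.2's JSON schema inference could settle the
// column on StringType instead of refining the type from later records.
val rdd = TestSQLContext.sparkContext.parallelize(
  """{"ip": null}""" ::
  """{"ip": "27.31.100.29"}""" :: Nil)
val schemaRDD = TestSQLContext.jsonRDD(rdd)
schemaRDD.printSchema()  // "ip" may be inferred as StringType from the null
```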

RE: Why my SQL UDF cannot be registered?

2014-12-15 Thread Cheng, Hao
As the error log shows, you may need to register it as: sqlContext.registerFunction(“toHour”, toHour _). The “_” means you are passing the function as a parameter, not invoking it. Cheng Hao From: Xuelin Cao [mailto:xuelin...@yahoo.com.INVALID] Sent: Monday, December 15, 2014 5:28 PM To: User
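For context, a minimal sketch of the registration pattern this reply describes, using Spark 1.2's `SQLContext.registerFunction` (the body of `toHour` is hypothetical; the original email does not show it, and `sqlContext` is assumed to be an existing `org.apache.spark.sql.SQLContext`):

```scala
// Hypothetical UDF: hour-of-day from an epoch-millisecond timestamp.
def toHour(ts: Long): Int = ((ts / 1000 / 3600) % 24).toInt

// The trailing underscore lifts the method into a function value, so it is
// passed as a parameter rather than invoked at the call site.
sqlContext.registerFunction("toHour", toHour _)
sqlContext.sql("SELECT toHour(ts) FROM logs").collect()
```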

RE: Where are the docs for the SparkSQL DataTypes?

2014-12-11 Thread Cheng, Hao
Part of it can be found at: https://github.com/apache/spark/pull/3429/files#diff-f88c3e731fcb17b1323b778807c35b38R34 Sorry, it's a still-to-be-reviewed PR, but it should still be informative. Cheng Hao -Original Message- From: Alessandro Baretta [mailto:alexbare...@gmail.com] Sent: Friday

RE: Can HiveContext be used without using Hive?

2014-12-09 Thread Cheng, Hao
It works exactly like Create Table As Select (CTAS) in Hive. Cheng Hao From: Anas Mosaad [mailto:anas.mos...@incorta.com] Sent: Wednesday, December 10, 2014 11:59 AM To: Michael Armbrust Cc: Manoj Samel; user@spark.apache.org Subject: Re: Can HiveContext be used without using Hive
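A minimal sketch of the Create Table As Select (CTAS) pattern the reply refers to, run through a `HiveContext` (table and column names here are illustrative; `sc` is assumed to be an existing `SparkContext`):

```scala
import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)
// CTAS: materializes the query result as a new Hive-managed table,
// just as it would in Hive itself.
hiveContext.sql("CREATE TABLE dest_table AS SELECT col1, col2 FROM src_table")
```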

RE: CREATE TABLE AS SELECT does not work with temp tables in 1.2.0

2014-12-06 Thread Cheng, Hao
I've created(reused) the PR https://github.com/apache/spark/pull/3336, hopefully we can fix this regression. Thanks for the reporting. Cheng Hao -Original Message- From: Michael Armbrust [mailto:mich...@databricks.com] Sent: Saturday, December 6, 2014 4:51 AM To: kb Cc: d
