[jira] [Created] (SPARK-14953) LocalBackend should revive offers periodically
Cheng Lian created SPARK-14953: -- Summary: LocalBackend should revive offers periodically Key: SPARK-14953 URL: https://issues.apache.org/jira/browse/SPARK-14953 Project: Spark Issue Type: Bug Components: Spark Core Reporter: Cheng Lian {{LocalBackend}} only revives offers when tasks are submitted, succeed, or fail. This may lead to deadlock due to delayed scheduling. A case study is provided in [this PR comment|https://github.com/apache/spark/pull/12527#issuecomment-213034425]. Basically, a job may have a task whose scheduling is delayed due to a locality mismatch. The default delay timeout is 3s. If all other tasks finish during this period, {{LocalBackend}} won't revive any offers after the timeout, since no tasks are submitted, succeed, or fail from then on. Thus, the delayed task will never be scheduled again and the job never completes. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
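A minimal sketch of the periodic revival this ticket asks for, assuming a scheduled daemon thread inside {{LocalBackend}} (the helper names follow the Spark code base, but the exact wiring here is illustrative, not the merged patch):

{code}
import java.util.concurrent.TimeUnit

import org.apache.spark.util.{ThreadUtils, Utils}

// Revive offers on a fixed interval so that a task waiting out its locality
// delay is eventually re-offered resources, even when no task submission,
// success, or failure event occurs in the meantime.
private val reviveThread =
  ThreadUtils.newDaemonSingleThreadScheduledExecutor("local-revive-thread")

override def start() {
  reviveThread.scheduleAtFixedRate(new Runnable {
    override def run(): Unit = Utils.tryLogNonFatalError {
      reviveOffers()
    }
  }, 1000, 1000, TimeUnit.MILLISECONDS)
}

override def stop() {
  reviveThread.shutdownNow()
}
{code}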
[jira] [Resolved] (SPARK-14918) ExternalCatalog.TablePartitionSpec doesn't preserve partition column order
[ https://issues.apache.org/jira/browse/SPARK-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-14918. Resolution: Not A Problem > ExternalCatalog.TablePartitionSpec doesn't preserve partition column order > -- > > Key: SPARK-14918 > URL: https://issues.apache.org/jira/browse/SPARK-14918 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 > Reporter: Cheng Lian >Assignee: Cheng Lian > > The Hive equivalent of {{ExternalCatalog.TablePartitionSpec}} is the > {{LinkedHashMap}} returned by {{Partition.getSpec()}}, which preserves > partition column order. > However, we are using a {{scala.immutable.Map}} to store the result, which no > longer preserves the original order. What makes it worse, Scala specializes > immutable maps with fewer than 5 elements, and these specialized versions do > preserve order, which hides this issue in test cases since we never use more > than 4 partition columns. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
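For reference, a plain-Scala demonstration of the specialization behavior described above (no Spark required):

{code}
// Maps with up to 4 entries use specialized classes (Map.Map1 .. Map.Map4),
// which happen to preserve insertion order:
println(Map("a" -> 1, "b" -> 2, "c" -> 3, "d" -> 4).keys.mkString(", "))
// prints: a, b, c, d

// With 5 or more entries Scala falls back to a HashMap, which does not
// preserve insertion order (the exact output is implementation-dependent):
println(Map("a" -> 1, "b" -> 2, "c" -> 3, "d" -> 4, "e" -> 5).keys.mkString(", "))
// prints the keys in hash order, e.g. "e, a, b, c, d"
{code}

This is why test cases with at most 4 partition columns never exposed the problem.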
[jira] [Commented] (SPARK-14918) ExternalCatalog.TablePartitionSpec doesn't preserve partition column order
[ https://issues.apache.org/jira/browse/SPARK-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259906#comment-15259906 ] Cheng Lian commented on SPARK-14918: Decided to leave this one as "not an issue" since the problem I hit in PR #1 can be fixed inside the PR itself. The fact that Scala's immutable map doesn't preserve order doesn't have any negative effect anywhere else. > ExternalCatalog.TablePartitionSpec doesn't preserve partition column order > -- > > Key: SPARK-14918 > URL: https://issues.apache.org/jira/browse/SPARK-14918 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Cheng Lian >Assignee: Cheng Lian > > The Hive equivalent of {{ExternalCatalog.TablePartitionSpec}} is the > {{LinkedHashMap}} returned by {{Partition.getSpec()}}, which preserves > partition column order. > However, we are using a {{scala.immutable.Map}} to store the result, which no > longer preserves the original order. What makes it worse, Scala specializes > immutable maps with fewer than 5 elements, and these specialized versions do > preserve order, which hides this issue in test cases since we never use more > than 4 partition columns. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-14445) Show columns/partitions
[ https://issues.apache.org/jira/browse/SPARK-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-14445. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 1 [https://github.com/apache/spark/pull/1] > Show columns/partitions > --- > > Key: SPARK-14445 > URL: https://issues.apache.org/jira/browse/SPARK-14445 > Project: Spark > Issue Type: Sub-task >Reporter: Dilip Biswal >Assignee: Dilip Biswal > Fix For: 2.0.0 > > > 1. Support native execution of SHOW COLUMNS > 2. Support native execution of SHOW PARTITIONS > The syntax of these SHOW commands is described at the following link: > https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Show -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
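Illustrative usage of the two commands (the table name and partition spec here are hypothetical; the syntax follows the Hive manual linked above):

{code}
sqlContext.sql("SHOW COLUMNS IN sales")
sqlContext.sql("SHOW PARTITIONS sales")
sqlContext.sql("SHOW PARTITIONS sales PARTITION (dt='2016-04-26')")
{code}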
[jira] [Comment Edited] (SPARK-13983) HiveThriftServer2 can not get "--hiveconf" or "--hivevar" variables since 1.6 version (both multi-session and single session)
[ https://issues.apache.org/jira/browse/SPARK-13983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258139#comment-15258139 ] Cheng Lian edited comment on SPARK-13983 at 4/26/16 4:44 PM: - Here's my (incomplete) findings: Configurations set using {{-hiveconf}} and {{-hivevar}} are set to the current {{SessionState}} after [calling SessionManager.openSession here|https://github.com/apache/spark/blob/branch-1.6/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLSessionManager.scala#L68-L70]. In 1.5, these configurations are populated implicitly since {{SessionState}} is thread-local. In 1.6, we create a new {{HiveContext}} using {{HiveContext.newSession}} under multi-session mode, which then [creates a new execution Hive client|https://github.com/apache/spark/blob/branch-1.6/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala#L119]. My theory is that {{ClientWrapper.newSession}} ignores the current {{SessionState}} and simply creates a new one, so configurations set via CLI flags are dropped. I haven't completely verified the last point though. was (Author: lian cheng): Here's my (incomplete) findings: Configurations set using {{--hiveconf}} and {{--hivevar}} are set to the current {{SessionState}} after [calling SessionManager.openSession here|https://github.com/apache/spark/blob/branch-1.6/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLSessionManager.scala#L68-L70]. In 1.5, these configurations are populated implicitly since {{SessionState}} is thread-local. In 1.6, we create a new {{HiveContext}} using {{HiveContext.newSession}} under multi-session mode, which then [creates a new execution Hive client|https://github.com/apache/spark/blob/branch-1.6/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala#L119]. My theory is that {{ClientWrapper.newSession}} ignores the current {{SessionState}} and simply creates a new one, so configurations set via CLI flags are dropped. I haven't completely verified the last point though. > HiveThriftServer2 can not get "--hiveconf" or "--hivevar" variables since > 1.6 version (both multi-session and single session) > -- > > Key: SPARK-13983 > URL: https://issues.apache.org/jira/browse/SPARK-13983 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0, 1.6.1 > Environment: ubuntu, spark 1.6.0 standalone, spark 1.6.1 standalone > (tried spark branch-1.6 snapshot as well) > compiled with scala 2.10.5 and hadoop 2.6 > (-Phadoop-2.6 -Psparkr -Phive -Phive-thriftserver) >Reporter: Teng Qiu >Assignee: Cheng Lian > > HiveThriftServer2 should be able to get "\--hiveconf" or "\-\-hivevar" > variables from the JDBC client, either from a command line parameter of beeline, > such as > {{beeline --hiveconf spark.sql.shuffle.partitions=3 --hivevar > db_name=default}} > or from the JDBC connection string, like > {{jdbc:hive2://localhost:1?spark.sql.shuffle.partitions=3#db_name=default}} > this worked in spark version 1.5.x, but after upgrading to 1.6, it doesn't > work. 
> to reproduce this issue, try to connect to HiveThriftServer2 with beeline: > {code} > bin/beeline -u jdbc:hive2://localhost:1 \ > --hiveconf spark.sql.shuffle.partitions=3 \ > --hivevar db_name=default > {code} > or > {code} > bin/beeline -u > jdbc:hive2://localhost:1?spark.sql.shuffle.partitions=3#db_name=default > {code} > you will get the following results: > {code} > 0: jdbc:hive2://localhost:1> set spark.sql.shuffle.partitions; > +-------------------------------+--------+ > | key | value | > +-------------------------------+--------+ > | spark.sql.shuffle.partitions | 200 | > +-------------------------------+--------+ > 1 row selected (0.192 seconds) > 0: jdbc:hive2://localhost:1> use ${db_name}; > Error: org.apache.spark.sql.AnalysisException: cannot recognize input near > '$' '{' 'db_name' in switch database statement; line 1 pos 4 (state=,code=0) > {code} > - > but this bug does not affect current versions of the spark-sql CLI; the following > commands work: > {code} > bin/spark-sql --master local[2] \ > --hiveconf spark.sql.shuffle.partitions=3 \ > --hivevar db_name=default > spark-sql> set spark.sql.shuffle.partitions > spark.sql.shuffle.partitions 3 > Time taken: 1.037 seconds, Fetched 1 row(s) > spark-sql> use ${db_name};
[jira] [Commented] (SPARK-14918) ExternalCatalog.TablePartitionSpec doesn't preserve partition column order
[ https://issues.apache.org/jira/browse/SPARK-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258191#comment-15258191 ] Cheng Lian commented on SPARK-14918: Hit this issue while reviewing https://github.com/apache/spark/pull/1 However, I checked all usages of {{TablePartitionSpec}} throughout the current master branch, and none of them assumes that {{TablePartitionSpec}} preserves the order. I tend to fix the problematic PR by no longer relying on this wrong assumption. > ExternalCatalog.TablePartitionSpec doesn't preserve partition column order > -- > > Key: SPARK-14918 > URL: https://issues.apache.org/jira/browse/SPARK-14918 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 > Reporter: Cheng Lian >Assignee: Cheng Lian > > The Hive equivalent of {{ExternalCatalog.TablePartitionSpec}} is the > {{LinkedHashMap}} returned by {{Partition.getSpec()}}, which preserves > partition column order. > However, we are using a {{scala.immutable.Map}} to store the result, which no > longer preserves the original order. What makes it worse, Scala specializes > immutable maps with fewer than 5 elements, and these specialized versions do > preserve order, which hides this issue in test cases since we never use more > than 4 partition columns. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-14918) ExternalCatalog.TablePartitionSpec doesn't preserve partition column order
Cheng Lian created SPARK-14918: -- Summary: ExternalCatalog.TablePartitionSpec doesn't preserve partition column order Key: SPARK-14918 URL: https://issues.apache.org/jira/browse/SPARK-14918 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.0.0 Reporter: Cheng Lian Assignee: Cheng Lian The Hive equivalent of {{ExternalCatalog.TablePartitionSpec}} is the {{LinkedHashMap}} returned by {{Partition.getSpec()}}, which preserves partition column order. However, we are using a {{scala.immutable.Map}} to store the result, which no longer preserves the original order. What makes it worse, Scala specializes immutable maps with fewer than 5 elements, and these specialized versions do preserve order, which hides this issue in test cases since we never use more than 4 partition columns. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13983) HiveThriftServer2 can not get "--hiveconf" or "--hivevar" variables since 1.6 version (both multi-session and single session)
[ https://issues.apache.org/jira/browse/SPARK-13983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258139#comment-15258139 ] Cheng Lian commented on SPARK-13983: Here's my (incomplete) findings: Configurations set using {{--hiveconf}} and {{--hivevar}} are set to the current {{SessionState}} after [calling SessionManager.openSession here|https://github.com/apache/spark/blob/branch-1.6/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLSessionManager.scala#L68-L70]. In 1.5, these configurations are populated implicitly since {{SessionState}} is thread-local. In 1.6, we create a new {{HiveContext}} using {{HiveContext.newSession}} under multi-session mode, which then [creates a new execution Hive client|https://github.com/apache/spark/blob/branch-1.6/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala#L119]. My theory is that {{ClientWrapper.newSession}} ignores the current {{SessionState}} and simply creates a new one, so configurations set via CLI flags are dropped. I haven't completely verified the last point though. > HiveThriftServer2 can not get "--hiveconf" or "--hivevar" variables since > 1.6 version (both multi-session and single session) > -- > > Key: SPARK-13983 > URL: https://issues.apache.org/jira/browse/SPARK-13983 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0, 1.6.1 > Environment: ubuntu, spark 1.6.0 standalone, spark 1.6.1 standalone > (tried spark branch-1.6 snapshot as well) > compiled with scala 2.10.5 and hadoop 2.6 > (-Phadoop-2.6 -Psparkr -Phive -Phive-thriftserver) >Reporter: Teng Qiu >Assignee: Cheng Lian > > HiveThriftServer2 should be able to get "\--hiveconf" or "\-\-hivevar" > variables from the JDBC client, either from a command line parameter of beeline, > such as > {{beeline --hiveconf spark.sql.shuffle.partitions=3 --hivevar > db_name=default}} > or from the JDBC connection string, like > {{jdbc:hive2://localhost:1?spark.sql.shuffle.partitions=3#db_name=default}} > this worked in spark version 1.5.x, but after upgrading to 1.6, it doesn't > work. 
> to reproduce this issue, try to connect to HiveThriftServer2 with beeline: > {code} > bin/beeline -u jdbc:hive2://localhost:1 \ > --hiveconf spark.sql.shuffle.partitions=3 \ > --hivevar db_name=default > {code} > or > {code} > bin/beeline -u > jdbc:hive2://localhost:1?spark.sql.shuffle.partitions=3#db_name=default > {code} > you will get the following results: > {code} > 0: jdbc:hive2://localhost:1> set spark.sql.shuffle.partitions; > +-------------------------------+--------+ > | key | value | > +-------------------------------+--------+ > | spark.sql.shuffle.partitions | 200 | > +-------------------------------+--------+ > 1 row selected (0.192 seconds) > 0: jdbc:hive2://localhost:1> use ${db_name}; > Error: org.apache.spark.sql.AnalysisException: cannot recognize input near > '$' '{' 'db_name' in switch database statement; line 1 pos 4 (state=,code=0) > {code} > - > but this bug does not affect current versions of the spark-sql CLI; the following > commands work: > {code} > bin/spark-sql --master local[2] \ > --hiveconf spark.sql.shuffle.partitions=3 \ > --hivevar db_name=default > spark-sql> set spark.sql.shuffle.partitions > spark.sql.shuffle.partitions 3 > Time taken: 1.037 seconds, Fetched 1 row(s) > spark-sql> use ${db_name}; > OK > Time taken: 1.697 seconds > {code} > so I think it may be caused by this change: > https://github.com/apache/spark/pull/8909 ( [SPARK-10810] [SPARK-10902] [SQL] > Improve session management in SQL ) > perhaps by calling {{hiveContext.newSession}}, the variables from > {{sessionConf}} were not loaded into the new session? > (https://github.com/apache/spark/pull/8909/files#diff-8f8b7f4172e8a07ff20a4dbbbcc57b1dR69) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
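Following the theory in the comment above, a hypothetical sketch of the direction a fix could take in {{SparkSQLSessionManager.openSession}} (identifiers simplified; this is not the actual patch): copy the per-session overrides onto the new context explicitly instead of relying on the thread-local {{SessionState}}.

{code}
import scala.collection.JavaConverters._

// sessionConf carries the --hiveconf / connection-string settings
// supplied by the JDBC client for this session.
val ctx = if (multiSessionMode) hiveContext.newSession() else hiveContext
if (sessionConf != null) {
  sessionConf.asScala.foreach { case (key, value) =>
    ctx.setConf(key, value)  // propagate instead of dropping the overrides
  }
}
{code}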
[jira] [Resolved] (SPARK-14875) OutputWriterFactory.newInstance shouldn't be private[sql]
[ https://issues.apache.org/jira/browse/SPARK-14875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-14875. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 12652 [https://github.com/apache/spark/pull/12652] > OutputWriterFactory.newInstance shouldn't be private[sql] > - > > Key: SPARK-14875 > URL: https://issues.apache.org/jira/browse/SPARK-14875 > Project: Spark > Issue Type: Bug >Affects Versions: 2.0.0 > Reporter: Cheng Lian > Assignee: Cheng Lian > Fix For: 2.0.0 > > > Existing packages like spark-avro need to access > {{OutputWriterFactory.newInstance}}, but it's marked as {{private\[sql\]}} in > Spark 2.0. Should make it public again. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14875) OutputWriterFactory.newInstance shouldn't be private[sql]
[ https://issues.apache.org/jira/browse/SPARK-14875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15255249#comment-15255249 ] Cheng Lian commented on SPARK-14875: Checked with [~cloud_fan]; it was accidentally made private while adding the bucketing feature. I'm removing this qualifier. > OutputWriterFactory.newInstance shouldn't be private[sql] > - > > Key: SPARK-14875 > URL: https://issues.apache.org/jira/browse/SPARK-14875 > Project: Spark > Issue Type: Bug >Affects Versions: 2.0.0 > Reporter: Cheng Lian > Assignee: Cheng Lian > > Existing packages like spark-avro need to access > {{OutputWriterFactory.newInstance}}, but it's marked as {{private\[sql\]}} in > Spark 2.0. Should make it public again. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
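For illustration, the fix amounts to dropping the qualifier on the factory method (the parameter list below is abbreviated; see the actual class for the full signature):

{code}
import org.apache.hadoop.mapreduce.TaskAttemptContext
import org.apache.spark.sql.types.StructType

abstract class OutputWriterFactory extends Serializable {
  // Was: private[sql] def newInstance(...): OutputWriter
  // Making it public again lets external data sources such as spark-avro
  // implement and call it.
  def newInstance(
      path: String,
      dataSchema: StructType,
      context: TaskAttemptContext): OutputWriter
}
{code}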
[jira] [Commented] (SPARK-14875) OutputWriterFactory.newInstance shouldn't be private[sql]
[ https://issues.apache.org/jira/browse/SPARK-14875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15255248#comment-15255248 ] Cheng Lian commented on SPARK-14875: [~marmbrus] Is there any reason why we made it private in Spark 2.0? > OutputWriterFactory.newInstance shouldn't be private[sql] > - > > Key: SPARK-14875 > URL: https://issues.apache.org/jira/browse/SPARK-14875 > Project: Spark > Issue Type: Bug >Affects Versions: 2.0.0 > Reporter: Cheng Lian > Assignee: Cheng Lian > > Existing packages like spark-avro need to access > {{OutputWriterFactory.newInstance}}, but it's marked as {{private\[sql\]}} in > Spark 2.0. Should make it public again. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-14875) OutputWriterFactory.newInstance shouldn't be private[sql]
Cheng Lian created SPARK-14875: -- Summary: OutputWriterFactory.newInstance shouldn't be private[sql] Key: SPARK-14875 URL: https://issues.apache.org/jira/browse/SPARK-14875 Project: Spark Issue Type: Bug Affects Versions: 2.0.0 Reporter: Cheng Lian Assignee: Cheng Lian Existing packages like spark-avro need to access {{OutputWriterFactory.newInstance}}, but it's marked as {{private\[sql\]}} in Spark 2.0. Should make it public again. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-14843) Error while encoding: java.lang.ClassCastException with LibSVMRelation
[ https://issues.apache.org/jira/browse/SPARK-14843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-14843. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 12611 [https://github.com/apache/spark/pull/12611] > Error while encoding: java.lang.ClassCastException with LibSVMRelation > -- > > Key: SPARK-14843 > URL: https://issues.apache.org/jira/browse/SPARK-14843 > Project: Spark > Issue Type: Bug > Components: ML, MLlib, SQL >Reporter: Nick Pentreath > Fix For: 2.0.0 > > > While trying to run some example ML linear regression code, I came across the > following. In fact this error occurs when doing {{./bin/run-example > ml.LinearRegressionWithElasticNetExample}}. > {code} > scala> import org.apache.spark.ml.regression.LinearRegression > import org.apache.spark.ml.regression.LinearRegression > scala> import org.apache.spark.mllib.linalg.Vector > import org.apache.spark.mllib.linalg.Vector > scala> import org.apache.spark.sql.Row > import org.apache.spark.sql.Row > scala> val data = > sqlContext.read.format("libsvm").load("data/mllib/sample_linear_regression_data.txt") > data: org.apache.spark.sql.DataFrame = [label: double, features: vector] > scala> val lr = new LinearRegression() > scala> val model = lr.fit(data) > {code} > Stack trace: > {code} > Driver stacktrace: > ... > at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1276) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) > at org.apache.spark.rdd.RDD.withScope(RDD.scala:357) > at org.apache.spark.rdd.RDD.take(RDD.scala:1250) > at org.apache.spark.rdd.RDD$$anonfun$first$1.apply(RDD.scala:1290) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) > at org.apache.spark.rdd.RDD.withScope(RDD.scala:357) > at org.apache.spark.rdd.RDD.first(RDD.scala:1289) > at > org.apache.spark.ml.regression.LinearRegression.train(LinearRegression.scala:165) > at > org.apache.spark.ml.regression.LinearRegression.train(LinearRegression.scala:69) > at org.apache.spark.ml.Predictor.fit(Predictor.scala:90) > ... 
48 elided > Caused by: java.lang.RuntimeException: Error while encoding: > java.lang.ClassCastException: java.lang.Double cannot be cast to > org.apache.spark.mllib.linalg.Vector > if (input[0, org.apache.spark.sql.Row].isNullAt) null else newInstance(class > org.apache.spark.mllib.linalg.VectorUDT).serialize > :- input[0, org.apache.spark.sql.Row].isNullAt > : :- input[0, org.apache.spark.sql.Row] > : +- 0 > :- null > +- newInstance(class org.apache.spark.mllib.linalg.VectorUDT).serialize >:- newInstance(class org.apache.spark.mllib.linalg.VectorUDT) >+- input[0, org.apache.spark.sql.Row].get > :- input[0, org.apache.spark.sql.Row] > +- 0 > at > org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.toRow(ExpressionEncoder.scala:230) > at > org.apache.spark.ml.source.libsvm.DefaultSource$$anonfun$buildReader$1$$anonfun$8.apply(LibSVMRelation.scala:209) > at > org.apache.spark.ml.source.libsvm.DefaultSource$$anonfun$buildReader$1$$anonfun$8.apply(LibSVMRelation.scala:207) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:409) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:409) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.next(FileScanRDD.scala:90) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown > Source) > at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > at > org.apache.spark.sql.execution.WholeStageCodegen$$anonfun$7$$anon$1.hasNext(WholeStageCodegen.scala:362) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) > at > org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47) > at org.apache.spark.scheduler.Task.run(Task.scala:85) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:254) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) >
[jira] [Updated] (SPARK-13928) Move org.apache.spark.Logging into org.apache.spark.internal.Logging
[ https://issues.apache.org/jira/browse/SPARK-13928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-13928: --- Target Version/s: 2.0.0 > Move org.apache.spark.Logging into org.apache.spark.internal.Logging > > > Key: SPARK-13928 > URL: https://issues.apache.org/jira/browse/SPARK-13928 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Reporter: Reynold Xin >Assignee: Wenchen Fan > Fix For: 2.0.0 > > > Logging was made private in Spark 2.0. If we move it, then users would be > able to create a Logging trait themselves to avoid changing their own code. > Alternatively, we can also provide a compatibility package that adds > logging. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-13928) Move org.apache.spark.Logging into org.apache.spark.internal.Logging
[ https://issues.apache.org/jira/browse/SPARK-13928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-13928: --- Assignee: Wenchen Fan > Move org.apache.spark.Logging into org.apache.spark.internal.Logging > > > Key: SPARK-13928 > URL: https://issues.apache.org/jira/browse/SPARK-13928 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Reporter: Reynold Xin >Assignee: Wenchen Fan > Fix For: 2.0.0 > > > Logging was made private in Spark 2.0. If we move it, then users would be > able to create a Logging trait themselves to avoid changing their own code. > Alternatively, we can also provide a compatibility package that adds > logging. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-13928) Move org.apache.spark.Logging into org.apache.spark.internal.Logging
[ https://issues.apache.org/jira/browse/SPARK-13928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-13928: --- Fix Version/s: 2.0.0 > Move org.apache.spark.Logging into org.apache.spark.internal.Logging > > > Key: SPARK-13928 > URL: https://issues.apache.org/jira/browse/SPARK-13928 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Reporter: Reynold Xin >Assignee: Wenchen Fan > Fix For: 2.0.0 > > > Logging was made private in Spark 2.0. If we move it, then users would be > able to create a Logging trait themselves to avoid changing their own code. > Alternatively, we can also provide a compatibility package that adds > logging. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
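A minimal sketch of the user-side workaround mentioned in the description, assuming slf4j is on the classpath: applications can define their own {{Logging}} trait instead of depending on the now-internal one.

{code}
import org.slf4j.{Logger, LoggerFactory}

trait Logging {
  @transient private lazy val log: Logger =
    LoggerFactory.getLogger(getClass.getName.stripSuffix("$"))

  protected def logInfo(msg: => String): Unit =
    if (log.isInfoEnabled) log.info(msg)

  protected def logWarning(msg: => String): Unit =
    if (log.isWarnEnabled) log.warn(msg)
}
{code}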
Re: parquet data corruption
(cc dev@parquet.apache.org) Hey Shushant, This kind of error can be tricky to debug. Could you please provide the following information: - The tool used to write those Parquet files (possibly Hive 0.13, since you mentioned hive-exec 0.13?) - The tool used to read those Parquet files (should be Hive according to the stack trace, but which version?) - What is the "complex" query? - Schema of those Parquet files (can be checked using parquet-tools), as well as the corresponding schema of the user application (table schema for Hive) - If possible, the code snippet you used to write the files - Are there files of different schemata mixed up? Some tools, like Hive, don't handle schema evolution well. I see that the file name in the stack trace consists of a timestamp. This isn't the naming convention used by Hive. Did you move files written somewhere else into the target directory? Cheng On 4/22/16 10:56 AM, Shushant Arora wrote: Hi I am writing to a parquet table using parquet.hadoop.ParquetOutputFormat (from hive-exec 0.13). Data is being written correctly, and when I do count(1) or select * with limit I get proper results. But when I run a complex query on the table it throws the exception below: Diagnostic Messages for this Task: Error: java.io.IOException: java.io.IOException: parquet.io.ParquetDecodingException: Can not read value at 18 in block 0 in file hdfs://nameservice1/user/hive/warehouse/dbname.db/tablename/partitionname/20160421032223.parquet at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121) at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77) at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:255) at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:170) at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:199) at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:185) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163) Caused by: java.io.IOException: parquet.io.ParquetDecodingException: Can not read value at 18 in block 0 in file hdfs://nameservice1/user/hive/warehouse/dbname.db/tablename/partitionname/20160421032223.parquet at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121) at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77) at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:344) at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:101) at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:41) at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:122) at 
org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:253) ... 11 more Caused by: parquet.io.ParquetDecodingException: Can not read value at 18 in block 0 in file hdfs://nameservice1/user/hive/warehouse/dbname.db/tablename/partitionname/20160421032223.parquet at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:216) at parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:144) at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.next(ParquetRecordReaderWrapper.java:159) at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.next(ParquetRecordReaderWrapper.java:48) at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:339) ... 15 more Caused by: parquet.io.ParquetDecodingException: Can't read value in column [sessionid] BINARY at value 18 out of 18, 18 out of 18 in currentPage. repetition level: 0, definition level: 1 at parquet.column.impl.ColumnReaderImpl.readValue(ColumnReaderImpl.java:450) at parquet.column.impl.ColumnReaderImpl.writeCurrentValueToConverter(ColumnReaderImpl.java:352)
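For the schema check mentioned above, parquet-tools can dump both the logical schema and the row group metadata; for example (substitute one of your actual files for the placeholder path):

parquet-tools schema /path/to/20160421032223.parquet
parquet-tools meta /path/to/20160421032223.parquet

Comparing that output against the Hive table schema is usually the quickest way to spot a mismatch on a column such as [sessionid].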
[jira] [Commented] (SPARK-14463) read.text broken for partitioned tables
[ https://issues.apache.org/jira/browse/SPARK-14463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15240640#comment-15240640 ] Cheng Lian commented on SPARK-14463: Should we simply throw an exception when the text data source is used together with partitioning? > read.text broken for partitioned tables > --- > > Key: SPARK-14463 > URL: https://issues.apache.org/jira/browse/SPARK-14463 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Michael Armbrust >Priority: Critical > > Strongly typing the return values of {{read.text}} as {{Dataset\[String]}} > breaks when trying to load a partitioned table (or any table where the path > looks partitioned) > {code} > Seq((1, "test")) > .toDF("a", "b") > .write > .format("text") > .partitionBy("a") > .save("/home/michael/text-part-bug") > sqlContext.read.text("/home/michael/text-part-bug") > {code} > {code} > org.apache.spark.sql.AnalysisException: Try to map struct<value:string,a:int> > to Tuple1, but failed as the number of fields does not line up. > - Input schema: struct<value:string,a:int> > - Target schema: struct; > at > org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.org$apache$spark$sql$catalyst$encoders$ExpressionEncoder$$fail$1(ExpressionEncoder.scala:265) > at > org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.validate(ExpressionEncoder.scala:279) > at org.apache.spark.sql.Dataset.(Dataset.scala:197) > at org.apache.spark.sql.Dataset.(Dataset.scala:168) > at org.apache.spark.sql.Dataset$.apply(Dataset.scala:57) > at org.apache.spark.sql.Dataset.as(Dataset.scala:357) > at org.apache.spark.sql.DataFrameReader.text(DataFrameReader.scala:450) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
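For anyone hitting this in the meantime, a hedged workaround sketch: go through the untyped reader and drop the partition column before converting to {{Dataset\[String]}} (assumes {{sqlContext.implicits._}} is imported for the String encoder).

{code}
import sqlContext.implicits._

val ds = sqlContext.read
  .format("text")
  .load("/home/michael/text-part-bug")
  .select("value")  // keep only the text column, dropping partition column `a`
  .as[String]
{code}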
[jira] [Commented] (SPARK-14389) OOM during BroadcastNestedLoopJoin
[ https://issues.apache.org/jira/browse/SPARK-14389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15239687#comment-15239687 ] Cheng Lian commented on SPARK-14389: An exception thrown by UnsafeRow.copy() inside a BNL join doesn't necessarily indicate that the BNL join ate all the memory. It's possible that something else ate all the memory, so that the BNL join couldn't acquire enough memory and became the victim. > OOM during BroadcastNestedLoopJoin > -- > > Key: SPARK-14389 > URL: https://issues.apache.org/jira/browse/SPARK-14389 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0 > Environment: OS: Amazon Linux AMI 2015.09 > EMR: 4.3.0 > Hadoop: Amazon 2.7.1 > Spark 1.6.0 > Ganglia 3.7.2 > Master: m3.xlarge > Core: m3.xlarge > m3.xlarge: 4 CPU, 15GB mem, 2x40GB SSD >Reporter: Steve Johnston > Attachments: jps_command_results.txt, lineitem.tbl, plans.txt, > sample_script.py, stdout.txt > > > When executing the attached sample_script.py in client mode with a single > executor, an exception occurs, "java.lang.OutOfMemoryError: Java heap space", > during the self join of a small table, TPC-H lineitem generated for a 1M > dataset. Also see the execution log stdout.txt attached. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14495) Distinct aggregation cannot be used in the having clause
[ https://issues.apache.org/jira/browse/SPARK-14495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15239674#comment-15239674 ] Cheng Lian commented on SPARK-14495: This ticket is for branch-1.6. > Distinct aggregation cannot be used in the having clause > > > Key: SPARK-14495 > URL: https://issues.apache.org/jira/browse/SPARK-14495 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.1 >Reporter: Yin Huai > > {code} > select date, count(distinct id) > from (select '2010-01-01' as date, 1 as id) tmp > group by date > having count(distinct id) > 0; > org.apache.spark.sql.AnalysisException: resolved attribute(s) gid#558,id#559 > missing from date#554,id#555 in operator !Expand [List(date#554, null, 0, if > ((gid#558 = 1)) id#559 else null),List(date#554, id#555, 1, null)], > [date#554,id#561,gid#560,if ((gid = 1)) id else null#562]; > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:38) > at > org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:44) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:183) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:50) > at > org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:121) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:120) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:120) > at scala.collection.immutable.List.foreach(List.scala:318) > at > org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:120) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:120) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:120) > at scala.collection.immutable.List.foreach(List.scala:318) > at > org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:120) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:120) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:120) > at scala.collection.immutable.List.foreach(List.scala:318) > at > org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:120) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:120) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:120) > at scala.collection.immutable.List.foreach(List.scala:318) > at > org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:120) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:50) > at > org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:44) > at > org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:34) > at org.apache.spark.sql.DataFrame.(DataFrame.scala:133) > at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:52) > at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:816) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13753) Column nullable is derived incorrectly
[ https://issues.apache.org/jira/browse/SPARK-13753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15239637#comment-15239637 ] Cheng Lian commented on SPARK-13753: [~jingweilu] Could you please provide the schema of the tables involved in the SQL query you provided so that we can reproduce this issue more easily? Also, it would be really helpful if you could help derive a minimized query that reproduces this issue. Thanks! > Column nullable is derived incorrectly > -- > > Key: SPARK-13753 > URL: https://issues.apache.org/jira/browse/SPARK-13753 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.5.2 >Reporter: Jingwei Lu >Priority: Critical > > There is a problem in Spark SQL where a column's nullability is derived > incorrectly and then used in optimization. In the following query: > {code} > select concat("perf.realtime.web", b.tags[1]) as metric, b.value, b.tags[0] > from ( > select explode(map(a.frontend[0], > ARRAY(concat("metric:frontend", ",controller:", COALESCE(controller, "null"), > ",action:", COALESCE(action, "null")), ".p50"), > a.frontend[1], > ARRAY(concat("metric:frontend", ",controller:", COALESCE(controller, "null"), > ",action:", COALESCE(action, "null")), ".p90"), > a.backend[0], ARRAY(concat("metric:backend", > ",controller:", COALESCE(controller, "null"), ",action:", COALESCE(action, > "null")), ".p50"), > a.backend[1], ARRAY(concat("metric:backend", > ",controller:", COALESCE(controller, "null"), ",action:", COALESCE(action, > "null")), ".p90"), > a.render[0], ARRAY(concat("metric:render", > ",controller:", COALESCE(controller, "null"), ",action:", COALESCE(action, > "null")), ".p50"), > a.render[1], ARRAY(concat("metric:render", > ",controller:", COALESCE(controller, "null"), ",action:", COALESCE(action, > "null")), ".p90"), > a.page_load_time[0], > ARRAY(concat("metric:page_load_time", ",controller:", COALESCE(controller, > "null"), ",action:", COALESCE(action, "null")), ".p50"), > a.page_load_time[1], > ARRAY(concat("metric:page_load_time", ",controller:", COALESCE(controller, > "null"), ",action:", COALESCE(action, "null")), ".p90"), > a.total_load_time[0], > ARRAY(concat("metric:total_load_time", ",controller:", COALESCE(controller, > "null"), ",action:", COALESCE(action, "null")), ".p50"), > a.total_load_time[1], > ARRAY(concat("metric:total_load_time", ",controller:", COALESCE(controller, > "null"), ",action:", COALESCE(action, "null")), ".p90"))) as (value, tags) > from ( > select data.controller as controller, data.action as > action, > percentile(data.frontend, array(0.5, 0.9)) as > frontend, > percentile(data.backend, array(0.5, 0.9)) as > backend, > percentile(data.render, array(0.5, 0.9)) as render, > percentile(data.page_load_time, array(0.5, 0.9)) as > page_load_time, > percentile(data.total_load_time, array(0.5, 0.9)) > as total_load_time > from air_events_rt > where type='air_events' and data.event_name='pageload' > group by data.controller, data.action > ) a > ) b > where b.value is not null > {code} > b.value is incorrectly derived as not nullable. The "b.value is not null" > predicate will be ignored by the optimizer, which causes the query to return > incorrect results. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14463) read.text broken for partitioned tables
[ https://issues.apache.org/jira/browse/SPARK-14463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15239617#comment-15239617 ] Cheng Lian commented on SPARK-14463: It seems that this is because {{buildReader()}} doesn't append partition columns the way other data sources do. > read.text broken for partitioned tables > --- > > Key: SPARK-14463 > URL: https://issues.apache.org/jira/browse/SPARK-14463 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Michael Armbrust >Priority: Critical > > Strongly typing the return values of {{read.text}} as {{Dataset\[String]}} > breaks when trying to load a partitioned table (or any table where the path > looks partitioned) > {code} > Seq((1, "test")) > .toDF("a", "b") > .write > .format("text") > .partitionBy("a") > .save("/home/michael/text-part-bug") > sqlContext.read.text("/home/michael/text-part-bug") > {code} > {code} > org.apache.spark.sql.AnalysisException: Try to map struct<value:string,a:int> > to Tuple1, but failed as the number of fields does not line up. > - Input schema: struct<value:string,a:int> > - Target schema: struct; > at > org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.org$apache$spark$sql$catalyst$encoders$ExpressionEncoder$$fail$1(ExpressionEncoder.scala:265) > at > org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.validate(ExpressionEncoder.scala:279) > at org.apache.spark.sql.Dataset.(Dataset.scala:197) > at org.apache.spark.sql.Dataset.(Dataset.scala:168) > at org.apache.spark.sql.Dataset$.apply(Dataset.scala:57) > at org.apache.spark.sql.Dataset.as(Dataset.scala:357) > at org.apache.spark.sql.DataFrameReader.text(DataFrameReader.scala:450) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-14566) When appending to partitioned persisted table, we should apply a projection over input query plan using existing metastore schema
[ https://issues.apache.org/jira/browse/SPARK-14566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237690#comment-15237690 ] Cheng Lian edited comment on SPARK-14566 at 4/12/16 6:25 PM: - This bug is exposed after fixing SPARK-14458. These two bugs together happened to cheat all our existing test cases. was (Author: lian cheng): This bug is exposed after fixing SPARK-14458. These two bugs together happened to cheated all our existing test cases. > When appending to partitioned persisted table, we should apply a projection > over input query plan using existing metastore schema > - > > Key: SPARK-14566 > URL: https://issues.apache.org/jira/browse/SPARK-14566 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Cheng Lian >Assignee: Cheng Lian > > Take the following snippets slightly modified from test case > "SQLQuerySuite.SPARK-11453: append data to partitioned table" as an example: > {code} > val df1 = Seq("1" -> "10", "2" -> "20").toDF("i", "j") > df1.write.partitionBy("i").saveAsTable("tbl11453") > val df2 = Seq("3" -> "30").toDF("i", "j") > df2.write.mode(SaveMode.Append).partitionBy("i").saveAsTable("tbl11453") > {code} > Although {{df1.schema}} is {{<i:STRING, j:STRING>}}, the schema of the persisted > table {{tbl11453}} is actually {{<j:STRING, i:STRING>}} because {{i}} is a > partition column, which is always appended after all data columns. Thus, when > appending {{df2}}, the schemata of {{df2}} and the persisted table {{tbl11453}} > actually differ. > In the current master branch, {{CreateMetastoreDataSourceAsSelect}} simply > applies the existing metastore schema to the input query plan ([see > here|https://github.com/apache/spark/blob/75e05a5a964c9585dd09a2ef6178881929bab1f1/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/commands.scala#L225]), > which is wrong. A projection should be used instead to adjust the column order > here. > In branch-1.6, [this projection is added in > {{InsertIntoHadoopFsRelation}}|https://github.com/apache/spark/blob/663a492f0651d757ea8e5aeb42107e2ece429613/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelation.scala#L99-L104], > but was removed in Spark 2.0. Replacing the aforementioned line in > {{CreateMetastoreDataSourceAsSelect}} with a projection would be more > preferable. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14566) When appending to partitioned persisted table, we should apply a projection over input query plan using existing metastore schema
[ https://issues.apache.org/jira/browse/SPARK-14566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237690#comment-15237690 ] Cheng Lian commented on SPARK-14566: This bug is exposed after fixing SPARK-14458. These two bugs together happened to cheated all our existing test cases. > When appending to partitioned persisted table, we should apply a projection > over input query plan using existing metastore schema > - > > Key: SPARK-14566 > URL: https://issues.apache.org/jira/browse/SPARK-14566 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Cheng Lian >Assignee: Cheng Lian > > Take the following snippets slightly modified from test case > "SQLQuerySuite.SPARK-11453: append data to partitioned table" as an example: > {code} > val df1 = Seq("1" -> "10", "2" -> "20").toDF("i", "j") > df1.write.partitionBy("i").saveAsTable("tbl11453") > val df2 = Seq("3" -> "30").toDF("i", "j") > df2.write.mode(SaveMode.Append).partitionBy("i").saveAsTable("tbl11453") > {code} > Although {{df1.schema}} is {{<i:STRING, j:STRING>}}, the schema of the persisted > table {{tbl11453}} is actually {{<j:STRING, i:STRING>}} because {{i}} is a > partition column, which is always appended after all data columns. Thus, when > appending {{df2}}, the schemata of {{df2}} and the persisted table {{tbl11453}} > actually differ. > In the current master branch, {{CreateMetastoreDataSourceAsSelect}} simply > applies the existing metastore schema to the input query plan ([see > here|https://github.com/apache/spark/blob/75e05a5a964c9585dd09a2ef6178881929bab1f1/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/commands.scala#L225]), > which is wrong. A projection should be used instead to adjust the column order > here. > In branch-1.6, [this projection is added in > {{InsertIntoHadoopFsRelation}}|https://github.com/apache/spark/blob/663a492f0651d757ea8e5aeb42107e2ece429613/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelation.scala#L99-L104], > but was removed in Spark 2.0. Replacing the aforementioned line in > {{CreateMetastoreDataSourceAsSelect}} with a projection would be more > preferable. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-14566) When appending to partitioned persisted table, we should apply a projection over input query plan using existing metastore schema
Cheng Lian created SPARK-14566: -- Summary: When appending to partitioned persisted table, we should apply a projection over input query plan using existing metastore schema Key: SPARK-14566 URL: https://issues.apache.org/jira/browse/SPARK-14566 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.0.0 Reporter: Cheng Lian Assignee: Cheng Lian Take the following snippets slightly modified from test case "SQLQuerySuite.SPARK-11453: append data to partitioned table" as an example: {code} val df1 = Seq("1" -> "10", "2" -> "20").toDF("i", "j") df1.write.partitionBy("i").saveAsTable("tbl11453") val df2 = Seq("3" -> "30").toDF("i", "j") df2.write.mode(SaveMode.Append).partitionBy("i").saveAsTable("tbl11453") {code} Although {{df1.schema}} is {{<i:STRING, j:STRING>}}, the schema of the persisted table {{tbl11453}} is actually {{<j:STRING, i:STRING>}} because {{i}} is a partition column, which is always appended after all data columns. Thus, when appending {{df2}}, the schemata of {{df2}} and the persisted table {{tbl11453}} actually differ. In the current master branch, {{CreateMetastoreDataSourceAsSelect}} simply applies the existing metastore schema to the input query plan ([see here|https://github.com/apache/spark/blob/75e05a5a964c9585dd09a2ef6178881929bab1f1/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/commands.scala#L225]), which is wrong. A projection should be used instead to adjust the column order here. In branch-1.6, [this projection is added in {{InsertIntoHadoopFsRelation}}|https://github.com/apache/spark/blob/663a492f0651d757ea8e5aeb42107e2ece429613/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelation.scala#L99-L104], but was removed in Spark 2.0. Replacing the aforementioned line in {{CreateMetastoreDataSourceAsSelect}} with a projection would be more preferable. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
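A hedged sketch of the projection described above (Catalyst identifiers are illustrative and name resolution is simplified; this is not the actual patch):

{code}
import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, Project}

// Reorder the input query's columns to match the metastore schema
// (data columns first, partition columns last), instead of force-applying
// the metastore schema to the input plan.
def adjustColumnOrder(query: LogicalPlan, metastoreColumnNames: Seq[String]): LogicalPlan = {
  val inputByName = query.output.map(a => a.name.toLowerCase -> a).toMap
  Project(metastoreColumnNames.map(name => inputByName(name.toLowerCase)), query)
}
{code}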
[jira] [Resolved] (SPARK-14493) "CREATE TEMPORARY TABLE ... USING ... AS SELECT ..." should always be used with a user defined path
[ https://issues.apache.org/jira/browse/SPARK-14493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-14493. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 12303 [https://github.com/apache/spark/pull/12303] > "CREATE TEMPORARY TABLE ... USING ... AS SELECT ..." should always be used > with a user defined path > --- > > Key: SPARK-14493 > URL: https://issues.apache.org/jira/browse/SPARK-14493 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Cheng Lian >Assignee: Cheng Lian > Fix For: 2.0.0 > > > In the current Spark 2.0 master, the following DDL command doesn't specify a > user-defined path, and writes the query result to the default Hive warehouse location: > {code} > sqlContext sql "CREATE TEMPORARY TABLE y USING PARQUET AS SELECT * FROM x" > {code} > In Spark 1.6, it results in the following exception, which is the expected > behavior: > {noformat} > scala> sqlContext sql "CREATE TEMPORARY TABLE y USING PARQUET AS SELECT * > FROM x" > java.util.NoSuchElementException: key not found: path > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
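For illustration, the expected usage after this fix: the DDL must carry an explicit path option (the path below is hypothetical).

{code}
sqlContext.sql(
  """CREATE TEMPORARY TABLE y
    |USING PARQUET
    |OPTIONS (path '/tmp/y')
    |AS SELECT * FROM x
  """.stripMargin)
{code}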
[jira] [Resolved] (SPARK-14488) "CREATE TEMPORARY TABLE ... USING ... AS SELECT ..." creates persisted table
[ https://issues.apache.org/jira/browse/SPARK-14488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-14488. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 12303 [https://github.com/apache/spark/pull/12303] > "CREATE TEMPORARY TABLE ... USING ... AS SELECT ..." creates persisted table > > > Key: SPARK-14488 > URL: https://issues.apache.org/jira/browse/SPARK-14488 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Cheng Lian >Assignee: Cheng Lian >Priority: Critical > Fix For: 2.0.0 > > > The following Spark shell snippet reproduces this bug: > {code} > sqlContext range 10 registerTempTable "x" > // The problematic DDL statement: > sqlContext sql "CREATE TEMPORARY TABLE y USING PARQUET AS SELECT * FROM x" > sqlContext.tables().show() > {code} > It shows the following result: > {noformat} > +---------+-----------+ > |tableName|isTemporary| > +---------+-----------+ > | y| false| > | x| true| > +---------+-----------+ > {noformat} > Note that {{y}} is NOT temporary although it's created using {{CREATE > TEMPORARY TABLE ...}}. > Explain shows that the physical plan node is {{CreateTableUsingAsSelect}} > rather than {{CreateTempTableUsingAsSelect}}. > {noformat} > == Parsed Logical Plan == > 'CreateTableUsingAsSelect `y`, PARQUET, true, [Ljava.lang.String;@4d001a14, > None, Overwrite, Map() > +- 'Project [*] >+- 'UnresolvedRelation `x`, None > == Analyzed Logical Plan == > CreateTableUsingAsSelect `y`, PARQUET, true, [Ljava.lang.String;@4d001a14, > None, Overwrite, Map() > +- Project [id#0L] >+- SubqueryAlias x > +- Range 0, 10, 1, 1, [id#0L] > == Optimized Logical Plan == > CreateTableUsingAsSelect `y`, PARQUET, true, [Ljava.lang.String;@4d001a14, > None, Overwrite, Map() > +- Range 0, 10, 1, 1, [id#0L] > == Physical Plan == > ExecutedCommand CreateMetastoreDataSourceAsSelect `y`, PARQUET, > [Ljava.lang.String;@4d001a14, None, Overwrite, Map(), Range 0, 10, 1, 1, > [id#0L]| > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14372) Dataset.randomSplit() needs a Java version
[ https://issues.apache.org/jira/browse/SPARK-14372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15234779#comment-15234779 ] Cheng Lian commented on SPARK-14372: Ah, sorry for the late reply. It's already taken by others. (At first I thought it was your PR, but later realized that it wasn't.) > Dataset.randomSplit() needs a Java version > -- > > Key: SPARK-14372 > URL: https://issues.apache.org/jira/browse/SPARK-14372 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.0.0 > Reporter: Cheng Lian >Assignee: Rekha Joshi > Fix For: 2.0.0 > > > {{Dataset.randomSplit()}} now returns {{Array\[Dataset\[T\]\]}}, which > doesn't work for Java users since Java methods can't return generic arrays. > We may want something like {{randomSplitAsList()}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
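A minimal sketch of the Java-friendly variant, assuming it simply wraps the existing {{randomSplit()}} result (the body here is illustrative):

{code}
import java.util.{List => JList}

// Java methods can't return generic arrays, so expose the splits as a
// java.util.List instead.
def randomSplitAsList(weights: Array[Double], seed: Long): JList[Dataset[T]] = {
  val values = randomSplit(weights, seed)
  java.util.Arrays.asList(values: _*)
}
{code}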
[jira] [Resolved] (SPARK-14372) Dataset.randomSplit() needs a Java version
[ https://issues.apache.org/jira/browse/SPARK-14372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-14372. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 12184 [https://github.com/apache/spark/pull/12184] > Dataset.randomSplit() needs a Java version > -- > > Key: SPARK-14372 > URL: https://issues.apache.org/jira/browse/SPARK-14372 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.0.0 > Reporter: Cheng Lian >Assignee: Rekha Joshi > Fix For: 2.0.0 > > > {{Dataset.randomSplit()}} now returns {{Array\[Dataset\[T\]\]}}, which > doesn't work for Java users since Java methods can't return generic arrays. > We may want something like {{randomSplitAsList()}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14372) Dataset.randomSplit() needs a Java version
[ https://issues.apache.org/jira/browse/SPARK-14372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-14372: --- Assignee: Rekha Joshi (was: Subhobrata Dey) > Dataset.randomSplit() needs a Java version > -- > > Key: SPARK-14372 > URL: https://issues.apache.org/jira/browse/SPARK-14372 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.0.0 > Reporter: Cheng Lian >Assignee: Rekha Joshi > > {{Dataset.randomSplit()}} now returns {{Array\[Dataset\[T\]\]}}, which > doesn't work for Java users since Java methods can't return generic arrays. > We may want something like {{randomSplitAsList()}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14372) Dataset.randomSplit() needs a Java version
[ https://issues.apache.org/jira/browse/SPARK-14372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-14372: --- Assignee: Subhobrata Dey > Dataset.randomSplit() needs a Java version > -- > > Key: SPARK-14372 > URL: https://issues.apache.org/jira/browse/SPARK-14372 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.0.0 > Reporter: Cheng Lian >Assignee: Subhobrata Dey > > {{Dataset.randomSplit()}} now returns {{Array\[Dataset\[T\]\]}}, which > doesn't work for Java users since Java methods can't return generic arrays. > We may want something like {{randomSplitAsList()}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-14476) Show table name or path in string of DataSourceScan
[ https://issues.apache.org/jira/browse/SPARK-14476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian reassigned SPARK-14476: -- Assignee: Cheng Lian > Show table name or path in string of DataSourceScan > --- > > Key: SPARK-14476 > URL: https://issues.apache.org/jira/browse/SPARK-14476 > Project: Spark > Issue Type: New Feature > Components: SQL >Reporter: Davies Liu > Assignee: Cheng Lian > > Right now, the string of DataSourceScan is only "HadoopFiles xxx", without > any information about the table name or path. > Since we had that in 1.6, this is kind of a regression. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
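To make the request concrete, here is a self-contained toy model (not Spark source code; all names are illustrative) of the change being asked for: the scan node's string form should surface the path or table it reads from instead of a bare "HadoopFiles" tag.
{code}
// Toy stand-in for the physical scan node.
case class DataSourceScan(output: Seq[String], metadata: Map[String, String]) {
  // Before: the string was just s"HadoopFiles ${output.mkString(", ")}".
  override def toString: String = {
    val source = metadata.get("path")
      .orElse(metadata.get("table"))
      .getOrElse("unknown")
    s"Scan $source ${output.mkString("[", ", ", "]")}"
  }
}

// DataSourceScan(Seq("id"), Map("path" -> "/tmp/events")).toString
// ==> Scan /tmp/events [id]
{code}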
[jira] [Created] (SPARK-14493) "CREATE TEMPORARY TABLE ... USING ... AS SELECT ..." should always be used with a user defined path
Cheng Lian created SPARK-14493: -- Summary: "CREATE TEMPORARY TABLE ... USING ... AS SELECT ..." should always be used with a user defined path Key: SPARK-14493 URL: https://issues.apache.org/jira/browse/SPARK-14493 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.0.0 Reporter: Cheng Lian Assignee: Cheng Lian In current Spark 2.0 master, the following DDL command doesn't specify a user-defined path, yet writes the query result to the default Hive warehouse location: {code} sqlContext sql "CREATE TEMPORARY TABLE y USING PARQUET AS SELECT * FROM x" {code} In Spark 1.6, the same statement results in the following exception, which is the expected behavior: {noformat} scala> sqlContext sql "CREATE TEMPORARY TABLE y USING PARQUET AS SELECT * FROM x" java.util.NoSuchElementException: key not found: path {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
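For reference, the usage this ticket implies looks like the following sketch, where the {{path}} data source option supplies the user-defined location (the path itself is illustrative):
{code}
sqlContext.sql(
  """CREATE TEMPORARY TABLE y
    |USING PARQUET
    |OPTIONS (path '/tmp/y')
    |AS SELECT * FROM x
  """.stripMargin)
{code}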
[jira] [Commented] (SPARK-14488) "CREATE TEMPORARY TABLE ... USING ... AS SELECT ..." creates persisted table
[ https://issues.apache.org/jira/browse/SPARK-14488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15232422#comment-15232422 ] Cheng Lian commented on SPARK-14488: Yea, that's why I came to this DDL command, because this command seems to be the only way to trigger {{CreateTempTableUsingAsSelect}}. However, the physical plan doesn't use it. Will look into this. Thanks! > "CREATE TEMPORARY TABLE ... USING ... AS SELECT ..." creates persisted table > > > Key: SPARK-14488 > URL: https://issues.apache.org/jira/browse/SPARK-14488 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Cheng Lian >Assignee: Cheng Lian > > The following Spark shell snippet reproduces this bug: > {code} > sqlContext range 10 registerTempTable "x" > // The problematic DDL statement: > sqlContext sql "CREATE TEMPORARY TABLE y USING PARQUET AS SELECT * FROM x" > sqlContext.tables().show() > {code} > It shows the following result: > {noformat} > +-+---+ > |tableName|isTemporary| > +-+---+ > |y| false| > |x| true| > +-+---+ > {noformat} > Note that {{y}} is NOT temporary although it's created using {{CREATE > TEMPORARY TABLE ...}}. > Explain shows that the physical plan node is {{CreateTableUsingAsSelect}} > rather than {{CreateTempTableUsingAsSelect}}. > {noformat} > == Parsed Logical Plan == > 'CreateTableUsingAsSelect `y`, PARQUET, true, [Ljava.lang.String;@4d001a14, > None, Overwrite, Map() > +- 'Project [*] >+- 'UnresolvedRelation `x`, None > == Analyzed Logical Plan == > CreateTableUsingAsSelect `y`, PARQUET, true, [Ljava.lang.String;@4d001a14, > None, Overwrite, Map() > +- Project [id#0L] >+- SubqueryAlias x > +- Range 0, 10, 1, 1, [id#0L] > == Optimized Logical Plan == > CreateTableUsingAsSelect `y`, PARQUET, true, [Ljava.lang.String;@4d001a14, > None, Overwrite, Map() > +- Range 0, 10, 1, 1, [id#0L] > == Physical Plan == > ExecutedCommand CreateMetastoreDataSourceAsSelect `y`, PARQUET, > [Ljava.lang.String;@4d001a14, None, Overwrite, Map(), Range 0, 10, 1, 1, > [id#0L]| > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-14488) "CREATE TEMPORARY TABLE ... USING ... AS SELECT ..." creates persisted table
[ https://issues.apache.org/jira/browse/SPARK-14488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15232414#comment-15232414 ] Cheng Lian edited comment on SPARK-14488 at 4/8/16 4:17 PM: Discussed with [~yhuai] offline, and here's the summary: {{CreateTempTableUsingAsSelect}} existed since 1.3 (I'm surprised that I never noticed it!). Its semantics is: # Execute the {{SELECT}} query. # Store query result to a user specified position in filesystem. Note that this means the {{PATH}} data source option should always be set when using this DDL command. # Create a temporary table using written files. Basically, it can be used to dump query results to the filesystem without creating persisted tables. It's indeed a confusing command and is kinda equivalent to the following DDL sequence: - {{INSERT OVERWRITE DIRECTORY ... STORE AS ... SELECT ...}} - {{CREATE TEMPORARY TABLE ... USING ... OPTION (PATH ...)}} However, Spark hasn't implemented {{INSERT OVERWRITE DIRECTORY}} yet. In the long run, we should implement it and deprecate this confusing DDL command. Ticket title and description were updated accordingly. was (Author: lian cheng): Discussed with [~yhuai] offline, and here's the summary: {{CreateTempTableUsingAsSelect}} existed since 1.3 (I'm surprised that I never noticed it!). Its semantics is: # Execute the {{SELECT}} query. # Store query result to a user specified position in filesystem. Note that this means the {{PATH}} data source option should always be set when using this DDL command. # Create a temporary table using written files. Basically, it can be used to dump query results to the filesystem without creating persisted tables. It's indeed a confusing and is kinda equivalent to the following DDL sequence: - {{INSERT OVERWRITE DIRECTORY ... STORE AS ... SELECT ...}} - {{CREATE TEMPORARY TABLE ... USING ... OPTION (PATH ...)}} However, Spark hasn't implemented {{INSERT OVERWRITE DIRECTORY}} yet. In the long run, we should implement it and deprecate this confusing DDL command. Ticket title and description were updated accordingly. > "CREATE TEMPORARY TABLE ... USING ... AS SELECT ..." creates persisted table > > > Key: SPARK-14488 > URL: https://issues.apache.org/jira/browse/SPARK-14488 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Cheng Lian >Assignee: Cheng Lian > > The following Spark shell snippet reproduces this bug: > {code} > sqlContext range 10 registerTempTable "x" > // The problematic DDL statement: > sqlContext sql "CREATE TEMPORARY TABLE y USING PARQUET AS SELECT * FROM x" > sqlContext.tables().show() > {code} > It shows the following result: > {noformat} > +-+---+ > |tableName|isTemporary| > +-+---+ > |y| false| > |x| true| > +-+---+ > {noformat} > Note that {{y}} is NOT temporary although it's created using {{CREATE > TEMPORARY TABLE ...}}. > Explain shows that the physical plan node is {{CreateTableUsingAsSelect}} > rather than {{CreateTempTableUsingAsSelect}}. 
> {noformat} > == Parsed Logical Plan == > 'CreateTableUsingAsSelect `y`, PARQUET, true, [Ljava.lang.String;@4d001a14, > None, Overwrite, Map() > +- 'Project [*] >+- 'UnresolvedRelation `x`, None > == Analyzed Logical Plan == > CreateTableUsingAsSelect `y`, PARQUET, true, [Ljava.lang.String;@4d001a14, > None, Overwrite, Map() > +- Project [id#0L] >+- SubqueryAlias x > +- Range 0, 10, 1, 1, [id#0L] > == Optimized Logical Plan == > CreateTableUsingAsSelect `y`, PARQUET, true, [Ljava.lang.String;@4d001a14, > None, Overwrite, Map() > +- Range 0, 10, 1, 1, [id#0L] > == Physical Plan == > ExecutedCommand CreateMetastoreDataSourceAsSelect `y`, PARQUET, > [Ljava.lang.String;@4d001a14, None, Overwrite, Map(), Range 0, 10, 1, 1, > [id#0L]| > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14488) "CREATE TEMPORARY TABLE ... USING ... AS SELECT ..." creates persisted table
[ https://issues.apache.org/jira/browse/SPARK-14488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15232414#comment-15232414 ] Cheng Lian commented on SPARK-14488: Discussed with [~yhuai] offline, and here's the summary: {{CreateTempTableUsingAsSelect}} existed since 1.3 (I'm surprised that I never noticed it!). Its semantics is: # Execute the {{SELECT}} query. # Store query result to a user specified position in filesystem. Note that this means the {{PATH}} data source option should always be set when using this DDL command. # Create a temporary table using written files. Basically, it can be used to dump query results to the filesystem without creating persisted tables. It's indeed a confusing and is kinda equivalent to the following DDL sequence: - {{INSERT OVERWRITE DIRECTORY ... STORE AS ... SELECT ...}} - {{CREATE TEMPORARY TABLE ... USING ... OPTION (PATH ...)}} However, Spark hasn't implemented {{INSERT OVERWRITE DIRECTORY}} yet. In the long run, we should implement it and deprecate this confusing DDL command. Ticket title and description were updated accordingly. > "CREATE TEMPORARY TABLE ... USING ... AS SELECT ..." creates persisted table > > > Key: SPARK-14488 > URL: https://issues.apache.org/jira/browse/SPARK-14488 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Cheng Lian >Assignee: Cheng Lian > > The following Spark shell snippet reproduces this bug: > {code} > sqlContext range 10 registerTempTable "x" > // The problematic DDL statement: > sqlContext sql "CREATE TEMPORARY TABLE y USING PARQUET AS SELECT * FROM x" > sqlContext.tables().show() > {code} > It shows the following result: > {noformat} > +-+---+ > |tableName|isTemporary| > +-+---+ > |y| false| > |x| true| > +-+---+ > {noformat} > Note that {{y}} is NOT temporary although it's created using {{CREATE > TEMPORARY TABLE ...}}. > Explain shows that the physical plan node is {{CreateTableUsingAsSelect}} > rather than {{CreateTempTableUsingAsSelect}}. > {noformat} > == Parsed Logical Plan == > 'CreateTableUsingAsSelect `y`, PARQUET, true, [Ljava.lang.String;@4d001a14, > None, Overwrite, Map() > +- 'Project [*] >+- 'UnresolvedRelation `x`, None > == Analyzed Logical Plan == > CreateTableUsingAsSelect `y`, PARQUET, true, [Ljava.lang.String;@4d001a14, > None, Overwrite, Map() > +- Project [id#0L] >+- SubqueryAlias x > +- Range 0, 10, 1, 1, [id#0L] > == Optimized Logical Plan == > CreateTableUsingAsSelect `y`, PARQUET, true, [Ljava.lang.String;@4d001a14, > None, Overwrite, Map() > +- Range 0, 10, 1, 1, [id#0L] > == Physical Plan == > ExecutedCommand CreateMetastoreDataSourceAsSelect `y`, PARQUET, > [Ljava.lang.String;@4d001a14, None, Overwrite, Map(), Range 0, 10, 1, 1, > [id#0L]| > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
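The two-step equivalent described in the comment above can be sketched as follows. Since {{INSERT OVERWRITE DIRECTORY}} is not implemented, the first step is written with the DataFrame writer API instead; the path is illustrative.
{code}
// Step 1: dump the query result to the filesystem (stand-in for the missing
// INSERT OVERWRITE DIRECTORY ... STORE AS ... SELECT ...).
sqlContext.sql("SELECT * FROM x").write.parquet("/tmp/y")

// Step 2: register a temporary table over the files written above.
sqlContext.sql("CREATE TEMPORARY TABLE y USING PARQUET OPTIONS (path '/tmp/y')")
{code}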
[jira] [Commented] (SPARK-14488) "CREATE TEMPORARY TABLE ... USING ... AS SELECT ..." creates persisted table
[ https://issues.apache.org/jira/browse/SPARK-14488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15232401#comment-15232401 ] Cheng Lian commented on SPARK-14488: Ah, sorry, the logical plan class {{CreateTableUsingAsSelect}} uses a boolean flag to indicate whether the table is temporary or not, while physical plan uses two different classes {{CreateTempTableUsingAsSelect}} and {{CreateTableUsingAsSelect}}. Then something is probably wrong in the planner. > "CREATE TEMPORARY TABLE ... USING ... AS SELECT ..." creates persisted table > > > Key: SPARK-14488 > URL: https://issues.apache.org/jira/browse/SPARK-14488 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Cheng Lian >Assignee: Cheng Lian > > The following Spark shell snippet reproduces this bug: > {code} > sqlContext range 10 registerTempTable "x" > // The problematic DDL statement: > sqlContext sql "CREATE TEMPORARY TABLE y USING PARQUET AS SELECT * FROM x" > sqlContext.tables().show() > {code} > It shows the following result: > {noformat} > +-+---+ > |tableName|isTemporary| > +-+---+ > |y| false| > |x| true| > +-+---+ > {noformat} > Note that {{y}} is NOT temporary although it's created using {{CREATE > TEMPORARY TABLE ...}}. > Explain shows that the physical plan node is {{CreateTableUsingAsSelect}} > rather than {{CreateTempTableUsingAsSelect}}. > {noformat} > == Parsed Logical Plan == > 'CreateTableUsingAsSelect `y`, PARQUET, true, [Ljava.lang.String;@4d001a14, > None, Overwrite, Map() > +- 'Project [*] >+- 'UnresolvedRelation `x`, None > == Analyzed Logical Plan == > CreateTableUsingAsSelect `y`, PARQUET, true, [Ljava.lang.String;@4d001a14, > None, Overwrite, Map() > +- Project [id#0L] >+- SubqueryAlias x > +- Range 0, 10, 1, 1, [id#0L] > == Optimized Logical Plan == > CreateTableUsingAsSelect `y`, PARQUET, true, [Ljava.lang.String;@4d001a14, > None, Overwrite, Map() > +- Range 0, 10, 1, 1, [id#0L] > == Physical Plan == > ExecutedCommand CreateMetastoreDataSourceAsSelect `y`, PARQUET, > [Ljava.lang.String;@4d001a14, None, Overwrite, Map(), Range 0, 10, 1, 1, > [id#0L]| > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
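A self-contained toy model of the suspected bug (names mirror the ticket, but this is not Spark's actual planner code): the logical node carries a boolean {{temporary}} flag, and the planner has to branch on it when picking the physical node. SPARK-14488 behaves as if the first branch below were never taken.
{code}
sealed trait PhysicalPlan
case class CreateTempTableExec(table: String) extends PhysicalPlan
case class CreatePersistedTableExec(table: String) extends PhysicalPlan

// Logical node: a single class with a boolean flag for TEMPORARY.
case class CreateTableUsingAsSelect(table: String, temporary: Boolean)

// The branch the planner must take based on the flag.
def plan(c: CreateTableUsingAsSelect): PhysicalPlan =
  if (c.temporary) CreateTempTableExec(c.table)
  else CreatePersistedTableExec(c.table)

// plan(CreateTableUsingAsSelect("y", temporary = true)) == CreateTempTableExec("y")
{code}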
[jira] [Updated] (SPARK-14488) "CREATE TEMPORARY TABLE ... USING ... AS SELECT ..." creates persisted table
[ https://issues.apache.org/jira/browse/SPARK-14488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-14488: --- Description: The following Spark shell snippet reproduces this bug: {code} sqlContext range 10 registerTempTable "x" // The problematic DDL statement: sqlContext sql "CREATE TEMPORARY TABLE y USING PARQUET AS SELECT * FROM x" sqlContext.tables().show() {code} It shows the following result: {noformat} +-+---+ |tableName|isTemporary| +-+---+ |y| false| |x| true| +-+---+ {noformat} Note that {{y}} is NOT temporary although it's created using {{CREATE TEMPORARY TABLE ...}}. Explain shows that the physical plan node is {{CreateTableUsingAsSelect}} rather than {{CreateTempTableUsingAsSelect}}. {noformat} == Parsed Logical Plan == 'CreateTableUsingAsSelect `y`, PARQUET, true, [Ljava.lang.String;@4d001a14, None, Overwrite, Map() +- 'Project [*] +- 'UnresolvedRelation `x`, None == Analyzed Logical Plan == CreateTableUsingAsSelect `y`, PARQUET, true, [Ljava.lang.String;@4d001a14, None, Overwrite, Map() +- Project [id#0L] +- SubqueryAlias x +- Range 0, 10, 1, 1, [id#0L] == Optimized Logical Plan == CreateTableUsingAsSelect `y`, PARQUET, true, [Ljava.lang.String;@4d001a14, None, Overwrite, Map() +- Range 0, 10, 1, 1, [id#0L] == Physical Plan == ExecutedCommand CreateMetastoreDataSourceAsSelect `y`, PARQUET, [Ljava.lang.String;@4d001a14, None, Overwrite, Map(), Range 0, 10, 1, 1, [id#0L]| {noformat} was: The following Spark shell snippet reproduces this bug: {code} sqlContext range 10 registerTempTable "x" // The problematic DDL statement: sqlContext sql "CREATE TEMPORARY TABLE y USING PARQUET AS SELECT * FROM x" sqlContext.tables().show() {code} It shows the following result: {noformat} +-+---+ |tableName|isTemporary| +-+---+ |y| false| |x| true| +-+---+ {noformat} Note that {{y}} is NOT temporary although it's created using {{CREATE TEMPORARY TABLE ...}}. Explain shows that parser probably drops {{TEMPORARY}} while parsing this statement: {noformat} == Parsed Logical Plan == 'CreateTableUsingAsSelect `y`, PARQUET, true, [Ljava.lang.String;@4d001a14, None, Overwrite, Map() +- 'Project [*] +- 'UnresolvedRelation `x`, None == Analyzed Logical Plan == CreateTableUsingAsSelect `y`, PARQUET, true, [Ljava.lang.String;@4d001a14, None, Overwrite, Map() +- Project [id#0L] +- SubqueryAlias x +- Range 0, 10, 1, 1, [id#0L] == Optimized Logical Plan == CreateTableUsingAsSelect `y`, PARQUET, true, [Ljava.lang.String;@4d001a14, None, Overwrite, Map() +- Range 0, 10, 1, 1, [id#0L] == Physical Plan == ExecutedCommand CreateMetastoreDataSourceAsSelect `y`, PARQUET, [Ljava.lang.String;@4d001a14, None, Overwrite, Map(), Range 0, 10, 1, 1, [id#0L]| {noformat} > "CREATE TEMPORARY TABLE ... USING ... AS SELECT ..." creates persisted table > > > Key: SPARK-14488 > URL: https://issues.apache.org/jira/browse/SPARK-14488 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Cheng Lian >Assignee: Cheng Lian > > The following Spark shell snippet reproduces this bug: > {code} > sqlContext range 10 registerTempTable "x" > // The problematic DDL statement: > sqlContext sql "CREATE TEMPORARY TABLE y USING PARQUET AS SELECT * FROM x" > sqlContext.tables().show() > {code} > It shows the following result: > {noformat} > +-+---+ > |tableName|isTemporary| > +-+---+ > |y| false| > |x| true| > +-+---+ > {noformat} > Note that {{y}} is NOT temporary although it's created using {{CREATE > TEMPORARY TABLE ...}}. 
> Explain shows that the physical plan node is {{CreateTableUsingAsSelect}} > rather than {{CreateTempTableUsingAsSelect}}. > {noformat} > == Parsed Logical Plan == > 'CreateTableUsingAsSelect `y`, PARQUET, true, [Ljava.lang.String;@4d001a14, > None, Overwrite, Map() > +- 'Project [*] >+- 'UnresolvedRelation `x`, None > == Analyzed Logical Plan == > CreateTableUsingAsSelect `y`, PARQUET, true, [Ljava.lang.String;@4d001a14, > None, Overwrite, Map() > +- Project [id#0L] >+- SubqueryAlias x > +- Range 0, 10, 1, 1, [id#0L] > == Optimized Logical Plan == > CreateTableUsingAsSelect `y`, PARQUET, true, [Ljava.lang.String;@4d001a14, > None, Overwrite, Map() > +- Range 0, 10, 1, 1, [id#0L] > == Physical Plan == > Exe
[jira] [Updated] (SPARK-14488) "CREATE TEMPORARY TABLE ... USING ... AS SELECT ..." creates persisted table
[ https://issues.apache.org/jira/browse/SPARK-14488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-14488: --- Description: The following Spark shell snippet reproduces this bug: {code} sqlContext range 10 registerTempTable "x" // The problematic DDL statement: sqlContext sql "CREATE TEMPORARY TABLE y USING PARQUET AS SELECT * FROM x" sqlContext.tables().show() {code} It shows the following result: {noformat} +-+---+ |tableName|isTemporary| +-+---+ |y| false| |x| true| +-+---+ {noformat} Note that {{y}} is NOT temporary although it's created using {{CREATE TEMPORARY TABLE ...}}. Explain shows that parser probably drops {{TEMPORARY}} while parsing this statement: {noformat} == Parsed Logical Plan == 'CreateTableUsingAsSelect `y`, PARQUET, true, [Ljava.lang.String;@4d001a14, None, Overwrite, Map() +- 'Project [*] +- 'UnresolvedRelation `x`, None == Analyzed Logical Plan == CreateTableUsingAsSelect `y`, PARQUET, true, [Ljava.lang.String;@4d001a14, None, Overwrite, Map() +- Project [id#0L] +- SubqueryAlias x +- Range 0, 10, 1, 1, [id#0L] == Optimized Logical Plan == CreateTableUsingAsSelect `y`, PARQUET, true, [Ljava.lang.String;@4d001a14, None, Overwrite, Map() +- Range 0, 10, 1, 1, [id#0L] == Physical Plan == ExecutedCommand CreateMetastoreDataSourceAsSelect `y`, PARQUET, [Ljava.lang.String;@4d001a14, None, Overwrite, Map(), Range 0, 10, 1, 1, [id#0L]| {noformat} was: Currently, Spark 2.0 master allows DDL statements like {{CREATE TEMPORARY TABLE ... USING ... AS SELECT ...}}, which imposes weird behavior and weird semantics. Let's try the following Spark shell snippet: {code} sqlContext range 10 registerTempTable "x" // The problematic DDL statement: sqlContext sql "CREATE TEMPORARY TABLE y USING PARQUET AS SELECT * FROM x" sqlContext.tables().show() {code} It shows the following result: {noformat} +-+---+ |tableName|isTemporary| +-+---+ |y| false| |x| true| +-+---+ {noformat} *Weird behavior* Note that {{y}} is NOT temporary although it's created using {{CREATE TEMPORARY TABLE ...}}, and the query result is written in Parquet format under default Hive warehouse location, which is {{/user/hive/warehouse/y}} on my local machine. *Weird semantics* Secondly, even if this DDL statement does create a temporary table, the semantics is still somewhat weird: # It has a {{AS SELECT ...}} clause, which is supposed to run a given query instead of loading data from existing files. # It has a {{USING }} clause, which is supposed to, I guess, converting the result of the above query into the given format. And by "converting", we have to write out the data into file system. # It has a {{TEMPORARY}} keyword, which is supposed to, I guess, create an in-memory temporary table using the files written above? The main questions: # Is the above combination ({{TEMPORARY}} + {{USING}} + {{AS SELECT}}) a valid one? # If it's not, why do we have a [{{CreateTempTableUsingAsSelect}} command|https://github.com/apache/spark/blob/583b5e05309adb73cdffd974a810d6bfb5f2ff95/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ddl.scala#L116], which exactly maps to this combination? # If it is, what is the expected semantics? > "CREATE TEMPORARY TABLE ... USING ... AS SELECT ..." 
creates persisted table > > > Key: SPARK-14488 > URL: https://issues.apache.org/jira/browse/SPARK-14488 > Project: Spark > Issue Type: Bug > Components: SQL > Affects Versions: 2.0.0 >Reporter: Cheng Lian >Assignee: Cheng Lian > > The following Spark shell snippet reproduces this bug: > {code} > sqlContext range 10 registerTempTable "x" > // The problematic DDL statement: > sqlContext sql "CREATE TEMPORARY TABLE y USING PARQUET AS SELECT * FROM x" > sqlContext.tables().show() > {code} > It shows the following result: > {noformat} > +-+---+ > |tableName|isTemporary| > +-+---+ > |y| false| > |x| true| > +-+---+ > {noformat} > Note that {{y}} is NOT temporary although it's created using {{CREATE > TEMPORARY TABLE ...}}. > Explain shows that parser probably drops {{TEMPORARY}} while parsing this > statement: > {noformat} > == Parsed Logical Plan == > 'CreateTableUsingAsSelect `y`, PARQUET, true, [Ljava.lang.String;@4d001a14, > None, Overwrite, Map() > +- 'Project [*] >+- 'UnresolvedRelation `x`, None > == Analyzed Log
[jira] [Updated] (SPARK-14488) "CREATE TEMPORARY TABLE ... USING ... AS SELECT ..." creates persisted table
[ https://issues.apache.org/jira/browse/SPARK-14488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-14488: --- Summary: "CREATE TEMPORARY TABLE ... USING ... AS SELECT ..." creates persisted table (was: Weird behavior of DDL "CREATE TEMPORARY TABLE ... USING ... AS SELECT ...") > "CREATE TEMPORARY TABLE ... USING ... AS SELECT ..." creates persisted table > > > Key: SPARK-14488 > URL: https://issues.apache.org/jira/browse/SPARK-14488 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 > Reporter: Cheng Lian >Assignee: Cheng Lian > > Currently, Spark 2.0 master allows DDL statements like {{CREATE TEMPORARY > TABLE ... USING ... AS SELECT ...}}, which imposes weird behavior and weird > semantics. > Let's try the following Spark shell snippet: > {code} > sqlContext range 10 registerTempTable "x" > // The problematic DDL statement: > sqlContext sql "CREATE TEMPORARY TABLE y USING PARQUET AS SELECT * FROM x" > sqlContext.tables().show() > {code} > It shows the following result: > {noformat} > +-+---+ > |tableName|isTemporary| > +-+---+ > |y| false| > |x| true| > +-+---+ > {noformat} > *Weird behavior* > Note that {{y}} is NOT temporary although it's created using {{CREATE > TEMPORARY TABLE ...}}, and the query result is written in Parquet format > under default Hive warehouse location, which is {{/user/hive/warehouse/y}} on > my local machine. > *Weird semantics* > Secondly, even if this DDL statement does create a temporary table, the > semantics is still somewhat weird: > # It has a {{AS SELECT ...}} clause, which is supposed to run a given query > instead of loading data from existing files. > # It has a {{USING }} clause, which is supposed to, I guess, > converting the result of the above query into the given format. And by > "converting", we have to write out the data into file system. > # It has a {{TEMPORARY}} keyword, which is supposed to, I guess, create an > in-memory temporary table using the files written above? > The main questions: > # Is the above combination ({{TEMPORARY}} + {{USING}} + {{AS SELECT}}) a > valid one? > # If it's not, why do we have a [{{CreateTempTableUsingAsSelect}} > command|https://github.com/apache/spark/blob/583b5e05309adb73cdffd974a810d6bfb5f2ff95/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ddl.scala#L116], > which exactly maps to this combination? > # If it is, what is the expected semantics? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-14488) Weird behavior of DDL "CREATE TEMPORARY TABLE ... USING ... AS SELECT ..."
[ https://issues.apache.org/jira/browse/SPARK-14488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15232092#comment-15232092 ] Cheng Lian edited comment on SPARK-14488 at 4/8/16 12:27 PM: - Tried the same snippet using Spark 1.6, and got the following exception, which makes sense. I tend to believe that the combination described in the ticket is invalid and should be rejected by either parser or analyzer. {noformat} scala> sqlContext sql "CREATE TEMPORARY TABLE y USING PARQUET AS SELECT * FROM x" java.util.NoSuchElementException: key not found: path at scala.collection.MapLike$class.default(MapLike.scala:228) at org.apache.spark.sql.execution.datasources.CaseInsensitiveMap.default(ddl.scala:150) at scala.collection.MapLike$class.apply(MapLike.scala:141) at org.apache.spark.sql.execution.datasources.CaseInsensitiveMap.apply(ddl.scala:150) at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:230) at org.apache.spark.sql.execution.datasources.CreateTempTableUsingAsSelect.run(ddl.scala:112) at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58) at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56) at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150) at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130) at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55) at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55) at org.apache.spark.sql.DataFrame.(DataFrame.scala:145) at org.apache.spark.sql.DataFrame.(DataFrame.scala:130) at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:52) at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:817) at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:26) at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:31) at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:33) at $iwC$$iwC$$iwC$$iwC$$iwC.(:35) at $iwC$$iwC$$iwC$$iwC.(:37) at $iwC$$iwC$$iwC.(:39) at $iwC$$iwC.(:41) at $iwC.(:43) at (:45) at .(:49) at .() at .(:7) at .() at $print() at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:483) at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065) at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346) at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840) at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871) at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819) at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857) at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902) at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814) at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657) at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665) at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670) at 
org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997) at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135) at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945) at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059) at org.apache.spark.repl.Main$.main(Main.scala:31) at org.apache.spark.repl.Main.main(Main.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccess
[jira] [Updated] (SPARK-14488) Weird behavior of DDL "CREATE TEMPORARY TABLE ... USING ... AS SELECT ..."
[ https://issues.apache.org/jira/browse/SPARK-14488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-14488: --- Description: Currently, Spark 2.0 master allows DDL statements like {{CREATE TEMPORARY TABLE ... USING ... AS SELECT ...}}, which imposes weird behavior and weird semantics. Let's try the following Spark shell snippet: {code} sqlContext range 10 registerTempTable "x" // The problematic DDL statement: sqlContext sql "CREATE TEMPORARY TABLE y USING PARQUET AS SELECT * FROM x" sqlContext.tables().show() {code} It shows the following result: {noformat} +-+---+ |tableName|isTemporary| +-+---+ |y| false| |x| true| +-+---+ {noformat} *Weird behavior* Note that {{y}} is NOT temporary although it's created using {{CREATE TEMPORARY TABLE ...}}, and the query result is written in Parquet format under default Hive warehouse location, which is {{/user/hive/warehouse/y}} on my local machine. *Weird semantics* Secondly, even if this DDL statement does create a temporary table, the semantics is still somewhat weird: # It has a {{AS SELECT ...}} clause, which is supposed to run a given query instead of loading data from existing files. # It has a {{USING }} clause, which is supposed to, I guess, converting the result of the above query into the given format. And by "converting", we have to write out the data into file system. # It has a {{TEMPORARY}} keyword, which is supposed to, I guess, create an in-memory temporary table using the files written above? The main questions: # Is the above combination ({{TEMPORARY}} + {{USING}} + {{AS SELECT}}) a valid one? # If it's not, why do we have a [{{CreateTempTableUsingAsSelect}} command|https://github.com/apache/spark/blob/583b5e05309adb73cdffd974a810d6bfb5f2ff95/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ddl.scala#L116], which exactly maps to this combination? # If it is, what is the expected semantics? was: Currently, Spark 2.0 master allows DDL statements like {{CREATE TEMPORARY TABLE ... USING ... AS SELECT ...}}, which imposes weird behavior and weird semantics. Let's try the following Spark shell snippet: {code} sqlContext range 10 registerTempTable "x" // The problematic DDL statement: sqlContext sql "CREATE TEMPORARY TABLE y USING PARQUET AS SELECT * FROM x" sqlContext.tables().show() {code} It shows the following result: {noformat} +-+---+ |tableName|isTemporary| +-+---+ |y| false| |x| true| +-+---+ {noformat} *Weird behavior* Note that {{y}} is NOT temporary although it's created using {{CREATE TEMPORARY TABLE ...}}, and the query result is written in Parquet format under default Hive warehouse location, which is {{/user/hive/warehouse/y}} on my local machine. *Weird semantics* Secondly, even if this DDL statement does create a temporary table, the semantics is still somewhat weird: # It has a {{AS SELECT ...}} clause, which is supposed to run a given query instead of loading data from existing files. # It has a {{USING }} clause, which is supposed to, I guess, converting the result of the above query into the given format. And by "converting", we have to write out the data into file system. # It has a {{TEMPORARY}} keyword, which is supposed to, I guess, create an in-memory temporary table using the files written above? The main questions: # Is the above combination ({{TEMPORARY}} + {{USING}} + {{AS SELECT}}) a valid one? 
If it's not, why do we have a [{{CreateTempTableUsingAsSelect}} command|https://github.com/apache/spark/blob/583b5e05309adb73cdffd974a810d6bfb5f2ff95/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ddl.scala#L116], which exactly maps to this combination? # If it is, what is the expected semantics? > Weird behavior of DDL "CREATE TEMPORARY TABLE ... USING ... AS SELECT ..." > -- > > Key: SPARK-14488 > URL: https://issues.apache.org/jira/browse/SPARK-14488 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Cheng Lian >Assignee: Cheng Lian > > Currently, Spark 2.0 master allows DDL statements like {{CREATE TEMPORARY > TABLE ... USING ... AS SELECT ...}}, which imposes weird behavior and weird > semantics. > Let's try the following Spark shell snippet: > {code} > sqlContext range 10 registerTempTable "x" > // The problematic DDL statement: > sqlContext sql "CREATE TEMPORARY TABLE y USING PARQUET AS SELECT * FROM x" > sqlContext.tables().show() > {code} > I
[jira] [Commented] (SPARK-14488) Weird behavior of DDL "CREATE TEMPORARY TABLE ... USING ... AS SELECT ..."
[ https://issues.apache.org/jira/browse/SPARK-14488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15232118#comment-15232118 ] Cheng Lian commented on SPARK-14488: Result of {{EXPLAIN EXTENDED CREATE TEMPORARY TABLE y USING PARQUET AS SELECT * FROM x}}: {noformat} == Parsed Logical Plan == 'CreateTableUsingAsSelect `y`, PARQUET, true, [Ljava.lang.String;@4d001a14, None, Overwrite, Map() +- 'Project [*] +- 'UnresolvedRelation `x`, None == Analyzed Logical Plan == CreateTableUsingAsSelect `y`, PARQUET, true, [Ljava.lang.String;@4d001a14, None, Overwrite, Map() +- Project [id#0L] +- SubqueryAlias x +- Range 0, 10, 1, 1, [id#0L] == Optimized Logical Plan == CreateTableUsingAsSelect `y`, PARQUET, true, [Ljava.lang.String;@4d001a14, None, Overwrite, Map() +- Range 0, 10, 1, 1, [id#0L] == Physical Plan == ExecutedCommand CreateMetastoreDataSourceAsSelect `y`, PARQUET, [Ljava.lang.String;@4d001a14, None, Overwrite, Map(), Range 0, 10, 1, 1, [id#0L]| {noformat} So it seems that the parser drops {{TEMPORARY}}. > Weird behavior of DDL "CREATE TEMPORARY TABLE ... USING ... AS SELECT ..." > -- > > Key: SPARK-14488 > URL: https://issues.apache.org/jira/browse/SPARK-14488 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Cheng Lian >Assignee: Cheng Lian > > Currently, Spark 2.0 master allows DDL statements like {{CREATE TEMPORARY > TABLE ... USING ... AS SELECT ...}}, which imposes weird behavior and weird > semantics. > Let's try the following Spark shell snippet: > {code} > sqlContext range 10 registerTempTable "x" > // The problematic DDL statement: > sqlContext sql "CREATE TEMPORARY TABLE y USING PARQUET AS SELECT * FROM x" > sqlContext.tables().show() > {code} > It shows the following result: > {noformat} > +-+---+ > |tableName|isTemporary| > +-+---+ > |y| false| > |x| true| > +-+---+ > {noformat} > *Weird behavior* > Note that {{y}} is NOT temporary although it's created using {{CREATE > TEMPORARY TABLE ...}}, and the query result is written in Parquet format > under default Hive warehouse location, which is {{/user/hive/warehouse/y}} on > my local machine. > *Weird semantics* > Secondly, even if this DDL statement does create a temporary table, the > semantics is still somewhat weird: > # It has a {{AS SELECT ...}} clause, which is supposed to run a given query > instead of loading data from existing files. > # It has a {{USING }} clause, which is supposed to, I guess, > converting the result of the above query into the given format. And by > "converting", we have to write out the data into file system. > # It has a {{TEMPORARY}} keyword, which is supposed to, I guess, create an > in-memory temporary table using the files written above? > The main questions: > # Is the above combination ({{TEMPORARY}} + {{USING}} + {{AS SELECT}}) a > valid one? If it's not, why do we have a [{{CreateTempTableUsingAsSelect}} > command|https://github.com/apache/spark/blob/583b5e05309adb73cdffd974a810d6bfb5f2ff95/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ddl.scala#L116], > which exactly maps to this combination? > # If it is, what is the expected semantics? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14488) Weird behavior of DDL "CREATE TEMPORARY TABLE ... USING ... AS SELECT ..."
[ https://issues.apache.org/jira/browse/SPARK-14488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-14488: --- Description: Currently, Spark 2.0 master allows DDL statements like {{CREATE TEMPORARY TABLE ... USING ... AS SELECT ...}}, which imposes weird behavior and weird semantics. Let's try the following Spark shell snippet: {code} sqlContext range 10 registerTempTable "x" // The problematic DDL statement: sqlContext sql "CREATE TEMPORARY TABLE y USING PARQUET AS SELECT * FROM x" sqlContext.tables().show() {code} It shows the following result: {noformat} +-+---+ |tableName|isTemporary| +-+---+ |y| false| |x| true| +-+---+ {noformat} *Weird behavior* Note that {{y}} is NOT temporary although it's created using {{CREATE TEMPORARY TABLE ...}}, and the query result is written in Parquet format under default Hive warehouse location, which is {{/user/hive/warehouse/y}} on my local machine. *Weird semantics* Secondly, even if this DDL statement does create a temporary table, the semantics is still somewhat weird: # It has a {{AS SELECT ...}} clause, which is supposed to run a given query instead of loading data from existing files. # It has a {{USING }} clause, which is supposed to, I guess, converting the result of the above query into the given format. And by "converting", we have to write out the data into file system. # It has a {{TEMPORARY}} keyword, which is supposed to, I guess, create an in-memory temporary table using the files written above? The main questions: # Is the above combination ({{TEMPORARY}} + {{USING}} + {{AS SELECT}}) a valid one? If it's not, why do we have a [{{CreateTempTableUsingAsSelect}} command|https://github.com/apache/spark/blob/583b5e05309adb73cdffd974a810d6bfb5f2ff95/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ddl.scala#L116], which exactly maps to this combination? # If it is, what is the expected semantics? was: Currently, Spark 2.0 master allows DDL statements like {{CREATE TEMPORARY TABLE ... USING ... AS SELECT ...}}, which imposes weird behavior and weird semantics. Let's try the following Spark shell snippet: {code} sqlContext range 10 registerTempTable "x" // The problematic DDL statement: sqlContext sql "CREATE TEMPORARY TABLE y USING PARQUET AS SELECT * FROM x" sqlContext.tables().show() {code} It shows the following result: {noformat} +-+---+ |tableName|isTemporary| +-+---+ |y| false| |x| true| +-+---+ {noformat} *Weird behavior* Note that {{y}} is NOT temporary although it's created using {{CREATE TEMPORARY TABLE ...}}, and the query result is written in Parquet format under default Hive warehouse location, which is {{/user/hive/warehouse/y}} on my local machine. *Weird semantics* Secondly, even if this DDL statement does create a temporary table, the semantics is still somewhat weird: # It has a {{AS SELECT ...}} clause, which is supposed to run a given query instead of loading data from existing files. # It has a {{USING }} clause, which is supposed to, I guess, converting the result of the above query into the given format. And by "converting", we have to write out the data into file system. # It has a {{TEMPORARY}} keyword, which is supposed to, I guess, create an in-memory temporary table using the files written above? The main questions: # Is the above combination ({{TEMPORARY}} + {{USING}} + {{AS SELECT}}) a valid one? # If it is, what is the expected semantics? > Weird behavior of DDL "CREATE TEMPORARY TABLE ... USING ... AS SELECT ..." 
> -- > > Key: SPARK-14488 > URL: https://issues.apache.org/jira/browse/SPARK-14488 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Cheng Lian >Assignee: Cheng Lian > > Currently, Spark 2.0 master allows DDL statements like {{CREATE TEMPORARY > TABLE ... USING ... AS SELECT ...}}, which imposes weird behavior and weird > semantics. > Let's try the following Spark shell snippet: > {code} > sqlContext range 10 registerTempTable "x" > // The problematic DDL statement: > sqlContext sql "CREATE TEMPORARY TABLE y USING PARQUET AS SELECT * FROM x" > sqlContext.tables().show() > {code} > It shows the following result: > {noformat} > +-+---+ > |tableName|isTemporary| > +-+---+ > |y| false| > |x| true| > +-+---+ > {noformat} > *Weird behavior* > Note that {{y}} is N
[jira] [Commented] (SPARK-14488) Weird behavior of DDL "CREATE TEMPORARY TABLE ... USING ... AS SELECT ..."
[ https://issues.apache.org/jira/browse/SPARK-14488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15232111#comment-15232111 ] Cheng Lian commented on SPARK-14488: However, if {{TEMPORARY + USING + AS SELECT}} is an invalid combination, why do we have a [{{CreateTempTableUsingAsSelect}} command|https://github.com/apache/spark/blob/583b5e05309adb73cdffd974a810d6bfb5f2ff95/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ddl.scala#L116], which exactly maps to this combination? > Weird behavior of DDL "CREATE TEMPORARY TABLE ... USING ... AS SELECT ..." > -- > > Key: SPARK-14488 > URL: https://issues.apache.org/jira/browse/SPARK-14488 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Cheng Lian >Assignee: Cheng Lian > > Currently, Spark 2.0 master allows DDL statements like {{CREATE TEMPORARY > TABLE ... USING ... AS SELECT ...}}, which imposes weird behavior and weird > semantics. > Let's try the following Spark shell snippet: > {code} > sqlContext range 10 registerTempTable "x" > // The problematic DDL statement: > sqlContext sql "CREATE TEMPORARY TABLE y USING PARQUET AS SELECT * FROM x" > sqlContext.tables().show() > {code} > It shows the following result: > {noformat} > +-+---+ > |tableName|isTemporary| > +-+---+ > |y| false| > |x| true| > +-+---+ > {noformat} > *Weird behavior* > Note that {{y}} is NOT temporary although it's created using {{CREATE > TEMPORARY TABLE ...}}, and the query result is written in Parquet format > under default Hive warehouse location, which is {{/user/hive/warehouse/y}} on > my local machine. > *Weird semantics* > Secondly, even if this DDL statement does create a temporary table, the > semantics is still somewhat weird: > # It has a {{AS SELECT ...}} clause, which is supposed to run a given query > instead of loading data from existing files. > # It has a {{USING }} clause, which is supposed to, I guess, > converting the result of the above query into the given format. And by > "converting", we have to write out the data into file system. > # It has a {{TEMPORARY}} keyword, which is supposed to, I guess, create an > in-memory temporary table using the files written above? > The main questions: > # Is the above combination ({{TEMPORARY}} + {{USING}} + {{AS SELECT}}) a > valid one? > # If it is, what is the expected semantics? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14488) Weird behavior of DDL "CREATE TEMPORARY TABLE ... USING ... AS SELECT ..."
[ https://issues.apache.org/jira/browse/SPARK-14488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15232092#comment-15232092 ] Cheng Lian commented on SPARK-14488: Tried the same snippet using Spark 1.6, and got the following exception, which makes sense: {noformat} scala> sqlContext sql "CREATE TEMPORARY TABLE y USING PARQUET AS SELECT * FROM x" java.util.NoSuchElementException: key not found: path at scala.collection.MapLike$class.default(MapLike.scala:228) at org.apache.spark.sql.execution.datasources.CaseInsensitiveMap.default(ddl.scala:150) at scala.collection.MapLike$class.apply(MapLike.scala:141) at org.apache.spark.sql.execution.datasources.CaseInsensitiveMap.apply(ddl.scala:150) at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:230) at org.apache.spark.sql.execution.datasources.CreateTempTableUsingAsSelect.run(ddl.scala:112) at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58) at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56) at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150) at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130) at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55) at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55) at org.apache.spark.sql.DataFrame.(DataFrame.scala:145) at org.apache.spark.sql.DataFrame.(DataFrame.scala:130) at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:52) at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:817) at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:26) at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:31) at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:33) at $iwC$$iwC$$iwC$$iwC$$iwC.(:35) at $iwC$$iwC$$iwC$$iwC.(:37) at $iwC$$iwC$$iwC.(:39) at $iwC$$iwC.(:41) at $iwC.(:43) at (:45) at .(:49) at .() at .(:7) at .() at $print() at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:483) at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065) at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346) at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840) at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871) at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819) at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857) at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902) at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814) at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657) at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665) at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670) at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997) at 
org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135) at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945) at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059) at org.apache.spark.repl.Main$.main(Main.scala:31) at org.apache.spark.repl.Main.main(Main.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:483) at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSu
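The trace bottoms out in a plain map lookup: {{CreateTempTableUsingAsSelect.run}} reads the {{path}} option from a {{CaseInsensitiveMap}} (via {{ResolvedDataSource}}), and Scala's {{Map.apply}} on a missing key is exactly what raises this {{NoSuchElementException}}. A minimal illustration (the path below is illustrative):
{code}
// The data source options of the failing statement contain no "path" entry.
val options = Map.empty[String, String]

// Map.apply on a missing key raises the error seen in the trace above:
//   java.util.NoSuchElementException: key not found: path
// options("path")

// With the option supplied, the lookup succeeds.
val withPath = options + ("path" -> "/tmp/y")
withPath("path") // "/tmp/y"
{code}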
[jira] [Commented] (SPARK-14488) Weird behavior of DDL "CREATE TEMPORARY TABLE ... USING ... AS SELECT ..."
[ https://issues.apache.org/jira/browse/SPARK-14488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15232078#comment-15232078 ] Cheng Lian commented on SPARK-14488: Updated title and description of this ticket. > Weird behavior of DDL "CREATE TEMPORARY TABLE ... USING ... AS SELECT ..." > -- > > Key: SPARK-14488 > URL: https://issues.apache.org/jira/browse/SPARK-14488 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Cheng Lian >Assignee: Cheng Lian > > Currently, Spark 2.0 master allows DDL statements like {{CREATE TEMPORARY > TABLE ... USING ... AS SELECT ...}}, which imposes weird behavior and weird > semantics. > Let's try the following Spark shell snippet: > {code} > sqlContext range 10 registerTempTable "x" > // The problematic DDL statement: > sqlContext sql "CREATE TEMPORARY TABLE y USING PARQUET AS SELECT * FROM x" > sqlContext.tables().show() > {code} > It shows the following result: > {noformat} > +-+---+ > |tableName|isTemporary| > +-+---+ > |y| false| > |x| true| > +-+---+ > {noformat} > *Weird behavior* > Note that {{y}} is NOT temporary although it's created using {{CREATE > TEMPORARY TABLE ...}}, and the query result is written in Parquet format > under default Hive warehouse location, which is {{/user/hive/warehouse/y}} on > my local machine. > *Weird semantics* > Secondly, even if this DDL statement does create a temporary table, the > semantics is still somewhat weird: > # It has a {{AS SELECT ...}} clause, which is supposed to run a given query > instead of loading data from existing files. > # It has a {{USING }} clause, which is supposed to, I guess, > converting the result of the above query into the given format. And by > "converting", we have to write out the data into file system. > # It has a {{TEMPORARY}} keyword, which is supposed to, I guess, create an > in-memory temporary table using the files written above? > The main questions: > # Is the above combination ({{TEMPORARY}} + {{USING}} + {{AS SELECT}}) a > valid one? > # If it is, what is the expected semantics? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14488) Weird behavior of DDL "CREATE TEMPORARY TABLE ... USING ... AS SELECT ..."
[ https://issues.apache.org/jira/browse/SPARK-14488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-14488: --- Description: Currently, Spark 2.0 master allows DDL statements like {{CREATE TEMPORARY TABLE ... USING ... AS SELECT ...}}, which imposes weird behavior and weird semantics. Let's try the following Spark shell snippet: {code} sqlContext range 10 registerTempTable "x" // The problematic DDL statement: sqlContext sql "CREATE TEMPORARY TABLE y USING PARQUET AS SELECT * FROM x" sqlContext.tables().show() {code} It shows the following result: {noformat} +-+---+ |tableName|isTemporary| +-+---+ |y| false| |x| true| +-+---+ {noformat} *Weird behavior* Note that {{y}} is NOT temporary although it's created using {{CREATE TEMPORARY TABLE ...}}, and the query result is written in Parquet format under default Hive warehouse location, which is {{/user/hive/warehouse/y}} on my local machine. *Weird semantics* Secondly, even if this DDL statement does create a temporary table, the semantics is still somewhat weird: # It has a {{AS SELECT ...}} clause, which is supposed to run a given query instead of loading data from existing files. # It has a {{USING }} clause, which is supposed to, I guess, converting the result of the above query into the given format. And by "converting", we have to write out the data into file system. # It has a {{TEMPORARY}} keyword, which is supposed to, I guess, create an in-memory temporary table using the files written above? The main questions: # Is the above combination ({{TEMPORARY}} + {{USING}} + {{AS SELECT}}) a valid one? # If it is, what is the expected semantics? was: Currently, Spark 2.0 master allows DDL statements like {{CREATE TEMPORARY TABLE ... USING ... AS SELECT ...}}, which imposes weird behavior and weird semantics. Let's try the following Spark shell snippet: {code} sqlContext range 10 registerTempTable "x" // The problematic DDL statement: sqlContext sql "CREATE TEMPORARY TABLE y USING PARQUET AS SELECT * FROM x" sqlContext.tables().show() {code} It shows the following result: {noformat} +-+---+ |tableName|isTemporary| +-+---+ |y| false| |x| true| +-+---+ {noformat} *Weird behavior* Note that {{y}} is NOT temporary although it's created using {{CREATE TEMPORARY TABLE ...}}, and the query result is written in Parquet format under default Hive warehouse location, whichi is {{/user/hive/warehouse/y}} on my local machine. *Weird semantics* Secondly, even if this DDL statement does create a temporary table, the semantics is still somewhat weird: # It has a {{AS SELECT ...}} clause, which is supposed to run a given query instead of loading data from existing files. # It has a {{USING }} clause, which is supposed to, I guess, converting the result of the above query into the given format. And by "converting", we have to write out the data into file system. # It has a {{TEMPORARY}} key word, which is supposed to, I guess, create an in-memory temporary table using the files written above? The main questions: # Is the above combination ({{TEMPORARY}} + {{USING}} + {{AS SELECT}}) a valid one? # If it is, what is the expected semantics? > Weird behavior of DDL "CREATE TEMPORARY TABLE ... USING ... AS SELECT ..." > -- > > Key: SPARK-14488 > URL: https://issues.apache.org/jira/browse/SPARK-14488 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Cheng Lian >Assignee: Cheng Lian > > Currently, Spark 2.0 master allows DDL statements like {{CREATE TEMPORARY > TABLE ... USING ... 
AS SELECT ...}}, which leads to weird behavior and weird > semantics. > Let's try the following Spark shell snippet:
> {code}
> sqlContext range 10 registerTempTable "x"
> // The problematic DDL statement:
> sqlContext sql "CREATE TEMPORARY TABLE y USING PARQUET AS SELECT * FROM x"
> sqlContext.tables().show()
> {code}
> It shows the following result:
> {noformat}
> +---------+-----------+
> |tableName|isTemporary|
> +---------+-----------+
> |        y|      false|
> |        x|       true|
> +---------+-----------+
> {noformat}
> *Weird behavior* > Note that {{y}} is NOT temporary although it's created using {{CREATE > TEMPORARY TABLE ...}}, and the query result is written in Parquet format > under the default Hive warehouse location, which is {{/user/hive/warehouse/y}} on > my local machine. > *Weird semantics* > Secon
[jira] [Updated] (SPARK-14488) Weird behavior of DDL "CREATE TEMPORARY TABLE ... USING ... AS SELECT ..."
[ https://issues.apache.org/jira/browse/SPARK-14488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-14488: --- Description: Currently, Spark 2.0 master allows DDL statements like {{CREATE TEMPORARY TABLE ... USING ... AS SELECT ...}}, which leads to weird behavior and weird semantics. Let's try the following Spark shell snippet:
{code}
sqlContext range 10 registerTempTable "x"
// The problematic DDL statement:
sqlContext sql "CREATE TEMPORARY TABLE y USING PARQUET AS SELECT * FROM x"
sqlContext.tables().show()
{code}
It shows the following result:
{noformat}
+---------+-----------+
|tableName|isTemporary|
+---------+-----------+
|        y|      false|
|        x|       true|
+---------+-----------+
{noformat}
*Weird behavior* Note that {{y}} is NOT temporary although it's created using {{CREATE TEMPORARY TABLE ...}}, and the query result is written in Parquet format under the default Hive warehouse location, which is {{/user/hive/warehouse/y}} on my local machine. *Weird semantics* Secondly, even if this DDL statement does create a temporary table, the semantics are still somewhat weird: # It has an {{AS SELECT ...}} clause, which is supposed to run a given query instead of loading data from existing files. # It has a {{USING ...}} clause, which is supposed to, I guess, convert the result of the above query into the given format. And by "converting", we have to write the data out to the file system. # It has a {{TEMPORARY}} keyword, which is supposed to, I guess, create an in-memory temporary table using the files written above? The main questions: # Is the above combination ({{TEMPORARY}} + {{USING}} + {{AS SELECT}}) a valid one? # If it is, what are the expected semantics? was: The following Spark shell snippet shows that currently temporary table creation writes files to the file system:
{code}
sqlContext range 10 registerTempTable "t"
sqlContext sql "create temporary table s using parquet as select * from t"
{code}
The problematic code is [here|https://github.com/apache/spark/blob/73b56a3c6c5c590219b42884c8bbe88b0a236987/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ddl.scala#L137]. > Weird behavior of DDL "CREATE TEMPORARY TABLE ... USING ... AS SELECT ..." > -- > > Key: SPARK-14488 > URL: https://issues.apache.org/jira/browse/SPARK-14488 > Project: Spark > Issue Type: Bug > Components: SQL > Affects Versions: 2.0.0 >Reporter: Cheng Lian >Assignee: Cheng Lian > > Currently, Spark 2.0 master allows DDL statements like {{CREATE TEMPORARY > TABLE ... USING ... AS SELECT ...}}, which leads to weird behavior and weird > semantics. > Let's try the following Spark shell snippet:
> {code}
> sqlContext range 10 registerTempTable "x"
> // The problematic DDL statement:
> sqlContext sql "CREATE TEMPORARY TABLE y USING PARQUET AS SELECT * FROM x"
> sqlContext.tables().show()
> {code}
> It shows the following result:
> {noformat}
> +---------+-----------+
> |tableName|isTemporary|
> +---------+-----------+
> |        y|      false|
> |        x|       true|
> +---------+-----------+
> {noformat}
> *Weird behavior* > Note that {{y}} is NOT temporary although it's created using {{CREATE > TEMPORARY TABLE ...}}, and the query result is written in Parquet format > under the default Hive warehouse location, which is {{/user/hive/warehouse/y}} > on my local machine. > *Weird semantics* > Secondly, even if this DDL statement does create a temporary table, the > semantics are still somewhat weird: > # It has an {{AS SELECT ...}} clause, which is supposed to run a given query > instead of loading data from existing files.
> # It has a {{USING ...}} clause, which is supposed to, I guess, > convert the result of the above query into the given format. And by > "converting", we have to write the data out to the file system. > # It has a {{TEMPORARY}} keyword, which is supposed to, I guess, create an > in-memory temporary table using the files written above? > The main questions: > # Is the above combination ({{TEMPORARY}} + {{USING}} + {{AS SELECT}}) a > valid one? > # If it is, what are the expected semantics? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14488) Weird behavior of DDL "CREATE TEMPORARY TABLE ... USING ... AS SELECT ..."
[ https://issues.apache.org/jira/browse/SPARK-14488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-14488: --- Summary: Weird behavior of DDL "CREATE TEMPORARY TABLE ... USING ... AS SELECT ..." (was: Creating temporary table using SQL DDL shouldn't write files to file system) > Weird behavior of DDL "CREATE TEMPORARY TABLE ... USING ... AS SELECT ..." > -- > > Key: SPARK-14488 > URL: https://issues.apache.org/jira/browse/SPARK-14488 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 > Reporter: Cheng Lian >Assignee: Cheng Lian > > The following Spark shell snippet shows that currently temporary table > creation writes files to the file system:
> {code}
> sqlContext range 10 registerTempTable "t"
> sqlContext sql "create temporary table s using parquet as select * from t"
> {code}
> The problematic code is > [here|https://github.com/apache/spark/blob/73b56a3c6c5c590219b42884c8bbe88b0a236987/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ddl.scala#L137]. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-14488) Creating temporary table using SQL DDL shouldn't write files to file system
[ https://issues.apache.org/jira/browse/SPARK-14488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15232030#comment-15232030 ] Cheng Lian edited comment on SPARK-14488 at 4/8/16 11:10 AM: - Oh, wait... Since there's a {{USING}} in the DDL statement, are we supposed to write the query result using the given data source format to disk, and use the written files to create a temporary table? So basically this DDL is used to save a query result using a specific data source format to disk? I find this one quite confusing... cc [~yhuai] [~marmbrus] was (Author: lian cheng): Oh, wait... Since there's a {{USING}} in the DDL statement, are we supposed to write the query result using the given data source format to disk, and use the written files to create a temporary table? So basically this DDL is used to save a query result using a specific data source format to disk? I find this one quite confusing... > Creating temporary table using SQL DDL shouldn't write files to file system > --- > > Key: SPARK-14488 > URL: https://issues.apache.org/jira/browse/SPARK-14488 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 > Reporter: Cheng Lian >Assignee: Cheng Lian > > The following Spark shell snippet shows that currently temporary table > creation writes files to the file system:
> {code}
> sqlContext range 10 registerTempTable "t"
> sqlContext sql "create temporary table s using parquet as select * from t"
> {code}
> The problematic code is > [here|https://github.com/apache/spark/blob/73b56a3c6c5c590219b42884c8bbe88b0a236987/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ddl.scala#L137]. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14488) Creating temporary table using SQL DDL shouldn't write files to file system
[ https://issues.apache.org/jira/browse/SPARK-14488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15232030#comment-15232030 ] Cheng Lian commented on SPARK-14488: Oh, wait... Since there's a {{USING}} in the DDL statement, are we supposed to write the query result using the given data source format to disk, and use the written files to create a temporary table? So basically this DDL is used to save a query result using a specific data source format to disk? I find this one quite confusing... > Creating temporary table using SQL DDL shouldn't write files to file system > --- > > Key: SPARK-14488 > URL: https://issues.apache.org/jira/browse/SPARK-14488 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 > Reporter: Cheng Lian >Assignee: Cheng Lian > > The following Spark shell snippet shows that currently temporary table > creation writes files to the file system:
> {code}
> sqlContext range 10 registerTempTable "t"
> sqlContext sql "create temporary table s using parquet as select * from t"
> {code}
> The problematic code is > [here|https://github.com/apache/spark/blob/73b56a3c6c5c590219b42884c8bbe88b0a236987/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ddl.scala#L137]. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-14488) Creating temporary table using SQL DDL shouldn't write files to file system
Cheng Lian created SPARK-14488: -- Summary: Creating temporary table using SQL DDL shouldn't write files to file system Key: SPARK-14488 URL: https://issues.apache.org/jira/browse/SPARK-14488 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.0.0 Reporter: Cheng Lian Assignee: Cheng Lian The following Spark shell snippet shows that currently temporary table creation writes files to the file system:
{code}
sqlContext range 10 registerTempTable "t"
sqlContext sql "create temporary table s using parquet as select * from t"
{code}
The problematic code is [here|https://github.com/apache/spark/blob/73b56a3c6c5c590219b42884c8bbe88b0a236987/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ddl.scala#L137]. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-14458) Wrong data schema is passed to FileFormat data sources that can't infer schema
Cheng Lian created SPARK-14458: -- Summary: Wrong data schema is passed to FileFormat data sources that can't infer schema Key: SPARK-14458 URL: https://issues.apache.org/jira/browse/SPARK-14458 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.0.0 Reporter: Cheng Lian Assignee: Cheng Lian When instantiating a {{FileFormat}} data source that is not able to infer its schema from data files, {{DataSource}} passes the full schema, including partition columns, to {{HadoopFsRelation}}. We should filter out partition columns and preserve only the data columns that actually live in data files. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
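A minimal sketch of the filtering described above, assuming hypothetical helper and parameter names (not the actual patch): given the full schema and the partition column names, keep only the fields physically stored in data files.
{code}
import org.apache.spark.sql.types.StructType

// Hypothetical helper: drop partition columns from the full schema so that
// only columns that actually live in data files reach HadoopFsRelation.
def dataSchema(fullSchema: StructType, partitionColumns: Seq[String]): StructType =
  StructType(fullSchema.filterNot(f => partitionColumns.contains(f.name)))
{code}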
[jira] [Resolved] (SPARK-13589) Flaky test: ParquetHadoopFsRelationSuite.test all data types - ByteType
[ https://issues.apache.org/jira/browse/SPARK-13589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-13589. Resolution: Resolved Fixed by SPARK-13537 > Flaky test: ParquetHadoopFsRelationSuite.test all data types - ByteType > --- > > Key: SPARK-13589 > URL: https://issues.apache.org/jira/browse/SPARK-13589 > Project: Spark > Issue Type: Bug > Components: SQL, Tests >Affects Versions: 2.0.0 > Reporter: Cheng Lian > Labels: flaky-test > > Here are a few sample build failures caused by this test case: > # > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52164/testReport/org.apache.spark.sql.sources/ParquetHadoopFsRelationSuite/test_all_data_types___ByteType/ > # > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52154/testReport/org.apache.spark.sql.sources/ParquetHadoopFsRelationSuite/test_all_data_types___ByteType/ > # > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52153/testReport/org.apache.spark.sql.sources/ParquetHadoopFsRelationSuite/test_all_data_types___ByteType/ > (I've pinned these builds on Jenkins so that they won't be cleaned up.) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-14407) Hide HadoopFsRelation related data source API to execution package
Cheng Lian created SPARK-14407: -- Summary: Hide HadoopFsRelation related data source API to execution package Key: SPARK-14407 URL: https://issues.apache.org/jira/browse/SPARK-14407 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 2.0.0 Reporter: Cheng Lian Assignee: Cheng Lian -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-14404) HDFSMetadataLogSuite overrides Hadoop FileSystem implementation but doesn't recover it
[ https://issues.apache.org/jira/browse/SPARK-14404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-14404. Resolution: Not A Bug The scheme of {{FakeFileSystem}} is randomly generated, so it doesn't interfere with the normal execution of other tests. > HDFSMetadataLogSuite overrides Hadoop FileSystem implementation but doesn't > recover it > -- > > Key: SPARK-14404 > URL: https://issues.apache.org/jira/browse/SPARK-14404 > Project: Spark > Issue Type: Bug > Components: SQL, Tests >Affects Versions: 2.0.0 >Reporter: Cheng Lian >Assignee: Cheng Lian > > Test case {{HDFSMetadataLog: fallback from FileContext to FileSystem}} > doesn't recover the original {{FileSystem}} implementation after overriding it > with {{FakeFileSystem}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-14404) HDFSMetadataLogSuite overrides Hadoop FileSystem implementation but doesn't recover it
Cheng Lian created SPARK-14404: -- Summary: HDFSMetadataLogSuite overrides Hadoop FileSystem implementation but doesn't recover it Key: SPARK-14404 URL: https://issues.apache.org/jira/browse/SPARK-14404 Project: Spark Issue Type: Bug Components: Tests Affects Versions: 2.0.0 Reporter: Cheng Lian Assignee: Cheng Lian Test case {{HDFSMetadataLog: fallback from FileContext to FileSystem}} doesn't recover the original {{FileSystem}} implementation after overriding it with {{FakeFileSystem}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14404) HDFSMetadataLogSuite overrides Hadoop FileSystem implementation but doesn't recover it
[ https://issues.apache.org/jira/browse/SPARK-14404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-14404: --- Component/s: SQL > HDFSMetadataLogSuite overrides Hadoop FileSystem implementation but doesn't > recover it > -- > > Key: SPARK-14404 > URL: https://issues.apache.org/jira/browse/SPARK-14404 > Project: Spark > Issue Type: Bug > Components: SQL, Tests >Affects Versions: 2.0.0 >Reporter: Cheng Lian >Assignee: Cheng Lian > > Test case {{HDFSMetadataLog: fallback from FileContext to FileSystem}} > doesn't recover the original {{FileSystem}} implementation after overriding it > with {{FakeFileSystem}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
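For reference, a minimal sketch of how a test can restore an overridden Hadoop {{FileSystem}} implementation; the {{withFsOverride}} helper below is hypothetical wiring, not the suite's actual code.
{code}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem

// Hypothetical helper: override the FileSystem implementation for a scheme,
// run the test body, then restore whatever was configured before.
def withFsOverride[T](conf: Configuration, scheme: String, fsClass: Class[_])(body: => T): T = {
  val key = s"fs.$scheme.impl"
  val original = conf.get(key) // null if the key was never set
  conf.setClass(key, fsClass, classOf[FileSystem])
  try body
  finally if (original == null) conf.unset(key) else conf.set(key, original)
}
{code}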
[jira] [Created] (SPARK-14372) Dataset.randomSplit() needs a Java version
Cheng Lian created SPARK-14372: -- Summary: Dataset.randomSplit() needs a Java version Key: SPARK-14372 URL: https://issues.apache.org/jira/browse/SPARK-14372 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.0.0 Reporter: Cheng Lian {{Dataset.randomSplit()}} now returns {{Array\[Dataset\[T\]\]}}, which doesn't work for Java users since Java methods can't return generic arrays. We may want something like {{randomSplitAsList()}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
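A minimal sketch of what such a Java-friendly variant could look like, written here as a thin wrapper over the existing method; the name and placement are assumptions based on the description, not the final API.
{code}
import java.util.{Arrays, List => JList}
import org.apache.spark.sql.Dataset

// Hypothetical wrapper: expose the split as a java.util.List instead of a
// generic array, which Java callers can consume without array headaches.
def randomSplitAsList[T](ds: Dataset[T], weights: Array[Double], seed: Long): JList[Dataset[T]] =
  Arrays.asList(ds.randomSplit(weights, seed): _*)
{code}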
[jira] [Created] (SPARK-14369) Implement preferredLocations() for FileScanRDD
Cheng Lian created SPARK-14369: -- Summary: Implement preferredLocations() for FileScanRDD Key: SPARK-14369 URL: https://issues.apache.org/jira/browse/SPARK-14369 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 2.0.0 Reporter: Cheng Lian Assignee: Cheng Lian Implement {{FileScanRDD.preferredLocations()}} to add locality support for {{HadoopFsRelation}}-based data sources. We should avoid extra block-location RPC costs for S3, which doesn't provide valid locality information. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
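As a rough illustration of the intended locality logic (simplified types; {{FileBlock}} and the host-ranking rule are assumptions, not the actual implementation), the idea is to rank hosts by how many bytes of the partition's blocks they hold, ignoring placeholder locations reported by filesystems like S3.
{code}
// Hypothetical, simplified model of one file block of a partition.
case class FileBlock(hosts: Seq[String], length: Long)

def preferredHosts(blocks: Seq[FileBlock], topN: Int = 3): Seq[String] = {
  // Total bytes each host holds across all blocks of this partition.
  val bytesByHost = blocks
    .flatMap(b => b.hosts.map(h => h -> b.length))
    .groupBy(_._1)
    .map { case (host, pairs) => host -> pairs.map(_._2).sum }
  // S3-like filesystems report no meaningful locality, often as "localhost".
  bytesByHost
    .filter { case (host, _) => host != "localhost" }
    .toSeq
    .sortBy { case (_, bytes) => -bytes }
    .take(topN)
    .map(_._1)
}
{code}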
[jira] [Created] (SPARK-14295) buildReader implementation for LibSVM
Cheng Lian created SPARK-14295: -- Summary: buildReader implementation for LibSVM Key: SPARK-14295 URL: https://issues.apache.org/jira/browse/SPARK-14295 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 2.0.0 Reporter: Cheng Lian Assignee: Cheng Lian -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14274) Add FileFormat.prepareRead to collect necessary global information
[ https://issues.apache.org/jira/browse/SPARK-14274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-14274: --- Summary: Add FileFormat.prepareRead to collect necessary global information (was: Replaces inferSchema with prepareRead to collect necessary global information) > Add FileFormat.prepareRead to collect necessary global information > -- > > Key: SPARK-14274 > URL: https://issues.apache.org/jira/browse/SPARK-14274 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.0.0 > Reporter: Cheng Lian >Assignee: Cheng Lian > Fix For: 2.0.0 > > > One problem of our newly introduced {{FileFormat.buildReader()}} method is > that it only sees pieces of input files. On the other hand, data sources like > CSV and LibSVM require some sort of global information: > - CSV: the content of the header line if the {{header}} option is set to true, so > that we can filter out header lines within each input file. This is > considered global information because it's possible that the header > appears in the middle of a file after blocks of comments and empty lines, > although this is just a rare/contrived corner case. > - LibSVM: when {{numFeature}} is not set, we need to scan the whole dataset > to infer the total number of features to construct result {{LabeledPoint}} > instances. > Unfortunately, with our current API, this kind of global information can't be > gathered. > The solution proposed here is to add a {{prepareRead}} method, which accepts > the same arguments as {{inferSchema}} but returns a {{ReadContext}}, which > contains an {{Option\[StructType\]}} for the inferred schema and a > {{Map\[String, Any\]}} for any gathered global information. This > {{ReadContext}} is then passed to {{buildReader()}}. By default, > {{prepareRead}} simply calls {{inferSchema}} (actually the inferred schema > itself can be considered a sort of global information). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14274) Replaces inferSchema with prepareRead to collect necessary global information
[ https://issues.apache.org/jira/browse/SPARK-14274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-14274: --- Description: One problem of our newly introduced {{FileFormat.buildReader()}} method is that it only sees pieces of input files. On the other hand, data sources like CSV and LibSVM require some sort of global information: - CSV: the content of the header line if the {{header}} option is set to true, so that we can filter out header lines within each input file. This is considered global information because it's possible that the header appears in the middle of a file after blocks of comments and empty lines, although this is just a rare/contrived corner case. - LibSVM: when {{numFeature}} is not set, we need to scan the whole dataset to infer the total number of features to construct result {{LabeledPoint}} instances. Unfortunately, with our current API, this kind of global information can't be gathered. The solution proposed here is to add a {{prepareRead}} method, which accepts the same arguments as {{inferSchema}} but returns a {{ReadContext}}, which contains an {{Option\[StructType\]}} for the inferred schema and a {{Map\[String, Any\]}} for any gathered global information. This {{ReadContext}} is then passed to {{buildReader()}}. By default, {{prepareRead}} simply calls {{inferSchema}} (actually the inferred schema itself can be considered a sort of global information). was: One problem of our newly introduced {{FileFormat.buildReader()}} method is that it only sees pieces of input files. On the other hand, data sources like CSV and LibSVM require some sort of global information: - CSV: the content of the header line if the {{header}} option is set to true, so that we can filter out header lines within each input file. This is considered global information because it's possible that the header appears in the middle of a file after blocks of comments and empty lines, although this is just a rare/contrived corner case. - LibSVM: when {{numFeature}} is not set, we need to scan the whole dataset to infer the total number of features to construct result {{LabeledPoint}}s. Unfortunately, with our current API, this kind of global information can't be gathered. The solution proposed here is to add a {{prepareRead}} method, which accepts the same arguments as {{inferSchema}} but returns a {{ReadContext}}, which contains an {{Option\[StructType\]}} for the inferred schema and a {{Map\[String, Any\]}} for any gathered global information. This {{ReadContext}} is then passed to {{buildReader()}}. By default, {{prepareRead}} simply calls {{inferSchema}} (actually the inferred schema itself can be considered a sort of global information). > Replaces inferSchema with prepareRead to collect necessary global information > - > > Key: SPARK-14274 > URL: https://issues.apache.org/jira/browse/SPARK-14274 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.0.0 > Reporter: Cheng Lian >Assignee: Cheng Lian > Fix For: 2.0.0 > > > One problem of our newly introduced {{FileFormat.buildReader()}} method is > that it only sees pieces of input files. On the other hand, data sources like > CSV and LibSVM require some sort of global information: > - CSV: the content of the header line if the {{header}} option is set to true, so > that we can filter out header lines within each input file.
This is > considered global information because it's possible that the header > appears in the middle of a file after blocks of comments and empty lines, > although this is just a rare/contrived corner case. > - LibSVM: when {{numFeature}} is not set, we need to scan the whole dataset > to infer the total number of features to construct result {{LabeledPoint}} > instances. > Unfortunately, with our current API, this kind of global information can't be > gathered. > The solution proposed here is to add a {{prepareRead}} method, which accepts > the same arguments as {{inferSchema}} but returns a {{ReadContext}}, which > contains an {{Option\[StructType\]}} for the inferred schema and a > {{Map\[String, Any\]}} for any gathered global information. This > {{ReadContext}} is then passed to {{buildReader()}}. By default, > {{prepareRead}} simply calls {{inferSchema}} (actually the inferred schema > itself can be considered a sort of global information). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
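A compact sketch of the proposed shape, with {{ReadContext}} as described above; the {{FileFormat}} trait here is a stand-in with simplified signatures, not Spark's actual interface.
{code}
import org.apache.spark.sql.types.StructType

// ReadContext as described: the inferred schema plus any gathered global information.
case class ReadContext(schema: Option[StructType], globalInfo: Map[String, Any])

// Stand-in trait with simplified signatures, for illustration only.
trait FileFormat {
  def inferSchema(files: Seq[String]): Option[StructType]

  // Default implementation simply wraps inferSchema; formats such as CSV or
  // LibSVM would override this to also collect header lines, feature counts, etc.
  def prepareRead(files: Seq[String]): ReadContext =
    ReadContext(inferSchema(files), Map.empty)
}
{code}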
[jira] [Created] (SPARK-14274) Replaces inferSchema with prepareRead to collect necessary global information
Cheng Lian created SPARK-14274: -- Summary: Replaces inferSchema with prepareRead to collect necessary global information Key: SPARK-14274 URL: https://issues.apache.org/jira/browse/SPARK-14274 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 2.0.0 Reporter: Cheng Lian Assignee: Cheng Lian One problem of our newly introduced {{FileFormat.buildReader()}} method is that it only sees pieces of input files. On the other hand, data sources like CSV and LibSVM require some sort of global information: - CSV: the content of the header line if the {{header}} option is set to true, so that we can filter out header lines within each input file. This is considered global information because it's possible that the header appears in the middle of a file after blocks of comments and empty lines, although this is just a rare/contrived corner case. - LibSVM: when {{numFeature}} is not set, we need to scan the whole dataset to infer the total number of features to construct result {{LabeledPoint}}s. Unfortunately, with our current API, this kind of global information can't be gathered. The solution proposed here is to add a {{prepareRead}} method, which accepts the same arguments as {{inferSchema}} but returns a {{ReadContext}}, which contains an {{Option\[StructType\]}} for the inferred schema and a {{Map\[String, Any\]}} for any gathered global information. This {{ReadContext}} is then passed to {{buildReader()}}. By default, {{prepareRead}} simply calls {{inferSchema}} (actually the inferred schema itself can be considered a sort of global information). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-14273) Add FileFormat.isSplittable to indicate whether a format is splittable
Cheng Lian created SPARK-14273: -- Summary: Add FileFormat.isSplittable to indicate whether a format is splittable Key: SPARK-14273 URL: https://issues.apache.org/jira/browse/SPARK-14273 Project: Spark Issue Type: Sub-task Affects Versions: 2.0.0 Reporter: Cheng Lian {{FileSourceStrategy}} assumes that all data source formats are splittable and always splits data files by a fixed partition size. However, not all HDFS-based formats are splittable. We need a flag to indicate that and ensure that non-splittable files won't be split into multiple Spark partitions. (PS: Is it "splitable" or "splittable"? Probably the latter one? Hadoop uses the former one though...) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
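To make the intent concrete, here is a small sketch of how such a flag would guard splitting; the types and the splitting rule are simplified assumptions, not the actual strategy code.
{code}
// Hypothetical, simplified slice of a file handed to one Spark partition.
case class FileSlice(path: String, start: Long, length: Long)

def planSlices(path: String, fileLen: Long, splittable: Boolean, maxSplitBytes: Long): Seq[FileSlice] =
  if (!splittable) {
    // A non-splittable format must be read as a single slice covering the whole file.
    Seq(FileSlice(path, 0L, fileLen))
  } else {
    (0L until fileLen by maxSplitBytes).map { start =>
      FileSlice(path, start, math.min(maxSplitBytes, fileLen - start))
    }
  }
{code}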
[jira] [Resolved] (SPARK-14114) implement buildReader for text data source
[ https://issues.apache.org/jira/browse/SPARK-14114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-14114. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 11934 [https://github.com/apache/spark/pull/11934] > implement buildReader for text data source > -- > > Key: SPARK-14114 > URL: https://issues.apache.org/jira/browse/SPARK-14114 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.0.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan > Fix For: 2.0.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-14244) Physical Window operator uses global SizeBasedWindowFunction.n attribute generated on both driver and executor side
Cheng Lian created SPARK-14244: -- Summary: Physical Window operator uses global SizeBasedWindowFunction.n attribute generated on both driver and executor side Key: SPARK-14244 URL: https://issues.apache.org/jira/browse/SPARK-14244 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.6.1, 2.0.0 Reporter: Cheng Lian Assignee: Cheng Lian To reproduce this issue, first start a local cluster with at least one worker. Then try the following Spark shell snippet: {code} import org.apache.spark.sql.expressions._ import org.apache.spark.sql.functions._ sqlContext. range(10). select( 'id, cume_dist() over (Window orderBy 'id) as 'cdist ). orderBy('cdist). show() {code} Exception thrown: {noformat} org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 11, 192.168.1.101): org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding attribute, tree: window__partition__size#4 at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:47) at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:92) at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:91) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:259) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:259) at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:67) at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:258) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:264) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:264) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:301) at scala.collection.Iterator$$anon$11.next(Iterator.scala:370) at scala.collection.Iterator$class.foreach(Iterator.scala:742) at scala.collection.AbstractIterator.foreach(Iterator.scala:1194) at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59) at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104) at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48) at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:308) at scala.collection.AbstractIterator.to(Iterator.scala:1194) at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:300) at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1194) at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:287) at scala.collection.AbstractIterator.toArray(Iterator.scala:1194) at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren(TreeNode.scala:350) at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:264) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:264) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:264) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:301) at scala.collection.Iterator$$anon$11.next(Iterator.scala:370) at scala.collection.Iterator$class.foreach(Iterator.scala:742) at scala.collection.AbstractIterator.foreach(Iterator.scala:1194) at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59) at 
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104) at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48) at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:308) at scala.collection.AbstractIterator.to(Iterator.scala:1194) at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:300) at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1194) at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:287) at scala.collection.AbstractIterator.toArray(Iterator.scala:1194) at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren(TreeNode.scala:350) at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:264) at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:248) at org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReference(BoundAttribute.scala:91
[jira] [Resolved] (SPARK-14208) Rename "spark.sql.parquet.fileScan"
[ https://issues.apache.org/jira/browse/SPARK-14208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-14208. Resolution: Fixed Issue resolved by pull request 12003 [https://github.com/apache/spark/pull/12003] > Rename "spark.sql.parquet.fileScan" > --- > > Key: SPARK-14208 > URL: https://issues.apache.org/jira/browse/SPARK-14208 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.0.0 >Reporter: Cheng Lian >Assignee: Cheng Lian >Priority: Minor > Fix For: 2.0.0 > > > This option should be renamed since {{FileScanRDD}} is now used by all > {{HadoopFsRelation}} based data sources. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-14237) De-duplicate partition value appending logic in various buildReader() implementations
Cheng Lian created SPARK-14237: -- Summary: De-duplicate partition value appending logic in various buildReader() implementations Key: SPARK-14237 URL: https://issues.apache.org/jira/browse/SPARK-14237 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 2.0.0 Reporter: Cheng Lian Assignee: Cheng Lian Priority: Minor Various data sources share approximately the same code for partition value appending. It would be nice to factor this logic into a utility method. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
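A minimal sketch of what such a shared helper could look like, using the external {{Row}} API for clarity (the real read path works on internal rows; the names here are assumptions).
{code}
import org.apache.spark.sql.Row

// Hypothetical utility: append the partition values to every data row read
// from a file, so each data source doesn't have to reimplement this loop.
def appendPartitionValues(dataRows: Iterator[Row], partitionValues: Row): Iterator[Row] =
  dataRows.map(row => Row.fromSeq(row.toSeq ++ partitionValues.toSeq))
{code}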
[jira] [Updated] (SPARK-14206) buildReader implementation for CSV
[ https://issues.apache.org/jira/browse/SPARK-14206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-14206: --- Affects Version/s: 2.0.0 Target Version/s: 2.0.0 Fix Version/s: (was: 2.0.0) > buildReader implementation for CSV > -- > > Key: SPARK-14206 > URL: https://issues.apache.org/jira/browse/SPARK-14206 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.0.0 > Reporter: Cheng Lian >Assignee: Cheng Lian > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-14206) buildReader implementation for CSV
[ https://issues.apache.org/jira/browse/SPARK-14206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian reassigned SPARK-14206: -- Assignee: Cheng Lian > buildReader implementation for CSV > -- > > Key: SPARK-14206 > URL: https://issues.apache.org/jira/browse/SPARK-14206 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.0.0 > Reporter: Cheng Lian >Assignee: Cheng Lian > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-14208) Rename "spark.sql.parquet.fileScan"
Cheng Lian created SPARK-14208: -- Summary: Rename "spark.sql.parquet.fileScan" Key: SPARK-14208 URL: https://issues.apache.org/jira/browse/SPARK-14208 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 2.0.0 Reporter: Cheng Lian Assignee: Cheng Lian Priority: Minor This option should be renamed since {{FileScanRDD}} is now used by all {{HadoopFsRelation}} based data sources. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-14206) buildReader implementation for CSV
Cheng Lian created SPARK-14206: -- Summary: buildReader implementation for CSV Key: SPARK-14206 URL: https://issues.apache.org/jira/browse/SPARK-14206 Project: Spark Issue Type: Sub-task Reporter: Cheng Lian -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-13456) Cannot create encoders for case classes defined in Spark shell after upgrading to Scala 2.11
[ https://issues.apache.org/jira/browse/SPARK-13456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-13456. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 11931 [https://github.com/apache/spark/pull/11931] > Cannot create encoders for case classes defined in Spark shell after > upgrading to Scala 2.11 > > > Key: SPARK-13456 > URL: https://issues.apache.org/jira/browse/SPARK-13456 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Cheng Lian >Assignee: Wenchen Fan >Priority: Blocker > Fix For: 2.0.0 > > > Spark 2.0 started to use Scala 2.11 by default since [PR > #10608|https://github.com/apache/spark/pull/10608]. Unfortunately, after > this upgrade, Spark fails to create encoders for case classes defined in REPL: > {code} > import sqlContext.implicits._ > case class T(a: Int, b: Double) > val ds = Seq(1 -> T(1, 1D), 2 -> T(2, 2D)).toDS() > {code} > Exception thrown: > {noformat} > org.apache.spark.sql.AnalysisException: Unable to generate an encoder for > inner class `T` without access to the scope that this class was defined in. > Try moving this class out of its parent class.; > at > org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$resolveDeserializer$1.applyOrElse(Analyzer.scala:565) > at > org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$resolveDeserializer$1.applyOrElse(Analyzer.scala:561) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:262) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:262) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:261) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:267) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:267) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:304) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:370) > at scala.collection.Iterator$class.foreach(Iterator.scala:742) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1194) > at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59) > at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104) > at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48) > at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:308) > at scala.collection.AbstractIterator.to(Iterator.scala:1194) > at > scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:300) > at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1194) > at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:287) > at scala.collection.AbstractIterator.toArray(Iterator.scala:1194) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren(TreeNode.scala:353) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:267) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:267) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:267) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5$$anonfun$apply$11.apply(TreeNode.scala:333) > at > 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245) > at scala.collection.immutable.List.foreach(List.scala:381) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:245) > at scala.collection.immutable.List.map(List.scala:285) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:331) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:370) > at scala.collection.Iterator$class.foreach(Iterator.scala:742) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1194) > at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59) > at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104) > at scala.collection.mutable.ArrayBuffer
[jira] [Updated] (SPARK-14146) Imported implicits can't be found in Spark REPL in some cases
[ https://issues.apache.org/jira/browse/SPARK-14146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-14146: --- Affects Version/s: 2.0.0 Target Version/s: 2.0.0 Component/s: Spark Core Summary: Imported implicits can't be found in Spark REPL in some cases (was: imported implicit can't be found in Spark REPL in some case) > Imported implicits can't be found in Spark REPL in some cases > - > > Key: SPARK-14146 > URL: https://issues.apache.org/jira/browse/SPARK-14146 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL >Affects Versions: 2.0.0 >Reporter: Wenchen Fan > >
> {code}
> class I(i: Int) {
>   def double: Int = i * 2
> }
> class Context {
>   implicit def toI(i: Int): I = new I(i)
> }
> val c = new Context
> import c._
> // OK
> 1.double
> // Fail
> class A; 1.double
> {code}
> The above code snippet works in the Scala REPL, however. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14114) implement buildReader for text data source
[ https://issues.apache.org/jira/browse/SPARK-14114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-14114: --- Affects Version/s: 2.0.0 Target Version/s: 2.0.0 Fix Version/s: (was: 2.0.0) > implement buildReader for text data source > -- > > Key: SPARK-14114 > URL: https://issues.apache.org/jira/browse/SPARK-14114 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.0.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-14116) buildReader implementation for ORC
Cheng Lian created SPARK-14116: -- Summary: buildReader implementation for ORC Key: SPARK-14116 URL: https://issues.apache.org/jira/browse/SPARK-14116 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 2.0.0 Reporter: Cheng Lian Assignee: Cheng Lian -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-13549) Refactor the Optimizer Rule CollapseProject
[ https://issues.apache.org/jira/browse/SPARK-13549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-13549. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 11427 [https://github.com/apache/spark/pull/11427] > Refactor the Optimizer Rule CollapseProject > --- > > Key: SPARK-13549 > URL: https://issues.apache.org/jira/browse/SPARK-13549 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.0.0 >Reporter: Xiao Li >Assignee: Xiao Li >Priority: Minor > Fix For: 2.0.0 > > > Duplicate code exists in CollapseProject. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-13549) Refactor the Optimizer Rule CollapseProject
[ https://issues.apache.org/jira/browse/SPARK-13549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-13549: --- Assignee: Xiao Li > Refactor the Optimizer Rule CollapseProject > --- > > Key: SPARK-13549 > URL: https://issues.apache.org/jira/browse/SPARK-13549 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.0.0 >Reporter: Xiao Li >Assignee: Xiao Li >Priority: Minor > > Duplicate code exists in CollapseProject. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-13772) DataType mismatch about decimal
[ https://issues.apache.org/jira/browse/SPARK-13772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-13772. Resolution: Fixed Fix Version/s: 1.6.2 Issue resolved by pull request 11605 [https://github.com/apache/spark/pull/11605] > DataType mismatch about decimal > --- > > Key: SPARK-13772 > URL: https://issues.apache.org/jira/browse/SPARK-13772 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0 > Environment: spark1.6.0 hadoop2.2.0 jdk1.7.0_79 >Reporter: cen yuhai >Assignee: cen yuhai > Fix For: 1.6.2 > > > Code snippet to reproduce this issue using 1.6.0: > {code} > select if(1=1, cast(1 as double), cast(1.1 as decimal)) as a from test > {code} > It will throw exceptions like this: > {noformat} > Error in query: cannot resolve 'if ((1 = 1)) cast(1 as double) else cast(1.1 > as decimal(10,0))' due to data type mismatch: differing types in 'if ((1 = > 1)) cast(1 as double) else cast(1.1 as decimal(10,0))' (double and > decimal(10,0)).; line 1 pos 37 > {noformat} > I also tested: > {code} > select if(1=1,cast(1 as decimal),cast(1 as decimal(19,6))) from test; > {code} > {noformat} > Error in query: cannot resolve 'if ((1 = 1)) cast(1 as decimal(10,0)) else > cast(1 as decimal(19,6))' due to data type mismatch: differing types in 'if > ((1 = 1)) cast(1 as decimal(10,0)) else cast(1 as decimal(19,6))' > (decimal(10,0) and decimal(19,6)).; line 1 pos 38 > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-13772) DataType mismatch about decimal
[ https://issues.apache.org/jira/browse/SPARK-13772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-13772: --- Assignee: cen yuhai > DataType mismatch about decimal > --- > > Key: SPARK-13772 > URL: https://issues.apache.org/jira/browse/SPARK-13772 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0 > Environment: spark1.6.0 hadoop2.2.0 jdk1.7.0_79 >Reporter: cen yuhai >Assignee: cen yuhai > > Code snippet to reproduce this issue using 1.6.0: > {code} > select if(1=1, cast(1 as double), cast(1.1 as decimal)) as a from test > {code} > It will throw exceptions like this: > {noformat} > Error in query: cannot resolve 'if ((1 = 1)) cast(1 as double) else cast(1.1 > as decimal(10,0))' due to data type mismatch: differing types in 'if ((1 = > 1)) cast(1 as double) else cast(1.1 as decimal(10,0))' (double and > decimal(10,0)).; line 1 pos 37 > {noformat} > I also tested: > {code} > select if(1=1,cast(1 as decimal),cast(1 as decimal(19,6))) from test; > {code} > {noformat} > Error in query: cannot resolve 'if ((1 = 1)) cast(1 as decimal(10,0)) else > cast(1 as decimal(19,6))' due to data type mismatch: differing types in 'if > ((1 = 1)) cast(1 as decimal(10,0)) else cast(1 as decimal(19,6))' > (decimal(10,0) and decimal(19,6)).; line 1 pos 38 > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-13772) DataType mismatch about decimal
[ https://issues.apache.org/jira/browse/SPARK-13772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-13772: --- Target Version/s: 1.6.2 > DataType mismatch about decimal > --- > > Key: SPARK-13772 > URL: https://issues.apache.org/jira/browse/SPARK-13772 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0 > Environment: spark1.6.0 hadoop2.2.0 jdk1.7.0_79 >Reporter: cen yuhai > > Code snippet to reproduce this issue using 1.6.0: > {code} > select if(1=1, cast(1 as double), cast(1.1 as decimal)) as a from test > {code} > It will throw exceptions like this: > {noformat} > Error in query: cannot resolve 'if ((1 = 1)) cast(1 as double) else cast(1.1 > as decimal(10,0))' due to data type mismatch: differing types in 'if ((1 = > 1)) cast(1 as double) else cast(1.1 as decimal(10,0))' (double and > decimal(10,0)).; line 1 pos 37 > {noformat} > I also tested: > {code} > select if(1=1,cast(1 as decimal),cast(1 as decimal(19,6))) from test; > {code} > {noformat} > Error in query: cannot resolve 'if ((1 = 1)) cast(1 as decimal(10,0)) else > cast(1 as decimal(19,6))' due to data type mismatch: differing types in 'if > ((1 = 1)) cast(1 as decimal(10,0)) else cast(1 as decimal(19,6))' > (decimal(10,0) and decimal(19,6)).; line 1 pos 38 > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-13772) DataType mismatch about decimal
[ https://issues.apache.org/jira/browse/SPARK-13772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-13772: --- Description: Code snippet to reproduce this issue using 1.6.0: {code} select if(1=1, cast(1 as double), cast(1.1 as decimal)) as a from test {code} It will throw exceptions like this: {noformat} Error in query: cannot resolve 'if ((1 = 1)) cast(1 as double) else cast(1.1 as decimal(10,0))' due to data type mismatch: differing types in 'if ((1 = 1)) cast(1 as double) else cast(1.1 as decimal(10,0))' (double and decimal(10,0)).; line 1 pos 37 {noformat} I also tested: {code} select if(1=1,cast(1 as decimal),cast(1 as decimal(19,6))) from test; {code} {noformat} Error in query: cannot resolve 'if ((1 = 1)) cast(1 as decimal(10,0)) else cast(1 as decimal(19,6))' due to data type mismatch: differing types in 'if ((1 = 1)) cast(1 as decimal(10,0)) else cast(1 as decimal(19,6))' (decimal(10,0) and decimal(19,6)).; line 1 pos 38 {noformat} was: I found a bug: select if(1=1, cast(1 as double), cast(1.1 as decimal)) as a from test It will throw exceptions like this: Error in query: cannot resolve 'if ((1 = 1)) cast(1 as double) else cast(1.1 as decimal(10,0))' due to data type mismatch: differing types in 'if ((1 = 1)) cast(1 as double) else cast(1.1 as decimal(10,0))' (double and decimal(10,0)).; line 1 pos 37 I also tested: select if(1=1,cast(1 as decimal),cast(1 as decimal(19,6))) from test; Error in query: cannot resolve 'if ((1 = 1)) cast(1 as decimal(10,0)) else cast(1 as decimal(19,6))' due to data type mismatch: differing types in 'if ((1 = 1)) cast(1 as decimal(10,0)) else cast(1 as decimal(19,6))' (decimal(10,0) and decimal(19,6)).; line 1 pos 38 > DataType mismatch about decimal > --- > > Key: SPARK-13772 > URL: https://issues.apache.org/jira/browse/SPARK-13772 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0 > Environment: spark1.6.0 hadoop2.2.0 jdk1.7.0_79 >Reporter: cen yuhai > > Code snippet to reproduce this issue using 1.6.0: > {code} > select if(1=1, cast(1 as double), cast(1.1 as decimal)) as a from test > {code} > It will throw exceptions like this: > {noformat} > Error in query: cannot resolve 'if ((1 = 1)) cast(1 as double) else cast(1.1 > as decimal(10,0))' due to data type mismatch: differing types in 'if ((1 = > 1)) cast(1 as double) else cast(1.1 as decimal(10,0))' (double and > decimal(10,0)).; line 1 pos 37 > {noformat} > I also tested: > {code} > select if(1=1,cast(1 as decimal),cast(1 as decimal(19,6))) from test; > {code} > {noformat} > Error in query: cannot resolve 'if ((1 = 1)) cast(1 as decimal(10,0)) else > cast(1 as decimal(19,6))' due to data type mismatch: differing types in 'if > ((1 = 1)) cast(1 as decimal(10,0)) else cast(1 as decimal(19,6))' > (decimal(10,0) and decimal(19,6)).; line 1 pos 38 > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
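For context, the usual way such branches are reconciled is to widen both sides to a common decimal type that keeps the larger integral part and the larger scale; a small sketch of that rule follows (illustrative only, not the actual resolver code).
{code}
// Hypothetical, simplified decimal type.
case class Dec(precision: Int, scale: Int)

// Widen two decimal types: keep the larger scale plus enough integral digits
// to hold either side, e.g. widerDecimal(Dec(10, 0), Dec(19, 6)) == Dec(19, 6).
def widerDecimal(a: Dec, b: Dec): Dec = {
  val scale = math.max(a.scale, b.scale)
  val integral = math.max(a.precision - a.scale, b.precision - b.scale)
  Dec(integral + scale, scale)
}
{code}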
[jira] [Resolved] (SPARK-13774) IllegalArgumentException: Can not create a Path from an empty string for incorrect file path
[ https://issues.apache.org/jira/browse/SPARK-13774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-13774. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 11775 [https://github.com/apache/spark/pull/11775] > IllegalArgumentException: Can not create a Path from an empty string for > incorrect file path > > > Key: SPARK-13774 > URL: https://issues.apache.org/jira/browse/SPARK-13774 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Jacek Laskowski >Assignee: Sunitha Kambhampati >Priority: Minor > Fix For: 2.0.0 > > > Think the error message should be improved for files that could not be found. > The {{Path}} seems given. > {code} > Welcome to > __ > / __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ >/___/ .__/\_,_/_/ /_/\_\ version 2.0.0-SNAPSHOT > /_/ > Using Scala version 2.11.7 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_74) > Type in expressions to have them evaluated. > Type :help for more information. > scala> sqlContext.read.format("csv").load("file-path-is-incorrect.csv") > java.lang.IllegalArgumentException: Can not create a Path from an empty string > at org.apache.hadoop.fs.Path.checkPathArg(Path.java:126) > at org.apache.hadoop.fs.Path.(Path.java:134) > at org.apache.hadoop.util.StringUtils.stringToPath(StringUtils.java:245) > at > org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:411) > at > org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$32.apply(SparkContext.scala:976) > at > org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$32.apply(SparkContext.scala:976) > at > org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:177) > at > org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:177) > at scala.Option.map(Option.scala:146) > at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:177) > at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:196) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:242) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:240) > at scala.Option.getOrElse(Option.scala:121) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:240) > at > org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:242) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:240) > at scala.Option.getOrElse(Option.scala:121) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:240) > at > org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:242) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:240) > at scala.Option.getOrElse(Option.scala:121) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:240) > at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1251) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111) > at org.apache.spark.rdd.RDD.withScope(RDD.scala:352) > at org.apache.spark.rdd.RDD.take(RDD.scala:1246) > at org.apache.spark.rdd.RDD$$anonfun$first$1.apply(RDD.scala:1286) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111) > at 
org.apache.spark.rdd.RDD.withScope(RDD.scala:352) > at org.apache.spark.rdd.RDD.first(RDD.scala:1285) > at > org.apache.spark.sql.execution.datasources.csv.DefaultSource.findFirstLine(DefaultSource.scala:156) > at > org.apache.spark.sql.execution.datasources.csv.DefaultSource.inferSchema(DefaultSource.scala:58) > at > org.apache.spark.sql.execution.datasources.DataSource$$anonfun$13.apply(DataSource.scala:213) > at > org.apache.spark.sql.execution.datasources.DataSource$$anonfun$13.apply(DataSource.scala:213) > at scala.Option.orElse(Option.scala:289) > at > org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:212) > at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:131) > at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:141) > ... 49 elided > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-13774) IllegalArgumentException: Can not create a Path from an empty string for incorrect file path
[ https://issues.apache.org/jira/browse/SPARK-13774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-13774: --- Assignee: Sunitha Kambhampati > IllegalArgumentException: Can not create a Path from an empty string for > incorrect file path > > > Key: SPARK-13774 > URL: https://issues.apache.org/jira/browse/SPARK-13774 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Jacek Laskowski >Assignee: Sunitha Kambhampati >Priority: Minor > > > I think the error message should be improved for files that cannot be found. > The {{Path}} is clearly given, yet the error complains about an empty string. > {code} > Welcome to > __ > / __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ >/___/ .__/\_,_/_/ /_/\_\ version 2.0.0-SNAPSHOT > /_/ > Using Scala version 2.11.7 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_74) > Type in expressions to have them evaluated. > Type :help for more information. > scala> sqlContext.read.format("csv").load("file-path-is-incorrect.csv") > java.lang.IllegalArgumentException: Can not create a Path from an empty string > at org.apache.hadoop.fs.Path.checkPathArg(Path.java:126) > at org.apache.hadoop.fs.Path.<init>(Path.java:134) > at org.apache.hadoop.util.StringUtils.stringToPath(StringUtils.java:245) > at > org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:411) > at > org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$32.apply(SparkContext.scala:976) > at > org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$32.apply(SparkContext.scala:976) > at > org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:177) > at > org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:177) > at scala.Option.map(Option.scala:146) > at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:177) > at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:196) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:242) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:240) > at scala.Option.getOrElse(Option.scala:121) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:240) > at > org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:242) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:240) > at scala.Option.getOrElse(Option.scala:121) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:240) > at > org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:242) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:240) > at scala.Option.getOrElse(Option.scala:121) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:240) > at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1251) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111) > at org.apache.spark.rdd.RDD.withScope(RDD.scala:352) > at org.apache.spark.rdd.RDD.take(RDD.scala:1246) > at org.apache.spark.rdd.RDD$$anonfun$first$1.apply(RDD.scala:1286) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111) > at org.apache.spark.rdd.RDD.withScope(RDD.scala:352) > at org.apache.spark.rdd.RDD.first(RDD.scala:1285) > at >
org.apache.spark.sql.execution.datasources.csv.DefaultSource.findFirstLine(DefaultSource.scala:156) > at > org.apache.spark.sql.execution.datasources.csv.DefaultSource.inferSchema(DefaultSource.scala:58) > at > org.apache.spark.sql.execution.datasources.DataSource$$anonfun$13.apply(DataSource.scala:213) > at > org.apache.spark.sql.execution.datasources.DataSource$$anonfun$13.apply(DataSource.scala:213) > at scala.Option.orElse(Option.scala:289) > at > org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:212) > at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:131) > at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:141) > ... 49 elided > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
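The fix in pull request 11775 replaces this failure with a clearer, path-aware error. Below is a minimal sketch of the validation idea, using only standard Hadoop filesystem calls; the helper name and wiring are hypothetical, not the actual code from the PR (which reports the problem through Spark's own exception types):
{code}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical helper: fail fast with a readable message instead of letting
// an empty or missing Path surface as an IllegalArgumentException deep
// inside Hadoop's FileInputFormat.
def checkPathExists(pathString: String, hadoopConf: Configuration): Unit = {
  require(pathString.nonEmpty, "Input path must not be empty")
  val path = new Path(pathString)
  val fs: FileSystem = path.getFileSystem(hadoopConf)
  if (!fs.exists(path)) {
    throw new java.io.FileNotFoundException(s"Path does not exist: $path")
  }
}
{code}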
[jira] [Updated] (SPARK-14038) Enable native view by default
[ https://issues.apache.org/jira/browse/SPARK-14038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-14038: --- Description: Release note update: {quote} Starting from 2.0.0, Spark SQL handles views natively by default. When a view is defined, Spark SQL now canonicalizes the view definition by generating a canonical SQL statement from the parsed logical query plan and stores that statement in the catalog. If you hit any problems, you can fall back to the old behavior by setting {{spark.sql.nativeView}} to false. {quote} > Enable native view by default > - > > Key: SPARK-14038 > URL: https://issues.apache.org/jira/browse/SPARK-14038 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.0.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan > Labels: releasenotes > > Release note update: > {quote} > Starting from 2.0.0, Spark SQL handles views natively by default. When a view > is defined, Spark SQL now canonicalizes the view definition by generating a > canonical SQL statement from the parsed logical query plan and stores that > statement in the catalog. If you hit any problems, you can fall back to the old > behavior by setting {{spark.sql.nativeView}} to false. > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
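To make the fallback in the release note concrete, this is what flipping the flag looks like in a 2.0-era spark-shell (illustrative only):
{code}
// Revert to the pre-2.0, Hive-based view handling if native views misbehave.
sqlContext.setConf("spark.sql.nativeView", "false")

// Equivalently, through SQL:
sqlContext.sql("SET spark.sql.nativeView=false")
{code}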
[jira] [Updated] (SPARK-14038) Enable native view by default
[ https://issues.apache.org/jira/browse/SPARK-14038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-14038: --- Labels: releasenotes (was: ) > Enable native view by default > - > > Key: SPARK-14038 > URL: https://issues.apache.org/jira/browse/SPARK-14038 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.0.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan > Labels: releasenotes > > Release note update: > {quote} > Starting from 2.0.0, Spark SQL handles views natively by default. When a view > is defined, Spark SQL now canonicalizes the view definition by generating a > canonical SQL statement from the parsed logical query plan and stores that > statement in the catalog. If you hit any problems, you can fall back to the old > behavior by setting {{spark.sql.nativeView}} to false. > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14038) Enable native view by default
[ https://issues.apache.org/jira/browse/SPARK-14038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-14038: --- Affects Version/s: 2.0.0 Target Version/s: 2.0.0 > Enable native view by default > - > > Key: SPARK-14038 > URL: https://issues.apache.org/jira/browse/SPARK-14038 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.0.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14038) Enable native view by default
[ https://issues.apache.org/jira/browse/SPARK-14038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-14038: --- Assignee: Wenchen Fan > Enable native view by default > - > > Key: SPARK-14038 > URL: https://issues.apache.org/jira/browse/SPARK-14038 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Wenchen Fan >Assignee: Wenchen Fan > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-14000) case class with a tuple field can't work in Dataset
[ https://issues.apache.org/jira/browse/SPARK-14000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-14000. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 11816 [https://github.com/apache/spark/pull/11816] > case class with a tuple field can't work in Dataset > --- > > Key: SPARK-14000 > URL: https://issues.apache.org/jira/browse/SPARK-14000 > Project: Spark > Issue Type: Bug > Components: Spark Core >Reporter: Wenchen Fan > Fix For: 2.0.0 > > > For example, given `case class TupleClass(data: (Int, String))`, we can create an > encoder for it, but creating a Dataset with it fails while validating the encoder. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
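A minimal spark-shell reproduction of the failure mode described above, assuming a 2.0-era shell where {{sqlContext}} and its implicits are in scope:
{code}
case class TupleClass(data: (Int, String))

import sqlContext.implicits._

// Deriving the encoder alone succeeded, but constructing the Dataset
// failed during encoder validation before this fix:
val ds = Seq(TupleClass((1, "a")), TupleClass((2, "b"))).toDS()
ds.show()
{code}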
[jira] [Resolved] (SPARK-14004) AttributeReference and Alias should only use their first qualifier to build SQL representations
[ https://issues.apache.org/jira/browse/SPARK-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-14004. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 11820 [https://github.com/apache/spark/pull/11820] > AttributeReference and Alias should only use their first qualifier to build > SQL representations > --- > > Key: SPARK-14004 > URL: https://issues.apache.org/jira/browse/SPARK-14004 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Cheng Lian >Assignee: Cheng Lian >Priority: Minor > Fix For: 2.0.0 > > > Current implementation joins all qualifiers, which is wrong. > However, this doesn't cause any real SQL generation bugs as there is always > at most one qualifier for any given {{AttributeReference}} or {{Alias}}. > We can probably use {{Option\[String\]}} instead of {{Seq\[String\]}} to > represent qualifiers. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
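As a standalone illustration (not the actual Spark code) of why taking only the first qualifier is sufficient, here is a sketch of SQL name construction under the assumption stated above, that at most one qualifier is ever present:
{code}
// Qualifiers are modeled as Seq[String], but at most one entry ever occurs,
// so joining all of them is wrong; take the head (if any) instead.
def toSQL(qualifier: Seq[String], name: String): String =
  (qualifier.headOption.toSeq :+ name).map(part => s"`$part`").mkString(".")

toSQL(Seq("t1"), "id")   // `t1`.`id`
toSQL(Seq.empty, "id")   // `id`
{code}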
[jira] [Updated] (SPARK-13972) hive tests should fail if SQL generation fails
[ https://issues.apache.org/jira/browse/SPARK-13972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-13972: --- Assignee: Wenchen Fan > hive tests should fail if SQL generation fails > --- > > Key: SPARK-13972 > URL: https://issues.apache.org/jira/browse/SPARK-13972 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Wenchen Fan >Assignee: Wenchen Fan > Fix For: 2.0.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14001) support multi-children Union in SQLBuilder
[ https://issues.apache.org/jira/browse/SPARK-14001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-14001: --- Assignee: Wenchen Fan > support multi-children Union in SQLBuilder > -- > > Key: SPARK-14001 > URL: https://issues.apache.org/jira/browse/SPARK-14001 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Wenchen Fan >Assignee: Wenchen Fan > Fix For: 2.0.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-12719) SQL generation support for generators (including UDTF)
[ https://issues.apache.org/jira/browse/SPARK-12719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-12719: --- Assignee: Wenchen Fan > SQL generation support for generators (including UDTF) > -- > > Key: SPARK-12719 > URL: https://issues.apache.org/jira/browse/SPARK-12719 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.0.0 > Reporter: Cheng Lian >Assignee: Wenchen Fan > > {{HiveCompatibilitySuite}} can be useful for bootstrapping test coverage. > Please refer to SPARK-11012 for more details. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
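For context, generators expand one input row into zero or more output rows, and SQL generation must be able to round-trip queries that use them. The kind of query this sub-task targets looks like the following (the table and column names are made up):
{code:sql}
-- SQLBuilder needs to regenerate LATERAL VIEW clauses like this one:
SELECT key, value
FROM t0
LATERAL VIEW explode(kv_map) kv AS key, value
{code}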
[jira] [Resolved] (SPARK-14002) SQLBuilder should add subquery to Aggregate child when necessary
[ https://issues.apache.org/jira/browse/SPARK-14002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-14002. Resolution: Duplicate Fix Version/s: 2.0.0 This issue is actually covered by SPARK-13976. > SQLBuilder should add subquery to Aggregate child when necessary > > > Key: SPARK-14002 > URL: https://issues.apache.org/jira/browse/SPARK-14002 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 > Reporter: Cheng Lian >Assignee: Cheng Lian > Fix For: 2.0.0 > > > Add the following test case to {{LogicalPlanToSQLSuite}} to reproduce this > issue: > {code} > test("bug") { > checkHiveQl( > """SELECT COUNT(id) > |FROM > |( > | SELECT id FROM t0 > |) subq > """.stripMargin > ) > } > {code} > The generated (wrong) SQL is: > {code:sql} > SELECT `gen_attr_46` AS `count(id)` > FROM > ( > SELECT count(`gen_attr_45`) AS `gen_attr_46` > FROM -- a subquery wrapping the projection below is missing here > SELECT `gen_attr_45` > FROM > ( > SELECT `id` AS `gen_attr_45` > FROM `default`.`t0` > ) AS gen_subquery_0 > ) AS gen_subquery_1 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
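For comparison, a correctly generated statement wraps the inner projection in its own subquery; the extra alias below is illustrative, not taken from the actual fix:
{code:sql}
SELECT `gen_attr_46` AS `count(id)`
FROM (
  SELECT count(`gen_attr_45`) AS `gen_attr_46`
  FROM (
    SELECT `gen_attr_45`
    FROM (
      SELECT `id` AS `gen_attr_45`
      FROM `default`.`t0`
    ) AS gen_subquery_0
  ) AS gen_subquery_2
) AS gen_subquery_1
{code}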
[jira] [Updated] (SPARK-13974) sub-query names do not need to be globally unique while generating SQL
[ https://issues.apache.org/jira/browse/SPARK-13974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-13974: --- Assignee: Wenchen Fan > sub-query names do not need to be globally unique while generating SQL > > > Key: SPARK-13974 > URL: https://issues.apache.org/jira/browse/SPARK-13974 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Wenchen Fan >Assignee: Wenchen Fan > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14004) AttributeReference and Alias should only use their first qualifier to build SQL representations
[ https://issues.apache.org/jira/browse/SPARK-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-14004: --- Priority: Minor (was: Major) > AttributeReference and Alias should only use their first qualifier to build > SQL representations > --- > > Key: SPARK-14004 > URL: https://issues.apache.org/jira/browse/SPARK-14004 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Cheng Lian >Assignee: Cheng Lian >Priority: Minor > > Current implementation joins all qualifiers, which is wrong. > However, this doesn't cause any real SQL generation bugs as there is always > at most one qualifier for any given {{AttributeReference}} or {{Alias}}. > We can probably use {{Option\[String\]}} instead of {{Seq\[String\]}} to > represent qualifiers. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-14004) AttributeReference and Alias should only use their first qualifier to build SQL representations
Cheng Lian created SPARK-14004: -- Summary: AttributeReference and Alias should only use their first qualifier to build SQL representations Key: SPARK-14004 URL: https://issues.apache.org/jira/browse/SPARK-14004 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.0.0 Reporter: Cheng Lian Assignee: Cheng Lian Current implementation joins all qualifiers, which is wrong. However, this doesn't cause any real SQL generation bugs as there is always at most one qualifier for any given {{AttributeReference}} or {{Alias}}. We can probably use {{Option\[String\]}} instead of {{Seq\[String\]}} to represent qualifiers. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-14002) SQLBuilder should add subquery to Aggregate child when necessary
Cheng Lian created SPARK-14002: -- Summary: SQLBuilder should add subquery to Aggregate child when necessary Key: SPARK-14002 URL: https://issues.apache.org/jira/browse/SPARK-14002 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.0.0 Reporter: Cheng Lian Assignee: Cheng Lian Add the following test case to {{LogicalPlanToSQLSuite}} to reproduce this issue: {code} test("bug") { checkHiveQl( """SELECT COUNT(id) |FROM |( | SELECT id FROM t0 |) subq """.stripMargin ) } {code} The generated (wrong) SQL is: {code:sql} SELECT `gen_attr_46` AS `count(id)` FROM ( SELECT count(`gen_attr_45`) AS `gen_attr_46` FROM -- a subquery wrapping the projection below is missing here SELECT `gen_attr_45` FROM ( SELECT `id` AS `gen_attr_45` FROM `default`.`t0` ) AS gen_subquery_0 ) AS gen_subquery_1 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org