[jira] [Created] (SPARK-46714) Overwrite partitions with custom location should reset partition locations
Adrian Wang created SPARK-46714: --- Summary: Overwrite partitions with custom location should reset partition locations Key: SPARK-46714 URL: https://issues.apache.org/jira/browse/SPARK-46714 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.5.0 Reporter: Adrian Wang In the Hive metastore we support partitions located outside the corresponding table location. When overwriting such a partition with Hive, the overwritten partition should be recreated under the table location. Also, currently if a partition is on a different filesystem from the table, Spark will throw an exception when overwriting. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
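The scenario in the report can be sketched in Spark SQL; the table, column, and path names below are invented for illustration only:

```sql
-- Hypothetical repro sketch (names and paths are invented).
CREATE TABLE sales (amount INT) PARTITIONED BY (dt STRING) STORED AS PARQUET;

-- Partition placed outside the table location, possibly on another filesystem:
ALTER TABLE sales ADD PARTITION (dt='2024-01-01')
LOCATION 'oss://other-bucket/custom/dt=2024-01-01';

-- Per the report, overwriting should recreate the partition under the table
-- location, and should not fail merely because the custom location is on a
-- different filesystem from the table:
INSERT OVERWRITE TABLE sales PARTITION (dt='2024-01-01') SELECT 42;
```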
[jira] [Resolved] (SPARK-41816) Spark ThriftServer should not close file system when log out
[ https://issues.apache.org/jira/browse/SPARK-41816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrian Wang resolved SPARK-41816. - Resolution: Invalid > Spark ThriftServer should not close file system when log out > > > Key: SPARK-41816 > URL: https://issues.apache.org/jira/browse/SPARK-41816 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.1 >Reporter: Adrian Wang >Priority: Major > > Currently when enabled impersonation, Spark Thriftserver will close > filesystem instance for the user when logout. If there are two sessions with > the same user, the remaining session will become corrupted.
[jira] [Updated] (SPARK-41816) Spark ThriftServer should not close file system when log out
[ https://issues.apache.org/jira/browse/SPARK-41816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrian Wang updated SPARK-41816: Description: Currently when enable impersonation, Spark Thriftserver will close filesystem instance for the user when logout. If there are two sessions with the same user, the remaining session will become corrupted. (was: Currently when enable impersonation, Spark Thriftserver will close filesystem instance for the user. If there are two sessions with the same user, the remaining session will become corrupted.) > Spark ThriftServer should not close file system when log out > > > Key: SPARK-41816 > URL: https://issues.apache.org/jira/browse/SPARK-41816 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.1 >Reporter: Adrian Wang >Priority: Major > > Currently when enable impersonation, Spark Thriftserver will close filesystem > instance for the user when logout. If there are two sessions with the same > user, the remaining session will become corrupted.
[jira] [Updated] (SPARK-41816) Spark ThriftServer should not close file system when log out
[ https://issues.apache.org/jira/browse/SPARK-41816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrian Wang updated SPARK-41816: Description: Currently when enabled impersonation, Spark Thriftserver will close filesystem instance for the user when logout. If there are two sessions with the same user, the remaining session will become corrupted. (was: Currently when enable impersonation, Spark Thriftserver will close filesystem instance for the user when logout. If there are two sessions with the same user, the remaining session will become corrupted.) > Spark ThriftServer should not close file system when log out > > > Key: SPARK-41816 > URL: https://issues.apache.org/jira/browse/SPARK-41816 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.1 >Reporter: Adrian Wang >Priority: Major > > Currently when enabled impersonation, Spark Thriftserver will close > filesystem instance for the user when logout. If there are two sessions with > the same user, the remaining session will become corrupted.
[jira] [Created] (SPARK-41816) Spark ThriftServer should not close file system when log out
Adrian Wang created SPARK-41816: --- Summary: Spark ThriftServer should not close file system when log out Key: SPARK-41816 URL: https://issues.apache.org/jira/browse/SPARK-41816 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.3.1 Reporter: Adrian Wang Currently, when impersonation is enabled, Spark Thriftserver will close the filesystem instance for the user. If there are two sessions with the same user, the remaining session will become corrupted.
[jira] [Commented] (SPARK-26764) [SPIP] Spark Relational Cache
[ https://issues.apache.org/jira/browse/SPARK-26764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17370993#comment-17370993 ] Adrian Wang commented on SPARK-26764: - [~zshao] Thanks for the interest. We created an open-source plugin: [https://github.com/alibaba/SparkCube], to demonstrate the basic ideas. > [SPIP] Spark Relational Cache > - > > Key: SPARK-26764 > URL: https://issues.apache.org/jira/browse/SPARK-26764 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.1.0 >Reporter: Adrian Wang >Priority: Major > Attachments: Relational+Cache+SPIP.pdf > > > In modern database systems, relational cache is a common technology to boost > ad-hoc queries. While Spark provides cache natively, Spark SQL should be able > to utilize the relationship between relations to boost all possible queries. > In this SPIP, we will make Spark be able to utilize all defined cached > relations if possible, without explicit substitution in user query, as well > as keep some user defined cache available in different sessions. Materialized > views in many database systems provide similar function.
[jira] [Commented] (SPARK-30130) Hardcoded numeric values in common table expressions which utilize GROUP BY are interpreted as ordinal positions
[ https://issues.apache.org/jira/browse/SPARK-30130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17345730#comment-17345730 ] Adrian Wang commented on SPARK-30130: - I also met this on 2.4.7, and this has been fixed on master/3.1. > Hardcoded numeric values in common table expressions which utilize GROUP BY > are interpreted as ordinal positions > > > Key: SPARK-30130 > URL: https://issues.apache.org/jira/browse/SPARK-30130 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.4 >Reporter: Matt Boegner >Priority: Minor > > Hardcoded numeric values in common table expressions which utilize GROUP BY > are interpreted as ordinal positions. > {code:java} > val df = spark.sql(""" > with a as (select 0 as test, count(*) group by test) > select * from a > """) > df.show(){code} > This results in an error message like {color:#e01e5a}GROUP BY position 0 is > not in select list (valid range is [1, 2]){color} . > > However, this error does not appear in a traditional subselect format. For > example, this query executes correctly: > {code:java} > val df = spark.sql(""" > select * from (select 0 as test, count(*) group by test) a > """) > df.show(){code}
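For anyone still on an affected 2.4.x release, one possible workaround (a sketch, not an official fix) is to disable ordinal resolution in GROUP BY, since the aliased literal is being resolved to a position; note this changes GROUP BY semantics session-wide:

```sql
-- spark.sql.groupByOrdinal controls whether integer literals in GROUP BY are
-- interpreted as column positions; disabling it avoids the ordinal error above.
SET spark.sql.groupByOrdinal=false;
with a as (select 0 as test, count(*) group by test)
select * from a;
```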
[jira] [Created] (SPARK-35238) Add JindoFS SDK in cloud integration documents
Adrian Wang created SPARK-35238: --- Summary: Add JindoFS SDK in cloud integration documents Key: SPARK-35238 URL: https://issues.apache.org/jira/browse/SPARK-35238 Project: Spark Issue Type: Documentation Components: Documentation Affects Versions: 3.1.1, 3.0.2, 2.4.7, 2.3.4 Reporter: Adrian Wang As an important cloud provider, Alibaba Cloud presents JindoFS SDK to maximize the performance for workloads interacting with Alibaba Cloud OSS.
[jira] [Commented] (SPARK-31595) Spark sql cli should allow unescaped quote mark in quoted string
[ https://issues.apache.org/jira/browse/SPARK-31595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17094980#comment-17094980 ] Adrian Wang commented on SPARK-31595: - [~Ankitraj] Thanks, I have already created a pull request on this. > Spark sql cli should allow unescaped quote mark in quoted string > > > Key: SPARK-31595 > URL: https://issues.apache.org/jira/browse/SPARK-31595 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Adrian Wang >Priority: Major > > spark-sql> select "'"; > spark-sql> select '"'; > In Spark parser if we pass a text of `select "'";`, there will be > ParserCancellationException, which will be handled by PredictionMode.LL. By > dropping `;` correctly we can avoid that retry.
[jira] [Created] (SPARK-31595) Spark sql cli should allow unescaped quote mark in quoted string
Adrian Wang created SPARK-31595: --- Summary: Spark sql cli should allow unescaped quote mark in quoted string Key: SPARK-31595 URL: https://issues.apache.org/jira/browse/SPARK-31595 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.0 Reporter: Adrian Wang spark-sql> select "'"; spark-sql> select '"'; In the Spark parser, if we pass the text `select "'";`, there will be a ParserCancellationException, which will be handled by PredictionMode.LL. By dropping the `;` correctly we can avoid that retry.
[jira] [Updated] (SPARK-29177) Zombie tasks prevents executor from releasing when task exceeds maxResultSize
[ https://issues.apache.org/jira/browse/SPARK-29177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrian Wang updated SPARK-29177: Description: When we fetch results from executors and found the total size has exceeded the maxResultSize configured, Spark will simply abort the stage and all dependent jobs. But the task triggered this is actually successful, but never post out `TaskEnd` event, as a result it will never be removed from `CoarseGrainedSchedulerBackend`. If dynamic allocation is enabled, there will be zombie executor(s) remaining in resource manager, it will never die until application ends. (was: When we fetch results from executors and found the total size has exceeded the maxResultSize configured, Spark will simply abort the stage and all dependent jobs. But the task triggered this is actually successful, but never posted `CompletionEvent` out, as a result it will never be removed from `CoarseGrainedSchedulerBackend`. If dynamic allocation is enabled, there will be zombie executor(s) remaining in resource manager, it will never die until application ends.) > Zombie tasks prevents executor from releasing when task exceeds maxResultSize > - > > Key: SPARK-29177 > URL: https://issues.apache.org/jira/browse/SPARK-29177 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.3.4, 2.4.4 >Reporter: Adrian Wang >Priority: Major > > When we fetch results from executors and found the total size has exceeded > the maxResultSize configured, Spark will simply abort the stage and all > dependent jobs. But the task triggered this is actually successful, but never > post out `TaskEnd` event, as a result it will never be removed from > `CoarseGrainedSchedulerBackend`. If dynamic allocation is enabled, there will > be zombie executor(s) remaining in resource manager, it will never die until > application ends. 
[jira] [Created] (SPARK-29177) Zombie tasks prevents executor from releasing when task exceeds maxResultSize
Adrian Wang created SPARK-29177: --- Summary: Zombie tasks prevents executor from releasing when task exceeds maxResultSize Key: SPARK-29177 URL: https://issues.apache.org/jira/browse/SPARK-29177 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.4.4, 2.3.4 Reporter: Adrian Wang When we fetch results from executors and find the total size has exceeded the configured maxResultSize, Spark will simply abort the stage and all dependent jobs. But the task that triggered this is actually successful, yet it never posts a `CompletionEvent`; as a result it will never be removed from `CoarseGrainedSchedulerBackend`. If dynamic allocation is enabled, there will be zombie executor(s) remaining in the resource manager, which will never die until the application ends.
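Until the leak itself is fixed, the trigger can be avoided by raising (or disabling) the result-size cap; the command, script name, and value below are illustrative only:

```shell
# spark.driver.maxResultSize caps the total serialized size of results fetched
# to the driver; 0 disables the check entirely. Raising it avoids the abort
# path described above that strands executors. my_job.py is a placeholder.
spark-submit --conf spark.driver.maxResultSize=4g my_job.py
```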
[jira] [Commented] (SPARK-13446) Spark need to support reading data from Hive 2.0.0 metastore
[ https://issues.apache.org/jira/browse/SPARK-13446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16931057#comment-16931057 ] Adrian Wang commented on SPARK-13446: - or you can just apply the patch from SPARK-27349 and recompile your spark. Hope it works! > Spark need to support reading data from Hive 2.0.0 metastore > > > Key: SPARK-13446 > URL: https://issues.apache.org/jira/browse/SPARK-13446 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.6.0 >Reporter: Lifeng Wang >Assignee: Xiao Li >Priority: Major > Fix For: 2.2.0 > > > Spark provided HIveContext class to read data from hive metastore directly. > While it only supports hive 1.2.1 version and older. Since hive 2.0.0 has > released, it's better to upgrade to support Hive 2.0.0. > {noformat} > 16/02/23 02:35:02 INFO metastore: Trying to connect to metastore with URI > thrift://hsw-node13:9083 > 16/02/23 02:35:02 INFO metastore: Opened a connection to metastore, current > connections: 1 > 16/02/23 02:35:02 INFO metastore: Connected to metastore. 
> Exception in thread "main" java.lang.NoSuchFieldError: HIVE_STATS_JDBC_TIMEOUT > at > org.apache.spark.sql.hive.HiveContext.configure(HiveContext.scala:473) > at > org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:192) > at > org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:185) > at > org.apache.spark.sql.hive.HiveContext$$anon$1.(HiveContext.scala:422) > at > org.apache.spark.sql.hive.HiveContext.catalog$lzycompute(HiveContext.scala:422) > at > org.apache.spark.sql.hive.HiveContext.catalog(HiveContext.scala:421) > at org.apache.spark.sql.hive.HiveContext.catalog(HiveContext.scala:72) > at org.apache.spark.sql.SQLContext.table(SQLContext.scala:739) > at org.apache.spark.sql.SQLContext.table(SQLContext.scala:735) > {noformat}
[jira] [Commented] (SPARK-13446) Spark need to support reading data from Hive 2.0.0 metastore
[ https://issues.apache.org/jira/browse/SPARK-13446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16931055#comment-16931055 ] Adrian Wang commented on SPARK-13446: - [~jpbordi][~headcra6] I am using mysql as the hive metastore backend, leaving the 1.2.1 hive jars in my spark/jars directory, without putting any additional hive jars in there, and reading from a hive 2.x metastore service; it just works fine. ``` hive-beeline-1.2.1.spark2.jar hive-cli-1.2.1.spark2.jar hive-exec-1.2.1.spark2.jar hive-jdbc-1.2.1.spark2.jar hive-metastore-1.2.1.spark2.jar ``` That is what `ls $SPARK_HOME/jars/hive-*` returns. > Spark need to support reading data from Hive 2.0.0 metastore > > > Key: SPARK-13446 > URL: https://issues.apache.org/jira/browse/SPARK-13446 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.6.0 >Reporter: Lifeng Wang >Assignee: Xiao Li >Priority: Major > Fix For: 2.2.0 > > > Spark provided HIveContext class to read data from hive metastore directly. > While it only supports hive 1.2.1 version and older. Since hive 2.0.0 has > released, it's better to upgrade to support Hive 2.0.0. > {noformat} > 16/02/23 02:35:02 INFO metastore: Trying to connect to metastore with URI > thrift://hsw-node13:9083 > 16/02/23 02:35:02 INFO metastore: Opened a connection to metastore, current > connections: 1 > 16/02/23 02:35:02 INFO metastore: Connected to metastore.
> Exception in thread "main" java.lang.NoSuchFieldError: HIVE_STATS_JDBC_TIMEOUT > at > org.apache.spark.sql.hive.HiveContext.configure(HiveContext.scala:473) > at > org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:192) > at > org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:185) > at > org.apache.spark.sql.hive.HiveContext$$anon$1.(HiveContext.scala:422) > at > org.apache.spark.sql.hive.HiveContext.catalog$lzycompute(HiveContext.scala:422) > at > org.apache.spark.sql.hive.HiveContext.catalog(HiveContext.scala:421) > at org.apache.spark.sql.hive.HiveContext.catalog(HiveContext.scala:72) > at org.apache.spark.sql.SQLContext.table(SQLContext.scala:739) > at org.apache.spark.sql.SQLContext.table(SQLContext.scala:735) > {noformat}
[jira] [Commented] (SPARK-13446) Spark need to support reading data from Hive 2.0.0 metastore
[ https://issues.apache.org/jira/browse/SPARK-13446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16930298#comment-16930298 ] Adrian Wang commented on SPARK-13446: - [~elgalu][~toopt4][~headcra6][~jpbordi][~F7753] The reference to this variable has been removed in SPARK-27349 , which will be included in SPARK 3.0. For spark 2.x, you should exclude hive-exec.jar of hive 2.x or above from your spark extra class path, so you can avoid this exception. > Spark need to support reading data from Hive 2.0.0 metastore > > > Key: SPARK-13446 > URL: https://issues.apache.org/jira/browse/SPARK-13446 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.6.0 >Reporter: Lifeng Wang >Assignee: Xiao Li >Priority: Major > Fix For: 2.2.0 > > > Spark provided HIveContext class to read data from hive metastore directly. > While it only supports hive 1.2.1 version and older. Since hive 2.0.0 has > released, it's better to upgrade to support Hive 2.0.0. > {noformat} > 16/02/23 02:35:02 INFO metastore: Trying to connect to metastore with URI > thrift://hsw-node13:9083 > 16/02/23 02:35:02 INFO metastore: Opened a connection to metastore, current > connections: 1 > 16/02/23 02:35:02 INFO metastore: Connected to metastore. 
> Exception in thread "main" java.lang.NoSuchFieldError: HIVE_STATS_JDBC_TIMEOUT > at > org.apache.spark.sql.hive.HiveContext.configure(HiveContext.scala:473) > at > org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:192) > at > org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:185) > at > org.apache.spark.sql.hive.HiveContext$$anon$1.(HiveContext.scala:422) > at > org.apache.spark.sql.hive.HiveContext.catalog$lzycompute(HiveContext.scala:422) > at > org.apache.spark.sql.hive.HiveContext.catalog(HiveContext.scala:421) > at org.apache.spark.sql.hive.HiveContext.catalog(HiveContext.scala:72) > at org.apache.spark.sql.SQLContext.table(SQLContext.scala:739) > at org.apache.spark.sql.SQLContext.table(SQLContext.scala:735) > {noformat}
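An alternative to pruning Hive 2.x jars from the extra classpath, sketched here with illustrative values, is to point Spark at a matching metastore client through its built-in isolation support:

```shell
# spark.sql.hive.metastore.version / .jars let Spark load a Hive client that
# matches the metastore version, isolated from the built-in Hive 1.2.1 classes.
# The version value is an example; pick the one matching your metastore.
spark-sql \
  --conf spark.sql.hive.metastore.version=2.3.3 \
  --conf spark.sql.hive.metastore.jars=maven
```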
[jira] [Commented] (SPARK-29038) SPIP: Support Spark Materialized View
[ https://issues.apache.org/jira/browse/SPARK-29038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16928100#comment-16928100 ] Adrian Wang commented on SPARK-29038: - This seems to duplicate our proposal in SPARK-26764. We have implemented similar features and already have them running in our customer's production environment. > SPIP: Support Spark Materialized View > - > > Key: SPARK-29038 > URL: https://issues.apache.org/jira/browse/SPARK-29038 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.0.0 >Reporter: Lantao Jin >Priority: Major > > Materialized view is an important approach in DBMS to cache data to > accelerate queries. By creating a materialized view through SQL, the data > that can be cached is very flexible, and needs to be configured arbitrarily > according to specific usage scenarios. The Materialization Manager > automatically updates the cache data according to changes in detail source > tables, simplifying user work. When user submit query, Spark optimizer > rewrites the execution plan based on the available materialized view to > determine the optimal execution plan. > Details in [design > doc|https://docs.google.com/document/d/1q5pjSWoTNVc9zsAfbNzJ-guHyVwPsEroIEP8Cca179A/edit?usp=sharing]
[jira] [Created] (SPARK-27279) Reuse subquery should compare child node only
Adrian Wang created SPARK-27279: --- Summary: Reuse subquery should compare child node only Key: SPARK-27279 URL: https://issues.apache.org/jira/browse/SPARK-27279 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.4.0 Reporter: Adrian Wang For now, `ReuseSubquery` in Spark compares two subqueries at `SubqueryExec` level, which invalidates the `ReuseSubquery` rule.
[jira] [Updated] (SPARK-22601) Data load is getting displayed successful on providing non existing hdfs file path
[ https://issues.apache.org/jira/browse/SPARK-22601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrian Wang updated SPARK-22601: Fix Version/s: (was: 2.2.1) 2.2.2 > Data load is getting displayed successful on providing non existing hdfs file > path > -- > > Key: SPARK-22601 > URL: https://issues.apache.org/jira/browse/SPARK-22601 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.2.0 >Reporter: Sujith Chacko >Assignee: Sujith Chacko >Priority: Minor > Fix For: 2.2.2 > > > Data load is getting displayed successful on providing non existing hdfs file > path where as in local path proper error message is getting displayed > create table tb2 (a string, b int); > load data inpath 'hdfs://hacluster/data1.csv' into table tb2 > Note: data1.csv does not exist in HDFS > when local non existing file path is given below error message will be > displayed > "LOAD DATA input path does not exist". attached snapshots of behaviour in > spark 2.1 and spark 2.2 version
[jira] [Commented] (SPARK-26764) [SPIP] Spark Relational Cache
[ https://issues.apache.org/jira/browse/SPARK-26764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16779170#comment-16779170 ] Adrian Wang commented on SPARK-26764: - Hi [~Tagar] , the idea has something in common with materialized views, but we would also make query rewriting available for Spark's cached queries, and the data materialization process will be more configurable. > [SPIP] Spark Relational Cache > - > > Key: SPARK-26764 > URL: https://issues.apache.org/jira/browse/SPARK-26764 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 2.4.0 >Reporter: Adrian Wang >Priority: Major > Attachments: Relational+Cache+SPIP.pdf > > > In modern database systems, relational cache is a common technology to boost > ad-hoc queries. While Spark provides cache natively, Spark SQL should be able > to utilize the relationship between relations to boost all possible queries. > In this SPIP, we will make Spark be able to utilize all defined cached > relations if possible, without explicit substitution in user query, as well > as keep some user defined cache available in different sessions. Materialized > views in many database systems provide similar function.
[jira] [Created] (SPARK-26764) [SPIP] Spark Relational Cache
Adrian Wang created SPARK-26764: --- Summary: [SPIP] Spark Relational Cache Key: SPARK-26764 URL: https://issues.apache.org/jira/browse/SPARK-26764 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 2.4.0 Reporter: Adrian Wang Attachments: Relational+Cache+SPIP.pdf In modern database systems, relational cache is a common technology to boost ad-hoc queries. While Spark provides caching natively, Spark SQL should be able to utilize the relationships between relations to boost all possible queries. In this SPIP, we will make Spark able to utilize all defined cached relations where possible, without explicit substitution in the user query, as well as keep some user-defined caches available across different sessions. Materialized views in many database systems provide a similar function.
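As a toy illustration of the proposal (the statements use stock Spark SQL syntax with invented names; the automatic rewriting described in the SPIP is not stock Spark behavior):

```sql
-- Define a cached relation (a materialized aggregate) once:
CACHE TABLE daily_totals AS
SELECT dt, SUM(amount) AS total FROM sales GROUP BY dt;

-- The SPIP proposes that a later query like this could be answered from the
-- cached relation automatically, without the user referencing daily_totals:
SELECT dt, SUM(amount) FROM sales GROUP BY dt;
```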
[jira] [Updated] (SPARK-26764) [SPIP] Spark Relational Cache
[ https://issues.apache.org/jira/browse/SPARK-26764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrian Wang updated SPARK-26764: Attachment: Relational+Cache+SPIP.pdf > [SPIP] Spark Relational Cache > - > > Key: SPARK-26764 > URL: https://issues.apache.org/jira/browse/SPARK-26764 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 2.4.0 >Reporter: Adrian Wang >Priority: Major > Attachments: Relational+Cache+SPIP.pdf > > > In modern database systems, relational cache is a common technology to boost > ad-hoc queries. While Spark provides cache natively, Spark SQL should be able > to utilize the relationship between relations to boost all possible queries. > In this SPIP, we will make Spark be able to utilize all defined cached > relations if possible, without explicit substitution in user query, as well > as keep some user defined cache available in different sessions. Materialized > views in many database systems provide similar function.
[jira] [Commented] (SPARK-26155) Spark SQL performance degradation after apply SPARK-21052 with Q19 of TPC-DS in 3TB scale
[ https://issues.apache.org/jira/browse/SPARK-26155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16700836#comment-16700836 ] Adrian Wang commented on SPARK-26155: - [~Jk_Self] can you also test this on Spark 2.4? > Spark SQL performance degradation after apply SPARK-21052 with Q19 of TPC-DS > in 3TB scale > -- > > Key: SPARK-26155 > URL: https://issues.apache.org/jira/browse/SPARK-26155 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.4.0 >Reporter: Ke Jia >Priority: Major > Attachments: Q19 analysis in Spark2.3 with L486&487.pdf, Q19 analysis > in Spark2.3 without L486 & 487.pdf, q19.sql > > > In our test environment, we found a serious performance degradation issue in > Spark2.3 when running TPC-DS on SKX 8180. Several queries have serious > performance degradation. For example, TPC-DS Q19 needs 126 seconds with Spark > 2.3 while it needs only 29 seconds with Spark2.1 on 3TB data. We investigated > this problem and figured out the root cause is in community patch SPARK-21052 > which add metrics to hash join process. And the impact code is > [L486|https://github.com/apache/spark/blob/1d3dd58d21400b5652b75af7e7e53aad85a31528/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala#L486] > and > [L487|https://github.com/apache/spark/blob/1d3dd58d21400b5652b75af7e7e53aad85a31528/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala#L487] > . Q19 costs about 30 seconds without these two lines code and 126 seconds > with these code.
[jira] [Commented] (SPARK-26155) Spark SQL performance degradation after apply SPARK-21052 with Q19 of TPC-DS in 3TB scale
[ https://issues.apache.org/jira/browse/SPARK-26155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16700253#comment-16700253 ] Adrian Wang commented on SPARK-26155: - [~viirya] , thanks for your reply. [~Jk_Self] initially found this when comparing Spark 2.1 and Spark 2.3, and after a binary search against the commit tree, she found the difference was caused by SPARK-21052. Finally she removed the two lines from the Spark 2.3 source code and recompiled, and the performance regression was gone. > Spark SQL performance degradation after apply SPARK-21052 with Q19 of TPC-DS > in 3TB scale > -- > > Key: SPARK-26155 > URL: https://issues.apache.org/jira/browse/SPARK-26155 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.4.0 >Reporter: Ke Jia >Priority: Major > Attachments: Q19 analysis in Spark2.3 with L486&487.pdf, Q19 analysis > in Spark2.3 without L486 & 487.pdf, q19.sql > > > In our test environment, we found a serious performance degradation issue in > Spark2.3 when running TPC-DS on SKX 8180. Several queries have serious > performance degradation. For example, TPC-DS Q19 needs 126 seconds with Spark > 2.3 while it needs only 29 seconds with Spark2.1 on 3TB data. We investigated > this problem and figured out the root cause is in community patch SPARK-21052 > which add metrics to hash join process. And the impact code is > [L486|https://github.com/apache/spark/blob/1d3dd58d21400b5652b75af7e7e53aad85a31528/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala#L486] > and > [L487|https://github.com/apache/spark/blob/1d3dd58d21400b5652b75af7e7e53aad85a31528/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala#L487] > . Q19 costs about 30 seconds without these two lines code and 126 seconds > with these code.
[jira] [Created] (SPARK-26181) the `hasMinMaxStats` method of `ColumnStatsMap` is not correct
Adrian Wang created SPARK-26181: --- Summary: the `hasMinMaxStats` method of `ColumnStatsMap` is not correct Key: SPARK-26181 URL: https://issues.apache.org/jira/browse/SPARK-26181 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.4.0 Reporter: Adrian Wang
[jira] [Commented] (SPARK-26155) Spark SQL performance degradation after apply SPARK-21052 with Q19 of TPC-DS in 3TB scale
[ https://issues.apache.org/jira/browse/SPARK-26155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16696541#comment-16696541 ] Adrian Wang commented on SPARK-26155: - It seems the performance degradation is related to the CPU cache; the metrics collection happens to break that locality...
[jira] [Closed] (SPARK-14631) "drop database cascade" needs to unregister functions for HiveExternalCatalog
[ https://issues.apache.org/jira/browse/SPARK-14631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrian Wang closed SPARK-14631. --- Resolution: Not A Problem > "drop database cascade" needs to unregister functions for HiveExternalCatalog > - > > Key: SPARK-14631 > URL: https://issues.apache.org/jira/browse/SPARK-14631 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Adrian Wang > > As in HIVE-12304, Hive's drop database cascade did not drop functions as well. > We need to fix this when calling `dropDatabase` in HiveExternalCatalog.
[jira] [Created] (SPARK-17427) function SIZE should return -1 when parameter is null
Adrian Wang created SPARK-17427: --- Summary: function SIZE should return -1 when parameter is null Key: SPARK-17427 URL: https://issues.apache.org/jira/browse/SPARK-17427 Project: Spark Issue Type: Bug Components: SQL Reporter: Adrian Wang Priority: Minor `select size(null)` returns -1 in Hive. In order to be compatible, we need to return -1 as well.
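The Hive-compatible semantics described above can be sketched in a few lines (a hedged model of the expected behavior, not Spark's actual implementation):

```python
def hive_size(collection):
    """Hive-compatible size(): return -1 for NULL input instead of NULL or an error."""
    return -1 if collection is None else len(collection)

print(hive_size(None))       # -1, matching Hive's `select size(null)`
print(hive_size([1, 2, 3]))  # 3
```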
[jira] [Commented] (SPARK-4003) Add {Big Decimal, Timestamp, Date} types to Java SqlContext
[ https://issues.apache.org/jira/browse/SPARK-4003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15388922#comment-15388922 ] Adrian Wang commented on SPARK-4003: DataTypes.TimestampType does not use java.sql.Timestamp internally; you should only use the exposed API. > Add {Big Decimal, Timestamp, Date} types to Java SqlContext > --- > > Key: SPARK-4003 > URL: https://issues.apache.org/jira/browse/SPARK-4003 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Adrian Wang >Assignee: Adrian Wang > Fix For: 1.2.0 > > > In JavaSqlContext, we need to let Java programs use big decimal, timestamp, > and date types.
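The point about internal representation can be illustrated with a small sketch: Catalyst stores TimestampType values internally as a primitive count of microseconds since the epoch, and a timestamp object is only materialized at the external API boundary. The helper names below are illustrative, not Spark's actual code:

```python
from datetime import datetime, timezone

def to_internal_micros(ts: datetime) -> int:
    # Internal form: a plain 64-bit count of microseconds since the epoch.
    return int(ts.replace(tzinfo=timezone.utc).timestamp() * 1_000_000)

def from_internal_micros(us: int) -> datetime:
    # External form: a timestamp object, built only at the API boundary.
    return datetime.fromtimestamp(us / 1_000_000, tz=timezone.utc)

ts = datetime(2014, 10, 1, 12, 30, 45)
assert from_internal_micros(to_internal_micros(ts)).replace(tzinfo=None) == ts
```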
[jira] [Commented] (SPARK-16515) [SPARK][SQL] transformation script got failure for python script
[ https://issues.apache.org/jira/browse/SPARK-16515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15374241#comment-15374241 ] Adrian Wang commented on SPARK-16515: - The problem is that Spark does not pick the right record writer from its conf when it has to write records to the transform script, so when the Python script reads the data from standard input, it crashes. > [SPARK][SQL] transformation script got failure for python script > > > Key: SPARK-16515 > URL: https://issues.apache.org/jira/browse/SPARK-16515 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Yi Zhou >Priority: Critical > > Run below SQL and get transformation script error for python script like > below error message. > Query SQL: > {code} > CREATE VIEW q02_spark_sql_engine_validation_power_test_0_temp AS > SELECT DISTINCT > sessionid, > wcs_item_sk > FROM > ( > FROM > ( > SELECT > wcs_user_sk, > wcs_item_sk, > (wcs_click_date_sk * 24 * 60 * 60 + wcs_click_time_sk) AS tstamp_inSec > FROM web_clickstreams > WHERE wcs_item_sk IS NOT NULL > AND wcs_user_sk IS NOT NULL > DISTRIBUTE BY wcs_user_sk > SORT BY > wcs_user_sk, > tstamp_inSec -- "sessionize" reducer script requires the cluster by uid > and sort by tstamp > ) clicksAnWebPageType > REDUCE > wcs_user_sk, > tstamp_inSec, > wcs_item_sk > USING 'python q2-sessionize.py 3600' > AS ( > wcs_item_sk BIGINT, > sessionid STRING) > ) q02_tmp_sessionize > CLUSTER BY sessionid > {code} > Error Message: > {code} > 16/07/06 16:59:02 WARN scheduler.TaskSetManager: Lost task 5.0 in stage 157.0 > (TID 171, hw-node5): org.apache.spark.SparkException: Subprocess exited with > status 1. 
Error: Traceback (most recent call last): > File "q2-sessionize.py", line 49, in > user_sk, tstamp_str, item_sk = line.strip().split("\t") > ValueError: too many values to unpack > at > org.apache.spark.sql.hive.execution.ScriptTransformation$$anon$1.checkFailureAndPropagate(ScriptTransformation.scala:144) > at > org.apache.spark.sql.hive.execution.ScriptTransformation$$anon$1.hasNext(ScriptTransformation.scala:192) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.agg_doAggregateWithKeys$(Unknown > Source) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown > Source) > at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > at > org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) > at > org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47) > at org.apache.spark.scheduler.Task.run(Task.scala:85) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > Caused by: org.apache.spark.SparkException: Subprocess exited with status 1. 
> Error: Traceback (most recent call last): > File "q2-sessionize.py", line 49, in > user_sk, tstamp_str, item_sk = line.strip().split("\t") > ValueError: too many values to unpack > at > org.apache.spark.sql.hive.execution.ScriptTransformation$$anon$1.checkFailureAndPropagate(ScriptTransformation.scala:144) > at > org.apache.spark.sql.hive.execution.ScriptTransformation$$anon$1.hasNext(ScriptTransformation.scala:181) > ... 14 more > 16/07/06 16:59:02 INFO scheduler.TaskSetManager: Lost task 7.0 in stage 157.0 > (TID 173) on executor hw-node5: org.apache.spark.SparkException (Subprocess > exited with status 1. Error: Traceback (most recent call last): > File "q2-sessionize.py", line 49, in > user_sk, tstamp_str, item_sk = line.strip().split("\t") > ValueError: too many values to unpack > ) [duplicate 1] > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
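One failure mode consistent with the traceback above (a hedged illustration; the exact mismatch depends on which record writer Spark actually selected): if the serialized row carries more fields than the script expects, the three-way tab split in q2-sessionize.py fails with exactly this ValueError.

```python
# What q2-sessionize.py expects: exactly three tab-separated fields.
good = "1\t1442793602\t42"
user_sk, tstamp_str, item_sk = good.strip().split("\t")
assert (user_sk, item_sk) == ("1", "42")

# A row serialized with an extra field splits into four parts, and the
# three-way unpack raises "too many values to unpack".
bad = "1\t1442793602\t42\textra"
try:
    user_sk, tstamp_str, item_sk = bad.strip().split("\t")
    raised = False
except ValueError:
    raised = True
assert raised
```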
[jira] [Resolved] (SPARK-15397) 'locate' UDF got different result with boundary value case compared to Hive engine
[ https://issues.apache.org/jira/browse/SPARK-15397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrian Wang resolved SPARK-15397. - Resolution: Fixed > 'locate' UDF got different result with boundary value case compared to Hive > engine > -- > > Key: SPARK-15397 > URL: https://issues.apache.org/jira/browse/SPARK-15397 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0, 1.6.1, 2.0.0 >Reporter: Yi Zhou >Assignee: Adrian Wang > > Spark SQL: > select locate("abc", "abc", 1); > 0 > Hive: > select locate("abc", "abc", 1); > 1 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
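Hive's 1-based `locate` semantics, which the fix aligns Spark with, can be sketched as follows (a hedged model of the expected behavior, not the actual UDF code):

```python
def hive_locate(substr: str, s: str, pos: int = 1) -> int:
    # 1-based index of the first occurrence of substr at or after pos;
    # 0 when not found (str.find returns -1, so -1 + 1 == 0).
    return s.find(substr, pos - 1) + 1

print(hive_locate("abc", "abc", 1))  # 1, matching Hive
print(hive_locate("z", "abc"))       # 0 (not found)
```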
[jira] [Commented] (SPARK-14126) [Table related commands] Truncate table
[ https://issues.apache.org/jira/browse/SPARK-14126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15247161#comment-15247161 ] Adrian Wang commented on SPARK-14126: - Yes, still working. > [Table related commands] Truncate table > --- > > Key: SPARK-14126 > URL: https://issues.apache.org/jira/browse/SPARK-14126 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Yin Huai > > TOK_TRUNCATETABLE > We also need to check the behavior of Hive when we call truncate table on a > partitioned table. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-14631) "drop database cascade" needs to unregister functions for HiveExternalCatalog
Adrian Wang created SPARK-14631: --- Summary: "drop database cascade" needs to unregister functions for HiveExternalCatalog Key: SPARK-14631 URL: https://issues.apache.org/jira/browse/SPARK-14631 Project: Spark Issue Type: Bug Components: SQL Reporter: Adrian Wang As in HIVE-12304, Hive's drop database cascade did not drop functions as well. We need to fix this when calling `dropDatabase` in HiveExternalCatalog.
[jira] [Commented] (SPARK-14126) [Table related commands] Truncate table
[ https://issues.apache.org/jira/browse/SPARK-14126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15236353#comment-15236353 ] Adrian Wang commented on SPARK-14126: - I'm working on this.
[jira] [Updated] (SPARK-14021) Support custom context derived from HiveContext for SparkSQLEnv
[ https://issues.apache.org/jira/browse/SPARK-14021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrian Wang updated SPARK-14021: Description: This is to create a custom context for the commands bin/spark-sql and sbin/start-thriftserver. Any context that is derived from HiveContext is acceptable. Users need to configure the class name of the custom context via the config spark.sql.context.class, and make sure the class is on the classpath. This is to provide a more elegant way for infrastructure teams to apply custom configurations and changes.
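The pattern described above can be sketched roughly as follows (a hedged illustration using Python stand-ins; the class names are hypothetical, and Spark would resolve the class on the JVM classpath rather than in a module-level registry):

```python
class HiveContextBase:                 # stands in for HiveContext
    pass

class CustomContext(HiveContextBase):  # an infrastructure team's subclass
    pass

def load_context(conf: dict):
    # Resolve the context class by the configured name and require it to
    # derive from the known base context before instantiating it.
    name = conf.get("spark.sql.context.class", "HiveContextBase")
    cls = globals()[name]
    if not issubclass(cls, HiveContextBase):
        raise TypeError(f"{name} must derive from HiveContext")
    return cls()

ctx = load_context({"spark.sql.context.class": "CustomContext"})
assert isinstance(ctx, CustomContext)
```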
[jira] [Created] (SPARK-14021) Support custom context derived from HiveContext for SparkSQLEnv
Adrian Wang created SPARK-14021: --- Summary: Support custom context derived from HiveContext for SparkSQLEnv Key: SPARK-14021 URL: https://issues.apache.org/jira/browse/SPARK-14021 Project: Spark Issue Type: New Feature Components: SQL Reporter: Adrian Wang -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13819) using a regexp_replace in a group by clause raises a nullpointerexception
[ https://issues.apache.org/jira/browse/SPARK-13819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194902#comment-15194902 ] Adrian Wang commented on SPARK-13819: - I'll take a look at this. > using a tegexp_replace in a gropu by clause raises a nullpointerexception > - > > Key: SPARK-13819 > URL: https://issues.apache.org/jira/browse/SPARK-13819 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0 >Reporter: Javier Pérez > > 1. Start start-thriftserver.sh > 2. connect with beeline > 3. Perform the following query over a table: > SELECT t0.textsample > FROM test t0 > ORDER BY regexp_replace( > t0.code, > concat('\\Q', 'a', '\\E'), > regexp_replace( >regexp_replace('zz', '', ''), > '\\$', > '\\$')) DESC; > Problem: NullPointerException > Trace: > java.lang.NullPointerException > at > org.apache.spark.sql.catalyst.expressions.RegExpReplace.nullSafeEval(regexpExpressions.scala:224) > at > org.apache.spark.sql.catalyst.expressions.TernaryExpression.eval(Expression.scala:458) > at > org.apache.spark.sql.catalyst.expressions.InterpretedOrdering.compare(ordering.scala:36) > at > org.apache.spark.sql.catalyst.expressions.InterpretedOrdering.compare(ordering.scala:27) > at scala.math.Ordering$class.gt(Ordering.scala:97) > at > org.apache.spark.sql.catalyst.expressions.InterpretedOrdering.gt(ordering.scala:27) > at org.apache.spark.RangePartitioner.getPartition(Partitioner.scala:168) > at > org.apache.spark.sql.execution.Exchange$$anonfun$doExecute$1$$anonfun$4$$anonfun$apply$4.apply(Exchange.scala:180) > at > org.apache.spark.sql.execution.Exchange$$anonfun$doExecute$1$$anonfun$4$$anonfun$apply$4.apply(Exchange.scala:180) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) > at > org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.insertAll(BypassMergeSortShuffleWriter.java:119) > at > org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:73) > at > 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) > at org.apache.spark.scheduler.Task.run(Task.scala:88) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13837) SQL Context function to_date() returns wrong date
[ https://issues.apache.org/jira/browse/SPARK-13837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194892#comment-15194892 ] Adrian Wang commented on SPARK-13837: - Which timezone is your system in? > SQL Context function to_date() returns wrong date > - > > Key: SPARK-13837 > URL: https://issues.apache.org/jira/browse/SPARK-13837 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.5.1 > Environment: Python version: > 2.7.6 (default, Mar 22 2014, 22:59:56) > [GCC 4.8.2] >Reporter: Arnaud Caruso > > When using the SQL Context function to_date on a timestamp, it sometimes > returns the wrong date. > Here's how to reproduce the bug in Python: > data = [[datetime.datetime(2015, 2, 20, 0, 0, 2)],[datetime.datetime(2015, > 10, 9, 0, 0, 2)]] > rddData = sc.parallelize(data) > fields=[StructField('timestamp', TimestampType(), True)] > schema=StructType(fields) > data_table=sqlCtx.createDataFrame(data,schema) > sqlCtx.registerDataFrameAsTable(data_table,"data") > query="SELECT timestamp, TO_DATE(timestamp) FROM data " > df=sqlCtx.sql(query) > df.collect() > Here are the results I get: > [Row(timestamp=datetime.datetime(2015, 2, 20, 0, 0, 2), > _c1=datetime.date(2015, 2, 20)), > Row(timestamp=datetime.datetime(2015, 10, 9, 0, 0, 2), > _c1=datetime.date(2015, 10, 8))] > The first date is right but the second date is wrong, it returns October 8th > instead of returning October 9th.
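The timezone question matters because the symptom is consistent with a timezone or DST effect (a hedged, pure-Python illustration, not Spark's code): an instant stored just after local midnight, when rendered with an offset one hour further west of the one it was written in, lands on the previous calendar day.

```python
from datetime import datetime, timezone, timedelta

# An instant written as midnight-plus-two-seconds under a UTC-7 offset...
instant = datetime(2015, 10, 9, 0, 0, 2, tzinfo=timezone(timedelta(hours=-7)))

# ...rendered with a fixed UTC-8 offset falls on the previous calendar day.
shifted = instant.astimezone(timezone(timedelta(hours=-8)))
print(shifted.date())  # 2015-10-08
```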
[jira] [Commented] (SPARK-13393) Column mismatch issue in left_outer join using Spark DataFrame
[ https://issues.apache.org/jira/browse/SPARK-13393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186674#comment-15186674 ] Adrian Wang commented on SPARK-13393: - That's a case where we should throw an exception. > Column mismatch issue in left_outer join using Spark DataFrame > -- > > Key: SPARK-13393 > URL: https://issues.apache.org/jira/browse/SPARK-13393 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.5.0 >Reporter: Varadharajan > > Consider the below snippet: > {code:title=test.scala|borderStyle=solid} > case class Person(id: Int, name: String) > val df = sc.parallelize(List( > Person(1, "varadha"), > Person(2, "nagaraj") > )).toDF > val varadha = df.filter("id = 1") > val errorDF = df.join(varadha, df("id") === varadha("id"), > "left_outer").select(df("id"), varadha("id") as "varadha_id") > val nagaraj = df.filter("id = 2").select(df("id") as "n_id") > val correctDF = df.join(nagaraj, df("id") === nagaraj("n_id"), > "left_outer").select(df("id"), nagaraj("n_id") as "nagaraj_id") > {code} > The `errorDF` dataframe, after the left join is messed up and shows as below: > | id|varadha_id| > | 1| 1| > | 2| 2 (*This should've been null*)| > whereas correctDF has the correct output after the left join: > | id|nagaraj_id| > | 1| null| > | 2| 2|
[jira] [Commented] (SPARK-13393) Column mismatch issue in left_outer join using Spark DataFrame
[ https://issues.apache.org/jira/browse/SPARK-13393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186673#comment-15186673 ] Adrian Wang commented on SPARK-13393: - See my updated comment. That's not reasonable.
[jira] [Comment Edited] (SPARK-13393) Column mismatch issue in left_outer join using Spark DataFrame
[ https://issues.apache.org/jira/browse/SPARK-13393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186660#comment-15186660 ] Adrian Wang edited comment on SPARK-13393 at 3/9/16 7:31 AM: - How do you resolve it? Both sides are `df`, so we can resolve df("key") to a single side, which leads to a Cartesian join (4 output rows); or we can resolve it to both sides (2 output rows). We are not able to tell what the user meant. The current design does not throw any exception because we assume identical columns in a condition come from different sides, as I stated. I don't think that's a decent way. was (Author: adrian-wang): How do you resolve it? Both sides are `df`, so we can resolve df("key") to a single side, which leads to a Cartesian join (4 output rows); or we can resolve it to both sides (2 output rows). We are not able to tell what the user meant.
[jira] [Commented] (SPARK-13393) Column mismatch issue in left_outer join using Spark DataFrame
[ https://issues.apache.org/jira/browse/SPARK-13393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186661#comment-15186661 ] Adrian Wang commented on SPARK-13393: - How do you resolve it? Both sides are `df`, so we can resolve df("key") to a single side, which leads to a Cartesian join (4 output rows); or we can resolve it to both sides (2 output rows). We are not able to tell what the user meant.
[jira] [Commented] (SPARK-13393) Column mismatch issue in left_outer join using Spark DataFrame
[ https://issues.apache.org/jira/browse/SPARK-13393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186660#comment-15186660 ] Adrian Wang commented on SPARK-13393: - How do you resolve it? Both sides are `df`, so we can resolve df("key") to a single side, which leads to a Cartesian join (4 output rows); or we can resolve it to both sides (2 output rows). We are not able to tell what the user meant.
[jira] [Issue Comment Deleted] (SPARK-13393) Column mismatch issue in left_outer join using Spark DataFrame
[ https://issues.apache.org/jira/browse/SPARK-13393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrian Wang updated SPARK-13393: Comment: was deleted (was: How do you resolve it? Both sides are `df`, so we can resolve df("key") to single side, which leads to a Cartesian join (4 output rows); or we can resolve to both sides (2 output rows). We are not able to tell what the user meant to.)
[jira] [Commented] (SPARK-13393) Column mismatch issue in left_outer join using Spark DataFrame
[ https://issues.apache.org/jira/browse/SPARK-13393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186652#comment-15186652 ] Adrian Wang commented on SPARK-13393: - In your example, df1("name") and df2("name") are exactly the same as each other, so it's easy to throw an exception explicitly to tell the user not to join two identical dataframes without aliases. We can do the same for this issue too.
[jira] [Commented] (SPARK-13393) Column mismatch issue in left_outer join using Spark DataFrame
[ https://issues.apache.org/jira/browse/SPARK-13393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186642#comment-15186642 ] Adrian Wang commented on SPARK-13393: - This is another issue; here we are talking about `varadha` and `df`, which are obviously different dataframes. For exactly the same dataframe, I think aliasing is still necessary.
[jira] [Commented] (SPARK-13393) Column mismatch issue in left_outer join using Spark DataFrame
[ https://issues.apache.org/jira/browse/SPARK-13393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186605#comment-15186605 ] Adrian Wang commented on SPARK-13393:
-------------------------------------
That is why I need to introduce a `JoinedData` layer that keeps the left and right DataFrame instances; with the DataFrame information recorded in each Column instance (when present), we can then trace which side the user wants to project from.
[jira] [Commented] (SPARK-13393) Column mismatch issue in left_outer join using Spark DataFrame
[ https://issues.apache.org/jira/browse/SPARK-13393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186580#comment-15186580 ] Adrian Wang commented on SPARK-13393:
-------------------------------------
Hi [~srinathsmn]
In `errorDF`, both `df("id")` and `varadha("id")` have the same `exprId` (they both come from `df`), so we cannot disambiguate between them under the current design. As a workaround, you should write code like `correctDF` and assign aliases to the columns first, or register `df` as a table and then use a complete SQL query to get your data.
I think this is a bug under the current design. We should record the DataFrame information in `Column` instances and use an internal representation, `JoinedData`, as the return value of `def join()`, in order to resolve the ambiguity caused by self-joins. For now, even if I write something like
val errorDF = df.join(varadha, df("id") === df("id"), "left_outer").select(df("id"), varadha("id") as "varadha_id")
the result would still be the same, since we currently assume that an ambiguous condition should always be resolved to both sides. I can draft a design doc for this if you are interested.
cc [~smilegator] [~rxin] [~marmbrus]
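The exprId collision described in the comment above can be sketched with a toy model in plain Python. This is not Spark's actual Catalyst code; `Attribute` and `ToyFrame` are hypothetical stand-ins meant only to show why a filtered child exposes the very same attribute IDs as its parent:

```python
import itertools

_ids = itertools.count()

class Attribute:
    """Toy stand-in for Catalyst's AttributeReference: a name plus an exprId."""
    def __init__(self, name):
        self.name = name
        self.expr_id = next(_ids)

class ToyFrame:
    """Toy DataFrame: filter() adds a node but reuses the parent's attributes."""
    def __init__(self, attrs):
        self.attrs = {a.name: a for a in attrs}

    def filter(self, _condition):
        # A real plan adds a Filter node on top but keeps the same attribute
        # IDs, which is exactly why df("id") and varadha("id") collide.
        return ToyFrame(self.attrs.values())

    def __call__(self, name):
        return self.attrs[name]

df = ToyFrame([Attribute("id"), Attribute("name")])
varadha = df.filter("id = 1")

# Same exprId on both sides: a resolver keyed on exprId cannot tell them apart.
print(df("id").expr_id == varadha("id").expr_id)  # True
```

Because the IDs are equal, any resolver that keys on exprId alone cannot distinguish df("id") from varadha("id"); that is the gap the proposed `JoinedData` wrapper would close.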
[jira] [Commented] (SPARK-13446) Spark need to support reading data from Hive 2.0.0 metastore
[ https://issues.apache.org/jira/browse/SPARK-13446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15177356#comment-15177356 ] Adrian Wang commented on SPARK-13446:
-------------------------------------
That's not enough. We still need some code changes.
> Spark need to support reading data from Hive 2.0.0 metastore
> ------------------------------------------------------------
>
>                 Key: SPARK-13446
>                 URL: https://issues.apache.org/jira/browse/SPARK-13446
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 1.6.0
>            Reporter: Lifeng Wang
>
> Spark provides the HiveContext class to read data from the Hive metastore directly, but it only supports Hive 1.2.1 and older. Since Hive 2.0.0 has been released, it would be better to upgrade and support Hive 2.0.0 as well.
> {noformat}
> 16/02/23 02:35:02 INFO metastore: Trying to connect to metastore with URI thrift://hsw-node13:9083
> 16/02/23 02:35:02 INFO metastore: Opened a connection to metastore, current connections: 1
> 16/02/23 02:35:02 INFO metastore: Connected to metastore.
> Exception in thread "main" java.lang.NoSuchFieldError: HIVE_STATS_JDBC_TIMEOUT
>         at org.apache.spark.sql.hive.HiveContext.configure(HiveContext.scala:473)
>         at org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:192)
>         at org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:185)
>         at org.apache.spark.sql.hive.HiveContext$$anon$1.(HiveContext.scala:422)
>         at org.apache.spark.sql.hive.HiveContext.catalog$lzycompute(HiveContext.scala:422)
>         at org.apache.spark.sql.hive.HiveContext.catalog(HiveContext.scala:421)
>         at org.apache.spark.sql.hive.HiveContext.catalog(HiveContext.scala:72)
>         at org.apache.spark.sql.SQLContext.table(SQLContext.scala:739)
>         at org.apache.spark.sql.SQLContext.table(SQLContext.scala:735)
> {noformat}
[jira] [Commented] (SPARK-13393) Column mismatch issue in left_outer join using Spark DataFrame
[ https://issues.apache.org/jira/browse/SPARK-13393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15175034#comment-15175034 ] Adrian Wang commented on SPARK-13393:
-------------------------------------
[~srinathsmn] I have identified the issue and am working on it.
[jira] [Commented] (SPARK-13446) Spark need to support reading data from Hive 2.0.0 metastore
[ https://issues.apache.org/jira/browse/SPARK-13446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15158516#comment-15158516 ] Adrian Wang commented on SPARK-13446:
-------------------------------------
Hive 2.0 uses HIVE_ZOOKEEPER_SESSION_TIMEOUT instead of HIVE_STATS_JDBC_TIMEOUT; see HIVE-12164. We will look into this.
[jira] [Commented] (SPARK-12930) NullPointerException running hive query with array dereference in select and where clause
[ https://issues.apache.org/jira/browse/SPARK-12930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15156600#comment-15156600 ] Adrian Wang commented on SPARK-12930:
-------------------------------------
Could you try SPARK-13056?
> NullPointerException running hive query with array dereference in select and where clause
> -----------------------------------------------------------------------------------------
>
>                 Key: SPARK-12930
>                 URL: https://issues.apache.org/jira/browse/SPARK-12930
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.2
>            Reporter: Thomas Graves
>
> I had a user running a Hive query from Spark with an array dereference in both the select clause and the where clause; it gave the user a NullPointerException when the where clause should have filtered the row out. It's as if Spark evaluates the select part before running the where clause. The info['pos'] below is what caused the issue:
> Query looked like:
> SELECT foo,
>        info['pos'] AS pos
> FROM db.table
> WHERE date >= '$initialDate' AND
>       date <= '$finalDate' AND
>       info is not null AND
>       info['pos'] is not null
> LIMIT 10
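The suspected evaluation-order problem can be mimicked in plain Python (toy rows, not Spark's execution engine): dereferencing the map before filtering raises, while filtering first never touches the null value:

```python
rows = [
    {"foo": 1, "info": {"pos": "a"}},
    {"foo": 2, "info": None},  # the WHERE clause should filter this row out
]

def project_then_filter(rows):
    # Mimics evaluating the SELECT list before the WHERE clause:
    # the null map is dereferenced and blows up.
    projected = [(r["foo"], r["info"]["pos"]) for r in rows]
    return [p for p in projected if p[1] is not None]

def filter_then_project(rows):
    # Mimics the correct order: the null map never reaches the dereference.
    kept = [r for r in rows
            if r["info"] is not None and r["info"].get("pos") is not None]
    return [(r["foo"], r["info"]["pos"]) for r in kept]

print(filter_then_project(rows))  # [(1, 'a')]
try:
    project_then_filter(rows)
except TypeError as e:
    print("projection hit the null value first:", e)
```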
[jira] [Commented] (SPARK-13301) PySpark Dataframe return wrong results with custom UDF
[ https://issues.apache.org/jira/browse/SPARK-13301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15151913#comment-15151913 ] Adrian Wang commented on SPARK-13301:
-------------------------------------
Hi Simone, I tried your code on the master branch and the result is correct.
> PySpark Dataframe return wrong results with custom UDF
> ------------------------------------------------------
>
>                 Key: SPARK-13301
>                 URL: https://issues.apache.org/jira/browse/SPARK-13301
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.0
>         Environment: PySpark in yarn-client mode - CDH 5.5.1
>            Reporter: Simone
>            Priority: Critical
>
> Using a User Defined Function in PySpark inside the withColumn() method of a DataFrame gives wrong results.
> Here is an example:
> from pyspark.sql import functions
> import string
> myFunc = functions.udf(lambda s: string.lower(s))
> myDF.select("col1", "col2").withColumn("col3", myFunc(myDF["col1"])).show()
> |col1| col2|col3|
> |1265AB4F65C05740E...|Ivo|4f00ae514e7c015be...|
> |1D94AB4F75C83B51E...| Raffaele|4f00dcf6422100c0e...|
> |4F008903600A0133E...| Cristina|4f008903600a0133e...|
> The results are wrong and seem to be random: some records are OK (for example the third), some others are not (for example the first two).
> The problem does not seem to occur with Spark built-in functions:
> from pyspark.sql.functions import *
> myDF.select("col1", "col2").withColumn("col3", lower(myDF["col1"])).show()
> Without the withColumn() method, the results seem to be always correct:
> myDF.select("col1", "col2", myFunc(myDF["col1"])).show()
> This can be considered only in part a workaround, because you have to list all the columns of your DataFrame each time.
> Also, the problem does not seem to occur in Scala/Java.
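One side note on the reproduction above, independent of the withColumn() bug: `string.lower` is the legacy Python 2 module function and was removed in Python 3, so a forward-compatible UDF body would call the `str.lower` method instead:

```python
# Python 3 equivalent of the UDF body string.lower(s)
lower = lambda s: s.lower()

print(lower("1D94AB4F75C83B51E"))  # 1d94ab4f75c83b51e
```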
[jira] [Commented] (SPARK-13283) Spark doesn't escape column names when creating table on JDBC
[ https://issues.apache.org/jira/browse/SPARK-13283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148196#comment-15148196 ] Adrian Wang commented on SPARK-13283:
-------------------------------------
So the problem here is that "from" is a reserved word in MySQL, but we failed to keep the backticks around it, right?
> Spark doesn't escape column names when creating table on JDBC
> -------------------------------------------------------------
>
>                 Key: SPARK-13283
>                 URL: https://issues.apache.org/jira/browse/SPARK-13283
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.0
>            Reporter: Maciej Bryński
>
> Hi,
> I have the following problem: I have a DF where one of the columns is named 'from'.
> {code}
> root
>  |-- from: decimal(20,0) (nullable = true)
> {code}
> When I save it to a MySQL database I get this error:
> {code}
> Py4JJavaError: An error occurred while calling o183.jdbc.
> : com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'from DECIMAL(20,0) , ' at line 1
> {code}
> I think the problem is that Spark doesn't escape column names with the ` sign when creating the table:
> {code}
> `from`
> {code}
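A minimal sketch of the missing quoting (a hypothetical helper, not Spark's actual JDBC code): wrap every identifier in the dialect's quote character when building the CREATE TABLE statement, so reserved words like `from` survive:

```python
def quote_ident(name, quote="`"):
    # Double any embedded quote char, then wrap (MySQL backtick style)
    return quote + name.replace(quote, quote * 2) + quote

def create_table_ddl(table, columns):
    # columns is a list of (name, sql_type) pairs
    cols = ", ".join(f"{quote_ident(n)} {t}" for n, t in columns)
    return f"CREATE TABLE {quote_ident(table)} ({cols})"

print(create_table_ddl("t", [("from", "DECIMAL(20,0)")]))
# CREATE TABLE `t` (`from` DECIMAL(20,0))
```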
[jira] [Commented] (SPARK-13283) Spark doesn't escape column names when creating table on JDBC
[ https://issues.apache.org/jira/browse/SPARK-13283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148171#comment-15148171 ] Adrian Wang commented on SPARK-13283:
-------------------------------------
See the comments on SPARK-13297; this has been fixed in the master branch.
[jira] [Commented] (SPARK-12985) Spark Hive thrift server big decimal data issue
[ https://issues.apache.org/jira/browse/SPARK-12985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15130013#comment-15130013 ] Adrian Wang commented on SPARK-12985:
-------------------------------------
I think this is a problem in Simba. JDBC never requires a `Decimal` to be a `HiveDecimal`.
> Spark Hive thrift server big decimal data issue
> -----------------------------------------------
>
>                 Key: SPARK-12985
>                 URL: https://issues.apache.org/jira/browse/SPARK-12985
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.0
>            Reporter: Alex Liu
>            Priority: Minor
>
> I tested the trial version of the JDBC driver from Simba; it works for simple queries, but there is an issue with data mapping, e.g.
> {code}
> java.sql.SQLException: [Simba][SparkJDBCDriver](500312) Error in fetching data rows: java.math.BigDecimal cannot be cast to org.apache.hadoop.hive.common.type.HiveDecimal;
>         at com.simba.spark.hivecommon.api.HS2Client.buildExceptionFromTStatus(Unknown Source)
>         at com.simba.spark.hivecommon.api.HS2Client.fetchNRows(Unknown Source)
>         at com.simba.spark.hivecommon.api.HS2Client.fetchRows(Unknown Source)
>         at com.simba.spark.hivecommon.dataengine.BackgroundFetcher.run(Unknown Source)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> Caused by: com.simba.spark.support.exceptions.GeneralException: [Simba][SparkJDBCDriver](500312) Error in fetching data rows: java.math.BigDecimal cannot be cast to org.apache.hadoop.hive.common.type.HiveDecimal;
> ... 8 more
> {code}
> To fix it, apply
> {code}
>        case DecimalType() =>
> -        to += from.getDecimal(ordinal)
> +        to += HiveDecimal.create(from.getDecimal(ordinal))
> {code}
> to https://github.com/apache/spark/blob/master/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala#L87
[jira] [Created] (SPARK-13056) Map column would throw NPE if value is null
Adrian Wang created SPARK-13056:
-----------------------------------
             Summary: Map column would throw NPE if value is null
                 Key: SPARK-13056
                 URL: https://issues.apache.org/jira/browse/SPARK-13056
             Project: Spark
          Issue Type: Bug
          Components: SQL
            Reporter: Adrian Wang

Create a map like {"a": "somestring", "b": null}. A query like
SELECT col["b"] FROM t1;
would throw an NPE.
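In plain Python terms (an analogy, not Spark's code path), the hazard is a key that exists but maps to null; the lookup itself is fine, and the NPE comes from using the value without a null check:

```python
m = {"a": "somestring", "b": None}

print(m["b"])  # None: the key is present, the value is null

# m["b"].upper() would raise AttributeError, the Python analogue of the NPE;
# a null-safe access checks the value first:
safe = m["b"].upper() if m["b"] is not None else None
print(safe)  # None
```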
[jira] [Created] (SPARK-12828) support natural join
Adrian Wang created SPARK-12828: --- Summary: support natural join Key: SPARK-12828 URL: https://issues.apache.org/jira/browse/SPARK-12828 Project: Spark Issue Type: New Feature Components: SQL Reporter: Adrian Wang support queries like: select * from t1 natural join t2; select * from t1 natural left join t2; select * from t1 natural right join t2; select * from t1 natural full outer join t2; -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
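For reference, the inner case of natural-join semantics (an equi-join on all same-named columns, with each shared column emitted once) can be sketched in plain Python over lists of dicts. This illustrates the requested behavior, not Spark's implementation; the outer variants additionally null-pad unmatched rows:

```python
def natural_join(left, right):
    # Columns shared by name drive the join condition
    shared = set(left[0]) & set(right[0]) if left and right else set()
    out = []
    for l in left:
        for r in right:
            if all(l[c] == r[c] for c in shared):
                row = dict(l)
                # Shared columns appear once; right-only columns are appended
                row.update({k: v for k, v in r.items() if k not in shared})
                out.append(row)
    return out

t1 = [{"id": 1, "a": "x"}, {"id": 2, "a": "y"}]
t2 = [{"id": 1, "b": "p"}]
print(natural_join(t1, t2))  # [{'id': 1, 'a': 'x', 'b': 'p'}]
```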
[jira] [Created] (SPARK-11983) remove all unused codegen fallback traits
Adrian Wang created SPARK-11983: --- Summary: remove all unused codegen fallback traits Key: SPARK-11983 URL: https://issues.apache.org/jira/browse/SPARK-11983 Project: Spark Issue Type: Improvement Components: SQL Reporter: Adrian Wang Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-11983) remove all unused codegen fallback traits
[ https://issues.apache.org/jira/browse/SPARK-11983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrian Wang updated SPARK-11983:
--------------------------------
    Description: We use the trait `CodegenFallback` to generate default codegen code; if an expression has implemented genCode, there is no need to derive from this trait.
[jira] [Commented] (SPARK-11972) [Spark SQL] the value of 'hiveconf' parameter in CLI can't be got after enter spark-sql session
[ https://issues.apache.org/jira/browse/SPARK-11972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15026093#comment-15026093 ] Adrian Wang commented on SPARK-11972:
-------------------------------------
SPARK-11624 would resolve this, too. That's because we created a new SessionState that hasn't taken the command-line options into account.
> [Spark SQL] the value of 'hiveconf' parameter in CLI can't be got after enter spark-sql session
> -----------------------------------------------------------------------------------------------
>
>                 Key: SPARK-11972
>                 URL: https://issues.apache.org/jira/browse/SPARK-11972
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.0
>            Reporter: Yi Zhou
>            Priority: Critical
>
> Reproduce Steps:
> /usr/lib/spark/bin/spark-sql -v --driver-memory 4g --executor-memory 7g --executor-cores 5 --num-executors 31 --master yarn-client --conf spark.yarn.executor.memoryOverhead=1024 --hiveconf RESULT_TABLE=test_result01
> {code}
> >use test;
> >DROP TABLE IF EXISTS ${hiveconf:RESULT_TABLE};
> 15/11/24 13:45:12 INFO parse.ParseDriver: Parsing command: DROP TABLE IF EXISTS ${hiveconf:RESULT_TABLE}
> NoViableAltException(16@[192:1: tableName : (db= identifier DOT tab= identifier -> ^( TOK_TABNAME $db $tab) |tab= identifier -> ^( TOK_TABNAME $tab) );])
>         at org.antlr.runtime.DFA.noViableAlt(DFA.java:158)
>         at org.antlr.runtime.DFA.predict(DFA.java:144)
>         at org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.tableName(HiveParser_FromClauseParser.java:4747)
>         at org.apache.hadoop.hive.ql.parse.HiveParser.tableName(HiveParser.java:45918)
>         at org.apache.hadoop.hive.ql.parse.HiveParser.dropTableStatement(HiveParser.java:7133)
>         at org.apache.hadoop.hive.ql.parse.HiveParser.ddlStatement(HiveParser.java:2655)
>         at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1650)
>         at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1109)
>         at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:202)
>         at
org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166) > at org.apache.spark.sql.hive.HiveQl$.getAst(HiveQl.scala:276) > at org.apache.spark.sql.hive.HiveQl$.createPlan(HiveQl.scala:303) > at > org.apache.spark.sql.hive.ExtendedHiveQlParser$$anonfun$hiveQl$1.apply(ExtendedHiveQlParser.scala:41) > at > org.apache.spark.sql.hive.ExtendedHiveQlParser$$anonfun$hiveQl$1.apply(ExtendedHiveQlParser.scala:40) > at > scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:136) > at > scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:135) > at > scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:242) > at > scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:242) > at > scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222) > at > scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1$$anonfun$apply$2.apply(Parsers.scala:254) > at > scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1$$anonfun$apply$2.apply(Parsers.scala:254) > at > scala.util.parsing.combinator.Parsers$Failure.append(Parsers.scala:202) > at > scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254) > at > scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254) > at > scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222) > at > scala.util.parsing.combinator.Parsers$$anon$2$$anonfun$apply$14.apply(Parsers.scala:891) > at > scala.util.parsing.combinator.Parsers$$anon$2$$anonfun$apply$14.apply(Parsers.scala:891) > at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57) > at > scala.util.parsing.combinator.Parsers$$anon$2.apply(Parsers.scala:890) > at > scala.util.parsing.combinator.PackratParsers$$anon$1.apply(PackratParsers.scala:110) > at > org.apache.spark.sql.catalyst.AbstractSparkSQLParser.parse(AbstractSparkSQLParser.scala:34) > at 
org.apache.spark.sql.hive.HiveQl$.parseSql(HiveQl.scala:295) > at > org.apache.spark.sql.hive.HiveQLDialect$$anonfun$parse$1.apply(HiveContext.scala:65) > at > org.apache.spark.sql.hive.HiveQLDialect$$anonfun$parse$1.apply(HiveContext.scala:65) > at > org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$withHiveState$1.apply(ClientWrapper.scala:279) > at > org.apache.spark.sql.hive.client.ClientWrapper.liftedTree1$1(ClientWrapper.scala:226) > at > org.apache.spark.sql.hive.client.ClientWrapper.retryLocked(ClientWrapper.scala:225) > at >
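What the reproduction shows is the substitution step being skipped: a helper along these lines (hypothetical, not Hive's actual variable-substitution code) is expected to expand ${hiveconf:NAME} from the --hiveconf definitions before the statement reaches the parser, which is why the raw ${hiveconf:RESULT_TABLE} trips the tableName grammar rule above:

```python
import re

def substitute_hiveconf(sql, hiveconf):
    # Expand ${hiveconf:NAME} using the --hiveconf definitions (toy version);
    # a missing name raises KeyError rather than passing garbage to the parser
    return re.sub(r"\$\{hiveconf:(\w+)\}", lambda m: hiveconf[m.group(1)], sql)

sql = "DROP TABLE IF EXISTS ${hiveconf:RESULT_TABLE}"
print(substitute_hiveconf(sql, {"RESULT_TABLE": "test_result01"}))
# DROP TABLE IF EXISTS test_result01
```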
[jira] [Created] (SPARK-11916) Expression TRIM/LTRIM/RTRIM to support specific trim word
Adrian Wang created SPARK-11916: --- Summary: Expression TRIM/LTRIM/RTRIM to support specific trim word Key: SPARK-11916 URL: https://issues.apache.org/jira/browse/SPARK-11916 Project: Spark Issue Type: Improvement Reporter: Adrian Wang Priority: Minor supports expressions like `trim('xxxabcxxx', 'x')` -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
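Python's str.strip family already has exactly this shape, which makes a handy reference point for the proposed two-argument semantics:

```python
s = "xxxabcxxx"

print(s.strip("x"))   # abc    (like TRIM('xxxabcxxx', 'x'))
print(s.lstrip("x"))  # abcxxx (like LTRIM with a trim character)
print(s.rstrip("x"))  # xxxabc (like RTRIM with a trim character)
```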
[jira] [Created] (SPARK-11624) Spark SQL CLI will set sessionstate twice
Adrian Wang created SPARK-11624: --- Summary: Spark SQL CLI will set sessionstate twice Key: SPARK-11624 URL: https://issues.apache.org/jira/browse/SPARK-11624 Project: Spark Issue Type: Bug Components: SQL Reporter: Adrian Wang -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-11624) Spark SQL CLI will set sessionstate twice
[ https://issues.apache.org/jira/browse/SPARK-11624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrian Wang updated SPARK-11624:
--------------------------------
    Description:
spark-sql> !echo "test";
Exception in thread "main" java.lang.ClassCastException: org.apache.hadoop.hive.ql.session.SessionState cannot be cast to org.apache.hadoop.hive.cli.CliSessionState
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:112)
        at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:301)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
        at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:242)
        at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:691)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
[jira] [Created] (SPARK-11591) flush spark-sql command line history to history file
Adrian Wang created SPARK-11591:
-----------------------------------
             Summary: flush spark-sql command line history to history file
                 Key: SPARK-11591
                 URL: https://issues.apache.org/jira/browse/SPARK-11591
             Project: Spark
          Issue Type: Improvement
          Components: SQL
            Reporter: Adrian Wang

Currently, spark-sql does not flush command history when exiting.
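The requested behavior has a direct analogue in Python's readline module (shown here only as an illustration of the idea; spark-sql itself goes through JLine): append the in-memory history to a file before the process exits:

```python
import os
import readline
import tempfile

# Hypothetical history location for this sketch
history_file = os.path.join(tempfile.gettempdir(), "toy_sql_cli_history")

# Record a command and flush the history to disk, the step the CLI is missing
readline.add_history("select 1;")
readline.write_history_file(history_file)

print(os.path.exists(history_file))  # True
```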
[jira] [Created] (SPARK-11592) flush spark-sql command line history to history file
Adrian Wang created SPARK-11592:
-----------------------------------
             Summary: flush spark-sql command line history to history file
                 Key: SPARK-11592
                 URL: https://issues.apache.org/jira/browse/SPARK-11592
             Project: Spark
          Issue Type: Improvement
          Components: SQL
            Reporter: Adrian Wang

Currently, spark-sql does not flush command history when exiting.
[jira] [Closed] (SPARK-11591) flush spark-sql command line history to history file
[ https://issues.apache.org/jira/browse/SPARK-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrian Wang closed SPARK-11591. --- Resolution: Fixed > flush spark-sql command line history to history file > > > Key: SPARK-11591 > URL: https://issues.apache.org/jira/browse/SPARK-11591 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Adrian Wang > > currently, spark-sql would not flush command history when exiting. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-11396) datetime function: to_unix_timestamp
Adrian Wang created SPARK-11396: --- Summary: datetime function: to_unix_timestamp Key: SPARK-11396 URL: https://issues.apache.org/jira/browse/SPARK-11396 Project: Spark Issue Type: Sub-task Reporter: Adrian Wang `to_unix_timestamp` is the deterministic version of `unix_timestamp`: it requires at least one parameter, so it never falls back to the current time.
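The distinction can be sketched in Python (simplified, UTC-only; the function names mirror the SQL functions, but the implementation is illustrative):

```python
import time
from datetime import datetime, timezone

def to_unix_timestamp(value, fmt="%Y-%m-%d %H:%M:%S"):
    """Deterministic: always parses its argument, so the same input
    yields the same output (like Hive's to_unix_timestamp)."""
    dt = datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
    return int(dt.timestamp())

def unix_timestamp(value=None, fmt="%Y-%m-%d %H:%M:%S"):
    """Non-deterministic when called with no arguments: it returns the
    current time, so two calls in one query may disagree."""
    if value is None:
        return int(time.time())
    return to_unix_timestamp(value, fmt)

# With an argument, both functions agree and are repeatable:
assert unix_timestamp("1970-01-02 00:00:00") == to_unix_timestamp("1970-01-02 00:00:00")
```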
[jira] [Created] (SPARK-11312) Cannot drop temporary function
Adrian Wang created SPARK-11312: --- Summary: Cannot drop temporary function Key: SPARK-11312 URL: https://issues.apache.org/jira/browse/SPARK-11312 Project: Spark Issue Type: Bug Components: SQL Reporter: Adrian Wang CREATE TEMPORARY FUNCTION is handled by executionHive, while DROP TEMPORARY FUNCTION is handled by metadataHive, so the function registered by the CREATE cannot be found by the DROP.
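A toy sketch of the mismatch (plain dictionaries stand in for the two Hive client registries; all names are illustrative, not Spark internals):

```python
# CREATE goes to one registry, DROP is routed to the other,
# so the drop never sees the function that was just created.
execution_hive = {}   # handles CREATE TEMPORARY FUNCTION
metadata_hive = {}    # handles DROP TEMPORARY FUNCTION

def create_temporary_function(name, impl):
    execution_hive[name] = impl

def drop_temporary_function(name, registry):
    if name not in registry:
        raise KeyError(f"temporary function {name} not found")
    del registry[name]

create_temporary_function("my_udf", lambda x: x)

# Routing the drop to metadataHive fails even though the function exists:
try:
    drop_temporary_function("my_udf", metadata_hive)
except KeyError:
    pass  # this is the reported bug

# Routing it to the registry that performed the CREATE succeeds:
drop_temporary_function("my_udf", execution_hive)
```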
[jira] [Resolved] (SPARK-11312) Cannot drop temporary function
[ https://issues.apache.org/jira/browse/SPARK-11312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrian Wang resolved SPARK-11312. - Resolution: Duplicate > Cannot drop temporary function > -- > > Key: SPARK-11312 > URL: https://issues.apache.org/jira/browse/SPARK-11312 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Adrian Wang > > CREATE TEMPORARY FUNCTION is handled by executionHive, while DROP TEMPORARY > FUNCTION is handled by metadataHive, so the function cannot be found when dropped.
[jira] [Commented] (SPARK-10507) reject temporal expressions such as timestamp - timestamp at parse time
[ https://issues.apache.org/jira/browse/SPARK-10507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14952747#comment-14952747 ] Adrian Wang commented on SPARK-10507: - It seems this bug was fixed long ago. I just checked 1.5.0 and there's no such problem. > reject temporal expressions such as timestamp - timestamp at parse time > > > Key: SPARK-10507 > URL: https://issues.apache.org/jira/browse/SPARK-10507 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.3.1 >Reporter: N Campbell >Priority: Minor > > TIMESTAMP - TIMESTAMP in ISO-SQL should return an interval type, which Spark > does not support. > A similar expression in Hive 0.13 fails with Error: Could not create > ResultSet: Required field 'type' is unset! > Struct:TPrimitiveTypeEntry(type:null), and Spark has similar "challenges". > While Hive 1.2.1 has added some interval type support, it is far from complete > with respect to ISO-SQL. > The ability to compute the period of time (years, days, weeks, hours, ...) > between timestamps or add/subtract intervals to/from a timestamp is extremely > common in business applications. > Currently, a value expression such as select timestampcol - timestampcol from > t fails during execution rather than at parse time. While the error thrown > states that fact, it would be better for those value expressions to be rejected > at parse time, along with an indication of the expression that is causing the > parser error. 
> Operation: execute > Errors: > {code} > org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in > stage 6214.0 failed 4 times, most recent failure: Lost task 0.3 in stage > 6214.0 (TID 21208, sandbox.hortonworks.com): java.lang.RuntimeException: Type > TimestampType does not support numeric operations > at scala.sys.package$.error(package.scala:27) > at > org.apache.spark.sql.catalyst.expressions.Subtract.numeric$lzycompute(arithmetic.scala:138) > at > org.apache.spark.sql.catalyst.expressions.Subtract.numeric(arithmetic.scala:136) > at > org.apache.spark.sql.catalyst.expressions.Subtract.eval(arithmetic.scala:150) > at > org.apache.spark.sql.catalyst.expressions.Alias.eval(namedExpressions.scala:113) > at > org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(Projection.scala:68) > at > org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(Projection.scala:52) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) > at scala.collection.Iterator$class.foreach(Iterator.scala:727) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) > at > scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48) > at > scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103) > at > scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47) > at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273) > at scala.collection.AbstractIterator.to(Iterator.scala:1157) > at > scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265) > at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157) > at > scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252) > at scala.collection.AbstractIterator.toArray(Iterator.scala:1157) > at org.apache.spark.rdd.RDD$$anonfun$17.apply(RDD.scala:813) > at 
org.apache.spark.rdd.RDD$$anonfun$17.apply(RDD.scala:813) > at > org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1498) > {code} > {code} > create table if not exists TTS ( RNUM int , CTS timestamp )TERMINATED BY > '\n' > STORED AS orc ; > {code}
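The behavior the ticket asks for — rejecting `timestamp - timestamp` during analysis instead of failing with "does not support numeric operations" at execution time — can be sketched as a simple operand-type check (type names and error wording below are illustrative, not Catalyst's):

```python
# Validate operand types while building/analyzing the expression tree,
# so the bad expression never reaches execution.
NUMERIC_TYPES = {"int", "long", "float", "double", "decimal"}

def check_subtract(left_type, right_type):
    """Reject non-numeric operands of '-' at analysis time."""
    for t in (left_type, right_type):
        if t not in NUMERIC_TYPES:
            raise ValueError(
                f"cannot resolve '-': type {t} does not support numeric operations"
            )

try:
    check_subtract("timestamp", "timestamp")  # rejected up front
except ValueError as e:
    message = str(e)

assert "timestamp" in message
check_subtract("int", "long")  # numeric operands pass the check
```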
[jira] [Created] (SPARK-10463) remove PromotePrecision during optimization
Adrian Wang created SPARK-10463: --- Summary: remove PromotePrecision during optimization Key: SPARK-10463 URL: https://issues.apache.org/jira/browse/SPARK-10463 Project: Spark Issue Type: Improvement Reporter: Adrian Wang Priority: Trivial This node is not necessary after HiveTypeCoercion.
[jira] [Commented] (SPARK-8360) Streaming DataFrames
[ https://issues.apache.org/jira/browse/SPARK-8360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712377#comment-14712377 ] Adrian Wang commented on SPARK-8360: https://github.com/intel-bigdata/spark-streamingsql Our streaming SQL project is highly related to this jira ticket. Streaming DataFrames Key: SPARK-8360 URL: https://issues.apache.org/jira/browse/SPARK-8360 Project: Spark Issue Type: Umbrella Components: SQL, Streaming Reporter: Reynold Xin Umbrella ticket to track what's needed to make streaming DataFrame a reality.
[jira] [Updated] (SPARK-10130) type coercion for IF should have children resolved first
[ https://issues.apache.org/jira/browse/SPARK-10130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrian Wang updated SPARK-10130: Priority: Blocker (was: Major) type coercion for IF should have children resolved first Key: SPARK-10130 URL: https://issues.apache.org/jira/browse/SPARK-10130 Project: Spark Issue Type: Bug Components: SQL Reporter: Adrian Wang Priority: Blocker SELECT IF(a > 0, a, 0) FROM (SELECT key a FROM src) temp;
[jira] [Updated] (SPARK-10130) type coercion for IF should have children resolved first
[ https://issues.apache.org/jira/browse/SPARK-10130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrian Wang updated SPARK-10130: Fix Version/s: (was: 1.5.0) type coercion for IF should have children resolved first Key: SPARK-10130 URL: https://issues.apache.org/jira/browse/SPARK-10130 Project: Spark Issue Type: Bug Components: SQL Reporter: Adrian Wang SELECT IF(a > 0, a, 0) FROM (SELECT key a FROM src) temp;
[jira] [Updated] (SPARK-10130) type coercion for IF should have children resolved first
[ https://issues.apache.org/jira/browse/SPARK-10130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrian Wang updated SPARK-10130: Target Version/s: 1.5.0 type coercion for IF should have children resolved first Key: SPARK-10130 URL: https://issues.apache.org/jira/browse/SPARK-10130 Project: Spark Issue Type: Bug Components: SQL Reporter: Adrian Wang Priority: Blocker SELECT IF(a > 0, a, 0) FROM (SELECT key a FROM src) temp;
[jira] [Created] (SPARK-10130) type coercion for IF should have children resolved first
Adrian Wang created SPARK-10130: --- Summary: type coercion for IF should have children resolved first Key: SPARK-10130 URL: https://issues.apache.org/jira/browse/SPARK-10130 Project: Spark Issue Type: Bug Components: SQL Reporter: Adrian Wang SELECT IF(a > 0, a, 0) FROM (SELECT key a FROM src) temp;
[jira] [Updated] (SPARK-10130) type coercion for IF should have children resolved first
[ https://issues.apache.org/jira/browse/SPARK-10130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrian Wang updated SPARK-10130: Fix Version/s: 1.5.0 type coercion for IF should have children resolved first Key: SPARK-10130 URL: https://issues.apache.org/jira/browse/SPARK-10130 Project: Spark Issue Type: Bug Components: SQL Reporter: Adrian Wang Fix For: 1.5.0 SELECT IF(a > 0, a, 0) FROM (SELECT key a FROM src) temp;
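The fix for SPARK-10130 can be sketched as a guard in the type-coercion rule: skip the rule until both branches of the IF are resolved, so coercion never runs against an unknown type. The expression encoding below is illustrative, not Catalyst's:

```python
# Each branch is a (resolved, type) pair in this sketch.
def coerce_if(condition, true_branch, false_branch):
    """Pick a common type for the IF branches, but only once
    both children are resolved."""
    if not (true_branch[0] and false_branch[0]):
        return None  # skip the rule; children are not resolved yet
    t1, t2 = true_branch[1], false_branch[1]
    # Simplified widening: int vs double widens to double.
    widened = "double" if {t1, t2} == {"int", "double"} else t1
    return widened

# Unresolved child (e.g. an attribute from a not-yet-analyzed subquery):
assert coerce_if("a > 0", (False, None), (True, "int")) is None
# Once both children resolve, coercion can pick the wider type:
assert coerce_if("a > 0", (True, "int"), (True, "double")) == "double"
```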
[jira] [Created] (SPARK-10083) CaseWhen should support type coercion of DecimalType and FractionalType
Adrian Wang created SPARK-10083: --- Summary: CaseWhen should support type coercion of DecimalType and FractionalType Key: SPARK-10083 URL: https://issues.apache.org/jira/browse/SPARK-10083 Project: Spark Issue Type: Bug Components: SQL Reporter: Adrian Wang create table t1 (a decimal(7, 2), b long); select case when 1=1 then a else 1.0 end from t1; select case when 1=1 then a else b end from t1;
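A toy sketch of the desired widening when CASE WHEN branches mix a decimal type with fractional or integral types (the type lattice here is deliberately simplified; Spark's real decimal precision rules are more involved):

```python
# Widen two branch types to the first common type in a simple ordering:
# integral < decimal < fractional.
def widen(t1, t2):
    order = ["long", "decimal(7,2)", "double"]
    return max(t1, t2, key=order.index)

# decimal vs fractional literal (the `else 1.0` case) widens to double:
assert widen("decimal(7,2)", "double") == "double"
# decimal vs integral column (the `else b` case) stays decimal:
assert widen("decimal(7,2)", "long") == "decimal(7,2)"
```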
[jira] [Commented] (SPARK-9374) [Spark SQL] Throw out erorr of AnalysisException: nondeterministic expressions are only allowed in Project or Filter during the spark sql parse phase
[ https://issues.apache.org/jira/browse/SPARK-9374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643761#comment-14643761 ] Adrian Wang commented on SPARK-9374: [~chenghao][~cloud_fan][~jameszhouyi] UnixTimestamp is a non-deterministic expression, because when we pass zero arguments to this function it means the same as current_timestamp. There is a deterministic version of this function in Hive, namely to_unix_timestamp; we could use that temporarily. Once SPARK-8174 is resolved, we will be able to tell whether a given use of unix_timestamp is deterministic or not. [Spark SQL] Throw out erorr of AnalysisException: nondeterministic expressions are only allowed in Project or Filter during the spark sql parse phase --- Key: SPARK-9374 URL: https://issues.apache.org/jira/browse/SPARK-9374 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.5.0 Reporter: Yi Zhou Priority: Blocker #Spark SQL Query INSERT INTO TABLE TEST_QUERY_0_result SELECT w_state, i_item_id, SUM( CASE WHEN (unix_timestamp(d_date,'yyyy-MM-dd') < unix_timestamp('2001-03-16','yyyy-MM-dd')) THEN ws_sales_price - COALESCE(wr_refunded_cash,0) ELSE 0.0 END ) AS sales_before, SUM( CASE WHEN (unix_timestamp(d_date,'yyyy-MM-dd') >= unix_timestamp('2001-03-16','yyyy-MM-dd')) THEN ws_sales_price - coalesce(wr_refunded_cash,0) ELSE 0.0 END ) AS sales_after FROM ( SELECT * FROM web_sales ws LEFT OUTER JOIN web_returns wr ON (ws.ws_order_number = wr.wr_order_number AND ws.ws_item_sk = wr.wr_item_sk) ) a1 JOIN item i ON a1.ws_item_sk = i.i_item_sk JOIN warehouse w ON a1.ws_warehouse_sk = w.w_warehouse_sk JOIN date_dim d ON a1.ws_sold_date_sk = d.d_date_sk AND unix_timestamp(d.d_date, 'yyyy-MM-dd') >= unix_timestamp('2001-03-16', 'yyyy-MM-dd') - 30*24*60*60 --subtract 30 days in seconds AND unix_timestamp(d.d_date, 'yyyy-MM-dd') <= unix_timestamp('2001-03-16', 'yyyy-MM-dd') + 30*24*60*60 --add 30 days in seconds GROUP BY w_state,i_item_id CLUSTER BY w_state,i_item_id Error Message## 
org.apache.spark.sql.AnalysisException: nondeterministic expressions are only allowed in Project or Filter, found: (((ws_sold_date_sk = d_date_sk) (HiveGenericUDF#org.apache.hadoop.hive.ql.udf.generic.GenericUDFUnixTimeStamp(d_date,-MM-dd) = (HiveGenericUDF#org.apache.hadoop.hive.ql.udf.generic.GenericUDFUnixTimeStamp(2001-03-16,-MM-dd) - CAST30 * 24) * 60) * 60), LongType (HiveGenericUDF#org.apache.hadoop.hive.ql.udf.generic.GenericUDFUnixTimeStamp(d_date,-MM-dd) = (HiveGenericUDF#org.apache.hadoop.hive.ql.udf.generic.GenericUDFUnixTimeStamp(2001-03-16,-MM-dd) + CAST30 * 24) * 60) * 60), LongType in operator Join Inner, Somews_sold_date_sk#289L = d_date_sk#383L) (HiveGenericUDF#org.apache.hadoop.hive.ql.udf.generic.GenericUDFUnixTimeStamp(d_date#385,-MM-dd) = (HiveGenericUDF#org.apache.hadoop.hive.ql.udf.generic.GenericUDFUnixTimeStamp(2001-03-16,-MM-dd) - CAST30 * 24) * 60) * 60), LongType (HiveGenericUDF#org.apache.hadoop.hive.ql.udf.generic.GenericUDFUnixTimeStamp(d_date#385,-MM-dd) = (HiveGenericUDF#org.apache.hadoop.hive.ql.udf.generic.GenericUDFUnixTimeStamp(2001-03-16,-MM-dd) + CAST30 * 24) * 60) * 60), LongType) ; at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:37) at org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:43) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:148) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:49) at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:103) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:102) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:102) at scala.collection.immutable.List.foreach(List.scala:318) at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:102) at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:102) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:102) at scala.collection.immutable.List.foreach(List.scala:318) at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:102) at
[jira] [Commented] (SPARK-9196) DatetimeExpressionsSuite: function current_timestamp is flaky
[ https://issues.apache.org/jira/browse/SPARK-9196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14633733#comment-14633733 ] Adrian Wang commented on SPARK-9196: Thanks, I'll fix it asap DatetimeExpressionsSuite: function current_timestamp is flaky - Key: SPARK-9196 URL: https://issues.apache.org/jira/browse/SPARK-9196 Project: Spark Issue Type: Bug Reporter: Davies Liu Assignee: Adrian Wang Priority: Critical {code} - function current_timestamp *** FAILED *** (77 milliseconds) [info] Results do not match for query: [info] == Parsed Logical Plan == [info] 'Project [unresolvedalias(('CURRENT_TIMESTAMP() = 'CURRENT_TIMESTAMP()))] [info]OneRowRelation$ [info] [info] == Analyzed Logical Plan == [info] _c0: boolean [info] Project [(currenttimestamp() = currenttimestamp()) AS _c0#11436] [info]OneRowRelation$ [info] [info] == Optimized Logical Plan == [info] Project [false AS _c0#11436] [info]OneRowRelation$ [info] [info] == Physical Plan == [info] Project [false AS _c0#11436] [info]PhysicalRDD ParallelCollectionRDD[650] at apply at Transformer.scala:22 [info] [info] Code Generation: true [info] == RDD == [info] == Results == [info] !== Correct Answer - 1 == == Spark Answer - 1 == [info] ![true] [false] (QueryTest.scala:61) [info] org.scalatest.exceptions.TestFailedException: [info] at org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:495) [info] at org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1555) [info] at org.scalatest.Assertions$class.fail(Assertions.scala:1328) [info] at org.scalatest.FunSuite.fail(FunSuite.scala:1555) [info] at org.apache.spark.sql.QueryTest.checkAnswer(QueryTest.scala:61) [info] at org.apache.spark.sql.QueryTest.checkAnswer(QueryTest.scala:67) [info] at org.apache.spark.sql.DatetimeExpressionsSuite$$anonfun$2.apply$mcV$sp(DatetimeExpressionsSuite.scala:42) [info] at org.apache.spark.sql.DatetimeExpressionsSuite$$anonfun$2.apply(DatetimeExpressionsSuite.scala:39) 
[info] at org.apache.spark.sql.DatetimeExpressionsSuite$$anonfun$2.apply(DatetimeExpressionsSuite.scala:39) [info] at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22) [info] at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) [info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) [info] at org.scalatest.Transformer.apply(Transformer.scala:22) [info] at org.scalatest.Transformer.apply(Transformer.scala:20) [info] at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166) [info] at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:42) [info] at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163) [info] at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) [info] at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) [info] at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) [info] at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175) [info] at org.scalatest.FunSuite.runTest(FunSuite.scala:1555) [info] at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) [info] at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) [info] at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413) [info] at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401) [info] at scala.collection.immutable.List.foreach(List.scala:318) [info] at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) [info] at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396) [info] at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483) [info] at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208) [info] at org.scalatest.FunSuite.runTests(FunSuite.scala:1555) [info] at org.scalatest.Suite$class.run(Suite.scala:1424) [info] at 
org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555) [info] at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) [info] at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) [info] at org.scalatest.SuperEngine.runImpl(Engine.scala:545) [info] at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212) [info] at org.scalatest.FunSuite.run(FunSuite.scala:1555) [info] at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:462) [info] at
[jira] [Commented] (SPARK-9196) DatetimeExpressionsSuite: function current_timestamp is flaky
[ https://issues.apache.org/jira/browse/SPARK-9196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14633740#comment-14633740 ] Adrian Wang commented on SPARK-9196: if this is very often, we can ignore this test for now, I'll fix it tomorrow. DatetimeExpressionsSuite: function current_timestamp is flaky - Key: SPARK-9196 URL: https://issues.apache.org/jira/browse/SPARK-9196 Project: Spark Issue Type: Bug Reporter: Davies Liu Assignee: Adrian Wang Priority: Critical {code} - function current_timestamp *** FAILED *** (77 milliseconds) [info] Results do not match for query: [info] == Parsed Logical Plan == [info] 'Project [unresolvedalias(('CURRENT_TIMESTAMP() = 'CURRENT_TIMESTAMP()))] [info]OneRowRelation$ [info] [info] == Analyzed Logical Plan == [info] _c0: boolean [info] Project [(currenttimestamp() = currenttimestamp()) AS _c0#11436] [info]OneRowRelation$ [info] [info] == Optimized Logical Plan == [info] Project [false AS _c0#11436] [info]OneRowRelation$ [info] [info] == Physical Plan == [info] Project [false AS _c0#11436] [info]PhysicalRDD ParallelCollectionRDD[650] at apply at Transformer.scala:22 [info] [info] Code Generation: true [info] == RDD == [info] == Results == [info] !== Correct Answer - 1 == == Spark Answer - 1 == [info] ![true] [false] (QueryTest.scala:61) [info] org.scalatest.exceptions.TestFailedException: [info] at org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:495) [info] at org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1555) [info] at org.scalatest.Assertions$class.fail(Assertions.scala:1328) [info] at org.scalatest.FunSuite.fail(FunSuite.scala:1555) [info] at org.apache.spark.sql.QueryTest.checkAnswer(QueryTest.scala:61) [info] at org.apache.spark.sql.QueryTest.checkAnswer(QueryTest.scala:67) [info] at org.apache.spark.sql.DatetimeExpressionsSuite$$anonfun$2.apply$mcV$sp(DatetimeExpressionsSuite.scala:42) [info] at 
org.apache.spark.sql.DatetimeExpressionsSuite$$anonfun$2.apply(DatetimeExpressionsSuite.scala:39) [info] at org.apache.spark.sql.DatetimeExpressionsSuite$$anonfun$2.apply(DatetimeExpressionsSuite.scala:39) [info] at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22) [info] at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) [info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) [info] at org.scalatest.Transformer.apply(Transformer.scala:22) [info] at org.scalatest.Transformer.apply(Transformer.scala:20) [info] at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166) [info] at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:42) [info] at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163) [info] at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) [info] at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) [info] at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) [info] at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175) [info] at org.scalatest.FunSuite.runTest(FunSuite.scala:1555) [info] at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) [info] at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) [info] at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413) [info] at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401) [info] at scala.collection.immutable.List.foreach(List.scala:318) [info] at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) [info] at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396) [info] at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483) [info] at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208) [info] at org.scalatest.FunSuite.runTests(FunSuite.scala:1555) [info] at 
org.scalatest.Suite$class.run(Suite.scala:1424) [info] at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555) [info] at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) [info] at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) [info] at org.scalatest.SuperEngine.runImpl(Engine.scala:545) [info] at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212) [info] at org.scalatest.FunSuite.run(FunSuite.scala:1555) [info] at
[jira] [Commented] (SPARK-9196) DatetimeExpressionsSuite: function current_timestamp is flaky
[ https://issues.apache.org/jira/browse/SPARK-9196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634420#comment-14634420 ] Adrian Wang commented on SPARK-9196: [~davies] I have two solutions for this problem: 1. Back this function with something like a lazy val, but we need to put a flag on each query indicating whether it has been evaluated. 2. Substitute this function with a constant at the analysis phase. This would be a little different from Hive, since Hive gets the current timestamp at the beginning of evaluation. We could also find a way to mark multiple appearances as a single object at the analysis phase. cc [~marmbrus] DatetimeExpressionsSuite: function current_timestamp is flaky - Key: SPARK-9196 URL: https://issues.apache.org/jira/browse/SPARK-9196 Project: Spark Issue Type: Bug Reporter: Davies Liu Assignee: Adrian Wang Priority: Critical {code} - function current_timestamp *** FAILED *** (77 milliseconds) [info] Results do not match for query: [info] == Parsed Logical Plan == [info] 'Project [unresolvedalias(('CURRENT_TIMESTAMP() = 'CURRENT_TIMESTAMP()))] [info]OneRowRelation$ [info] [info] == Analyzed Logical Plan == [info] _c0: boolean [info] Project [(currenttimestamp() = currenttimestamp()) AS _c0#11436] [info]OneRowRelation$ [info] [info] == Optimized Logical Plan == [info] Project [false AS _c0#11436] [info]OneRowRelation$ [info] [info] == Physical Plan == [info] Project [false AS _c0#11436] [info]PhysicalRDD ParallelCollectionRDD[650] at apply at Transformer.scala:22 [info] [info] Code Generation: true [info] == RDD == [info] == Results == [info] !== Correct Answer - 1 == == Spark Answer - 1 == [info] ![true] [false] (QueryTest.scala:61) [info] org.scalatest.exceptions.TestFailedException: [info] at org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:495) [info] at org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1555) [info] at 
org.scalatest.Assertions$class.fail(Assertions.scala:1328) [info] at org.scalatest.FunSuite.fail(FunSuite.scala:1555) [info] at org.apache.spark.sql.QueryTest.checkAnswer(QueryTest.scala:61) [info] at org.apache.spark.sql.QueryTest.checkAnswer(QueryTest.scala:67) [info] at org.apache.spark.sql.DatetimeExpressionsSuite$$anonfun$2.apply$mcV$sp(DatetimeExpressionsSuite.scala:42) [info] at org.apache.spark.sql.DatetimeExpressionsSuite$$anonfun$2.apply(DatetimeExpressionsSuite.scala:39) [info] at org.apache.spark.sql.DatetimeExpressionsSuite$$anonfun$2.apply(DatetimeExpressionsSuite.scala:39) [info] at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22) [info] at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) [info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) [info] at org.scalatest.Transformer.apply(Transformer.scala:22) [info] at org.scalatest.Transformer.apply(Transformer.scala:20) [info] at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166) [info] at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:42) [info] at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163) [info] at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) [info] at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) [info] at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) [info] at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175) [info] at org.scalatest.FunSuite.runTest(FunSuite.scala:1555) [info] at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) [info] at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) [info] at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413) [info] at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401) [info] at scala.collection.immutable.List.foreach(List.scala:318) [info] at 
org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) [info] at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396) [info] at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483) [info] at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208) [info] at org.scalatest.FunSuite.runTests(FunSuite.scala:1555) [info] at org.scalatest.Suite$class.run(Suite.scala:1424) [info] at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555) [info] at
[jira] [Commented] (SPARK-9196) DatetimeExpressionsSuite: function current_timestamp is flaky
[ https://issues.apache.org/jira/browse/SPARK-9196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634452#comment-14634452 ] Adrian Wang commented on SPARK-9196: [~marmbrus] We have a test for that case. The flaky test is to prove that multiple entries of the same function within one query return the same value. Actually, as this function is always foldable, at the optimization phase we will compute all values, so the gap would not be too large. P.S.: Hive's definition of current_timestamp(): Returns the current timestamp at the start of query evaluation (as of Hive 1.2.0). All calls of current_timestamp within the same query return the same value. DatetimeExpressionsSuite: function current_timestamp is flaky - Key: SPARK-9196 URL: https://issues.apache.org/jira/browse/SPARK-9196 Project: Spark Issue Type: Bug Reporter: Davies Liu Assignee: Adrian Wang Priority: Critical {code} - function current_timestamp *** FAILED *** (77 milliseconds) [info] Results do not match for query: [info] == Parsed Logical Plan == [info] 'Project [unresolvedalias(('CURRENT_TIMESTAMP() = 'CURRENT_TIMESTAMP()))] [info]OneRowRelation$ [info] [info] == Analyzed Logical Plan == [info] _c0: boolean [info] Project [(currenttimestamp() = currenttimestamp()) AS _c0#11436] [info]OneRowRelation$ [info] [info] == Optimized Logical Plan == [info] Project [false AS _c0#11436] [info]OneRowRelation$ [info] [info] == Physical Plan == [info] Project [false AS _c0#11436] [info]PhysicalRDD ParallelCollectionRDD[650] at apply at Transformer.scala:22 [info] [info] Code Generation: true [info] == RDD == [info] == Results == [info] !== Correct Answer - 1 == == Spark Answer - 1 == [info] ![true] [false] (QueryTest.scala:61) [info] org.scalatest.exceptions.TestFailedException: [info] at org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:495) [info] at org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1555) [info] at 
org.scalatest.Assertions$class.fail(Assertions.scala:1328) [info] at org.scalatest.FunSuite.fail(FunSuite.scala:1555) [info] at org.apache.spark.sql.QueryTest.checkAnswer(QueryTest.scala:61) [info] at org.apache.spark.sql.QueryTest.checkAnswer(QueryTest.scala:67) [info] at org.apache.spark.sql.DatetimeExpressionsSuite$$anonfun$2.apply$mcV$sp(DatetimeExpressionsSuite.scala:42) [info] at org.apache.spark.sql.DatetimeExpressionsSuite$$anonfun$2.apply(DatetimeExpressionsSuite.scala:39) [info] at org.apache.spark.sql.DatetimeExpressionsSuite$$anonfun$2.apply(DatetimeExpressionsSuite.scala:39) [info] at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22) [info] at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) [info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) [info] at org.scalatest.Transformer.apply(Transformer.scala:22) [info] at org.scalatest.Transformer.apply(Transformer.scala:20) [info] at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166) [info] at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:42) [info] at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163) [info] at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) [info] at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) [info] at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) [info] at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175) [info] at org.scalatest.FunSuite.runTest(FunSuite.scala:1555) [info] at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) [info] at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) [info] at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413) [info] at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401) [info] at scala.collection.immutable.List.foreach(List.scala:318) [info] at 
org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) [info] at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396) [info] at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483) [info] at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208) [info] at org.scalatest.FunSuite.runTests(FunSuite.scala:1555) [info] at org.scalatest.Suite$class.run(Suite.scala:1424) [info] at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555) [info] at
[jira] [Commented] (SPARK-9051) SortMergeCompatibilitySuite is flaky
[ https://issues.apache.org/jira/browse/SPARK-9051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14629104#comment-14629104 ] Adrian Wang commented on SPARK-9051: This might have something to do with SPARK-9027, though I'm not quite sure. I'll take a closer look. SortMergeCompatibilitySuite is flaky - Key: SPARK-9051 URL: https://issues.apache.org/jira/browse/SPARK-9051 Project: Spark Issue Type: Bug Components: SQL Reporter: Davies Liu Assignee: Adrian Wang Priority: Critical For example: https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-with-YARN/HADOOP_PROFILE=hadoop-2.3,label=centos/2951/testReport/junit/org.apache.spark.sql.hive.execution/SortMergeCompatibilitySuite/auto_sortmerge_join_16/ -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9051) SortMergeCompatibilitySuite is flaky
[ https://issues.apache.org/jira/browse/SPARK-9051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14629105#comment-14629105 ] Adrian Wang commented on SPARK-9051: Just found that Michael has reverted it along with SPARK-6910, and things are OK now. So that was the cause. SortMergeCompatibilitySuite is flaky - Key: SPARK-9051 URL: https://issues.apache.org/jira/browse/SPARK-9051 Project: Spark Issue Type: Bug Components: SQL Reporter: Davies Liu Assignee: Adrian Wang Priority: Critical For example: https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-with-YARN/HADOOP_PROFILE=hadoop-2.3,label=centos/2951/testReport/junit/org.apache.spark.sql.hive.execution/SortMergeCompatibilitySuite/auto_sortmerge_join_16/
[jira] [Commented] (SPARK-8864) Date/time function and data type design
[ https://issues.apache.org/jira/browse/SPARK-8864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14616301#comment-14616301 ] Adrian Wang commented on SPARK-8864: just providing a précis of the current design for your information. Date/time function and data type design --- Key: SPARK-8864 URL: https://issues.apache.org/jira/browse/SPARK-8864 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Reynold Xin Fix For: 1.5.0 Attachments: SparkSQLdatetimeudfs (1).pdf Please see the attached design doc.
[jira] [Commented] (SPARK-8864) Date/time function and data type design
[ https://issues.apache.org/jira/browse/SPARK-8864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14616299#comment-14616299 ] Adrian Wang commented on SPARK-8864: No, that's not enough. Date/time function and data type design --- Key: SPARK-8864 URL: https://issues.apache.org/jira/browse/SPARK-8864 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Reynold Xin Fix For: 1.5.0 Attachments: SparkSQLdatetimeudfs (1).pdf Please see the attached design doc.
[jira] [Comment Edited] (SPARK-8864) Date/time function and data type design
[ https://issues.apache.org/jira/browse/SPARK-8864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14616293#comment-14616293 ] Adrian Wang edited comment on SPARK-8864 at 7/7/15 7:34 AM: Then we are using a Long for µs. A Long can be up to 9.2E18, which is more than 1E8 days. Hive is using a Long for seconds and an int for nanoseconds, but I think a single Long here for the day-time interval is fine. was (Author: adrian-wang): Then we are using a Long for µs. A Long can be up to 9.2E18, which is more than 1E11 days. Hive is using a Long for seconds and an int for nanoseconds, but I think a single Long here for the day-time interval is fine. Date/time function and data type design --- Key: SPARK-8864 URL: https://issues.apache.org/jira/browse/SPARK-8864 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Reynold Xin Fix For: 1.5.0 Attachments: SparkSQLdatetimeudfs (1).pdf Please see the attached design doc.
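The corrected range claim above ("more than 1E8 days") can be checked with a quick back-of-the-envelope calculation. The sketch below assumes the Long counts microseconds, which is the precision the 1E8 figure corresponds to:

```python
# Capacity of a signed 64-bit Long at microsecond precision, in days.
MICROS_PER_DAY = 24 * 60 * 60 * 1_000_000   # 86_400_000_000
max_long = 2**63 - 1                        # about 9.2E18, as in the comment

days = max_long // MICROS_PER_DAY
print(days)        # roughly 1.07E8 days, i.e. "more than 1E8 days"
assert days > 10**8
```

At millisecond precision the same arithmetic gives about 1E11 days, which may explain the original (pre-edit) figure in the comment.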
[jira] [Commented] (SPARK-8864) Date/time function and data type design
[ https://issues.apache.org/jira/browse/SPARK-8864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14616268#comment-14616268 ] Adrian Wang commented on SPARK-8864: Thanks for the design. Two comments: 1. If an IntervalType value is in year-month format, we cannot use 100ns units to represent it. Hive uses two internal types to handle year-month and day-time separately. 2. When casting TimestampType to StringType, or casting from StringType (unless it is an ISO 8601 time string which contains timezone info), we should also consider the timezone. Date/time function and data type design --- Key: SPARK-8864 URL: https://issues.apache.org/jira/browse/SPARK-8864 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Reynold Xin Fix For: 1.5.0 Attachments: SparkSQLdatetimeudfs.pdf Please see the attached design doc.
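The point about two internal representations can be illustrated with a minimal sketch (names here are illustrative, not Spark's actual types): a year-month interval has no fixed length in time units, since months vary in length, so it must be stored as a month count separate from the fixed-duration day-time part.

```python
from dataclasses import dataclass

@dataclass
class Interval:
    # Year-month component: a month count, which cannot be converted
    # to a fixed number of microseconds (months vary in length).
    months: int
    # Day-time component: a fixed duration in microseconds.
    micros: int

    def __add__(self, other: "Interval") -> "Interval":
        # The two components stay separate under arithmetic.
        return Interval(self.months + other.months, self.micros + other.micros)

one_year = Interval(months=12, micros=0)
one_hour = Interval(months=0, micros=3_600_000_000)
total = one_year + one_hour
assert (total.months, total.micros) == (12, 3_600_000_000)
```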
[jira] [Commented] (SPARK-8864) Date/time function and data type design
[ https://issues.apache.org/jira/browse/SPARK-8864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14616293#comment-14616293 ] Adrian Wang commented on SPARK-8864: Then we are using a Long for µs. A Long can be up to 9.2E18, which is more than 1E11 days. Hive is using a Long for seconds and an int for nanoseconds, but I think a single Long here for the day-time interval is fine. Date/time function and data type design --- Key: SPARK-8864 URL: https://issues.apache.org/jira/browse/SPARK-8864 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Reynold Xin Fix For: 1.5.0 Attachments: SparkSQLdatetimeudfs (1).pdf Please see the attached design doc.
[jira] [Resolved] (SPARK-5215) concat support in sqlcontext
[ https://issues.apache.org/jira/browse/SPARK-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrian Wang resolved SPARK-5215. Resolution: Duplicate concat support in sqlcontext Key: SPARK-5215 URL: https://issues.apache.org/jira/browse/SPARK-5215 Project: Spark Issue Type: New Feature Components: SQL Reporter: Adrian Wang define concat following the rules in https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF
[jira] [Commented] (SPARK-8174) date/time function: unix_timestamp
[ https://issues.apache.org/jira/browse/SPARK-8174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14578513#comment-14578513 ] Adrian Wang commented on SPARK-8174: I'll deal with this. date/time function: unix_timestamp -- Key: SPARK-8174 URL: https://issues.apache.org/jira/browse/SPARK-8174 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin 3 variants:
{code}
unix_timestamp(): long
Gets current Unix timestamp in seconds.

unix_timestamp(string|date): long
Converts time string in format yyyy-MM-dd HH:mm:ss to Unix timestamp (in seconds), using the default timezone and the default locale, return 0 if fail: unix_timestamp('2009-03-20 11:30:01') = 1237573801

unix_timestamp(string date, string pattern): long
Convert time string with given pattern (see [http://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html]) to Unix time stamp (in seconds), return 0 if fail: unix_timestamp('2009-03-20', 'yyyy-MM-dd') = 1237532400.
{code}
See: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF
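A rough sketch of the three variants' semantics, not Spark's implementation: patterns here use Python's strptime syntax rather than Java's SimpleDateFormat (so 'yyyy-MM-dd' becomes '%Y-%m-%d'), and the timezone is pinned to UTC, whereas the ticket's example value 1237532400 assumes a Pacific-time default.

```python
from datetime import datetime, timezone
import time

def unix_timestamp(s=None, pattern="%Y-%m-%d %H:%M:%S"):
    if s is None:
        # Variant 1: current Unix timestamp in seconds.
        return int(time.time())
    try:
        # Variants 2 and 3: parse with the (default or given) pattern;
        # Hive's documented behavior is to return 0 on failure.
        dt = datetime.strptime(s, pattern).replace(tzinfo=timezone.utc)
        return int(dt.timestamp())
    except ValueError:
        return 0

assert unix_timestamp("2009-03-20", "%Y-%m-%d") == 1237507200  # midnight UTC
assert unix_timestamp("not a date") == 0
```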
[jira] [Commented] (SPARK-8182) date/time function: minute
[ https://issues.apache.org/jira/browse/SPARK-8182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14578523#comment-14578523 ] Adrian Wang commented on SPARK-8182: I'll deal with this. date/time function: minute -- Key: SPARK-8182 URL: https://issues.apache.org/jira/browse/SPARK-8182 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin minute(string|date|timestamp): int Returns the minute of the timestamp. See https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF
[jira] [Commented] (SPARK-8183) date/time function: second
[ https://issues.apache.org/jira/browse/SPARK-8183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14578524#comment-14578524 ] Adrian Wang commented on SPARK-8183: I'll deal with this. date/time function: second -- Key: SPARK-8183 URL: https://issues.apache.org/jira/browse/SPARK-8183 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin second(string|date|timestamp): int Returns the second of the timestamp. See https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF
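The field-extraction sub-tasks (minute above, second here, and siblings such as hour and year) all share the same shape: accept string|date|timestamp, extract one field. A conceptual Python sketch of that semantics, not Spark's Catalyst expressions:

```python
from datetime import datetime

def _to_datetime(value):
    # Accept an already-parsed datetime, a 'yyyy-MM-dd HH:mm:ss' string,
    # or a bare 'yyyy-MM-dd' date string (the string|date|timestamp forms).
    if isinstance(value, datetime):
        return value
    for fmt in ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d"):
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            pass
    raise ValueError(f"unparseable datetime: {value!r}")

def minute(value): return _to_datetime(value).minute
def second(value): return _to_datetime(value).second

assert minute("2009-03-20 11:30:01") == 30
assert second("2009-03-20 11:30:01") == 1
```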
[jira] [Commented] (SPARK-8184) date/time function: weekofyear
[ https://issues.apache.org/jira/browse/SPARK-8184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14578525#comment-14578525 ] Adrian Wang commented on SPARK-8184: I'll deal with this. date/time function: weekofyear -- Key: SPARK-8184 URL: https://issues.apache.org/jira/browse/SPARK-8184 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin weekofyear(string|date|timestamp): int Returns the week number of a timestamp string: weekofyear(1970-11-01 00:00:00) = 44, weekofyear(1970-11-01) = 44.
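The ticket's example values match the ISO-8601 week number, which can be sketched with Python's stdlib (illustrative only, not Spark's implementation):

```python
from datetime import date

def weekofyear(s: str) -> int:
    # Take the date prefix so both 'yyyy-MM-dd' and
    # 'yyyy-MM-dd HH:mm:ss' inputs are accepted.
    y, m, d = map(int, s[:10].split("-"))
    # isocalendar() returns (ISO year, ISO week number, ISO weekday).
    return date(y, m, d).isocalendar()[1]

assert weekofyear("1970-11-01 00:00:00") == 44
assert weekofyear("1970-11-01") == 44
```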
[jira] [Commented] (SPARK-8193) date/time function: current_timestamp
[ https://issues.apache.org/jira/browse/SPARK-8193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14578601#comment-14578601 ] Adrian Wang commented on SPARK-8193: I'll deal with this. date/time function: current_timestamp - Key: SPARK-8193 URL: https://issues.apache.org/jira/browse/SPARK-8193 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin current_timestamp(): timestamp Returns the current timestamp at the start of query evaluation (as of Hive 1.2.0). All calls of current_timestamp within the same query return the same value. We should just replace this with a timestamp literal in the optimizer.
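The "replace this with a timestamp literal in the optimizer" idea can be sketched as a tree rewrite that evaluates the clock once per query and substitutes the same literal at every occurrence, which is exactly why all calls within one query return the same value. Illustrative Python on a toy expression tree, not Spark's Catalyst rule:

```python
import time

# Toy expression trees: ("current_timestamp",) leaves, ("lit", v) literals,
# and ("eq", a, b) comparison nodes.
def fold_current_timestamp(expr, now=None):
    """Replace every current_timestamp() with one shared literal."""
    if now is None:
        now = time.time()   # evaluated exactly once per query
    if expr == ("current_timestamp",):
        return ("lit", now)
    if expr[0] == "eq":
        return ("eq", fold_current_timestamp(expr[1], now),
                      fold_current_timestamp(expr[2], now))
    return expr

def evaluate(expr):
    if expr[0] == "lit":
        return expr[1]
    if expr[0] == "eq":
        return evaluate(expr[1]) == evaluate(expr[2])

# The flaky-test query: current_timestamp() = current_timestamp().
query = ("eq", ("current_timestamp",), ("current_timestamp",))
assert evaluate(fold_current_timestamp(query)) is True
```

Without the fold, each leaf would read the clock independently and the comparison could occasionally evaluate to false, which is the failure mode reported in SPARK-9196.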
[jira] [Commented] (SPARK-8159) Improve SQL/DataFrame expression coverage
[ https://issues.apache.org/jira/browse/SPARK-8159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14578509#comment-14578509 ] Adrian Wang commented on SPARK-8159: Are we missing xpath functions? Improve SQL/DataFrame expression coverage - Key: SPARK-8159 URL: https://issues.apache.org/jira/browse/SPARK-8159 Project: Spark Issue Type: Improvement Components: SQL Reporter: Reynold Xin This is an umbrella ticket to track new expressions we are adding to SQL/DataFrame. For each new expression, we should: 1. Add a new Expression implementation in org.apache.spark.sql.catalyst.expressions 2. If applicable, implement the code generated version (by implementing genCode). 3. Add comprehensive unit tests (for all the data types the expressions support). 4. If applicable, add a new function for DataFrame in org.apache.spark.sql.functions, and python/pyspark/sql/functions.py for Python. For date/time functions, put them in expressions/datetime.scala, and create a DateTimeFunctionSuite.scala for testing.
[jira] [Issue Comment Deleted] (SPARK-8159) Improve SQL/DataFrame expression coverage
[ https://issues.apache.org/jira/browse/SPARK-8159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrian Wang updated SPARK-8159: --- Comment: was deleted (was: Are we missing xpath functions?) Improve SQL/DataFrame expression coverage - Key: SPARK-8159 URL: https://issues.apache.org/jira/browse/SPARK-8159 Project: Spark Issue Type: Improvement Components: SQL Reporter: Reynold Xin This is an umbrella ticket to track new expressions we are adding to SQL/DataFrame. For each new expression, we should: 1. Add a new Expression implementation in org.apache.spark.sql.catalyst.expressions 2. If applicable, implement the code generated version (by implementing genCode). 3. Add comprehensive unit tests (for all the data types the expressions support). 4. If applicable, add a new function for DataFrame in org.apache.spark.sql.functions, and python/pyspark/sql/functions.py for Python. For date/time functions, put them in expressions/datetime.scala, and create a DateTimeFunctionSuite.scala for testing.
[jira] [Commented] (SPARK-8159) Improve SQL/DataFrame expression coverage
[ https://issues.apache.org/jira/browse/SPARK-8159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14578508#comment-14578508 ] Adrian Wang commented on SPARK-8159: Are we missing xpath functions? Improve SQL/DataFrame expression coverage - Key: SPARK-8159 URL: https://issues.apache.org/jira/browse/SPARK-8159 Project: Spark Issue Type: Improvement Components: SQL Reporter: Reynold Xin This is an umbrella ticket to track new expressions we are adding to SQL/DataFrame. For each new expression, we should: 1. Add a new Expression implementation in org.apache.spark.sql.catalyst.expressions 2. If applicable, implement the code generated version (by implementing genCode). 3. Add comprehensive unit tests (for all the data types the expressions support). 4. If applicable, add a new function for DataFrame in org.apache.spark.sql.functions, and python/pyspark/sql/functions.py for Python. For date/time functions, put them in expressions/datetime.scala, and create a DateTimeFunctionSuite.scala for testing.
[jira] [Commented] (SPARK-8177) date/time function: year
[ https://issues.apache.org/jira/browse/SPARK-8177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14578518#comment-14578518 ] Adrian Wang commented on SPARK-8177: I'll deal with this. date/time function: year Key: SPARK-8177 URL: https://issues.apache.org/jira/browse/SPARK-8177 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin year(string|date|timestamp): int Returns the year part of a date or a timestamp string: year(1970-01-01 00:00:00) = 1970, year(1970-01-01) = 1970. See https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF