[GitHub] spark issue #16910: [SPARK-19575][SQL]Reading from or writing to a hive serd...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16910 **[Test build #72808 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72808/testReport)** for PR 16910 at commit [`cb98375`](https://github.com/apache/spark/commit/cb983756f7fb270c545f90a98d03e0db3ccc0bd9). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16870: [SPARK-19496][SQL]to_date udf to return null when input ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16870 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72799/ Test PASSed.
[GitHub] spark issue #16870: [SPARK-19496][SQL]to_date udf to return null when input ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16870 Merged build finished. Test PASSed.
[GitHub] spark pull request #16910: [SPARK-19575][SQL]Reading from or writing to a hi...
GitHub user windpiger opened a pull request: https://github.com/apache/spark/pull/16910 [SPARK-19575][SQL] Reading from or writing to a hive serde table with a non pre-existing location should succeed

## What changes were proposed in this pull request?

This PR is follow-up work from [SPARK-19329](https://issues.apache.org/jira/browse/SPARK-19329), which unified the behavior when reading from or writing to a datasource table with a non pre-existing location; here we should also unify the behavior for hive serde tables. Currently, selecting from a hive serde table whose location does not exist throws an exception:

```
Input path does not exist: file:/tmp/spark-37caa4e6-5a6a-4361-a905-06cc56afb274
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/tmp/spark-37caa4e6-5a6a-4361-a905-06cc56afb274
  at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:285)
  at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
  at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)
  at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:194)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
  at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
  at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
  at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
  at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
  at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2080)
  at org.apache.spark.rdd.RDD.count(RDD.scala:1157)
  at org.apache.spark.sql.QueryTest$.checkAnswer(QueryTest.scala:258)
```

## How was this patch tested?

Unit tests added.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/windpiger/spark selectHiveFromNotExistLocation

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16910.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16910

commit cb983756f7fb270c545f90a98d03e0db3ccc0bd9
Author: windpiger
Date: 2017-02-13T07:50:55Z

[SPARK-19575][SQL] Reading from or writing to a hive serde table with a non pre-existing location should succeed
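The contract this PR aims for is that a missing table location behaves like an empty table instead of failing the query. As a rough, hypothetical illustration of that contract (sketched in plain Java for simplicity; Spark's actual change lives in the hive serde read path, not in code like this):

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NonExistingLocationSketch {

    // If the table location is missing, treat it as an empty input rather
    // than letting the file listing throw InvalidInputException (the
    // pre-fix behavior shown in the stack trace above).
    public static List<String> listInputFiles(String location) {
        File dir = new File(location);
        if (!dir.exists()) {
            return new ArrayList<>(); // empty scan; the query still succeeds
        }
        String[] names = dir.list();
        return names == null ? new ArrayList<>() : Arrays.asList(names);
    }

    public static void main(String[] args) {
        // A path that almost certainly does not exist yields an empty result.
        List<String> files = listInputFiles("/tmp/sketch-missing-" + System.nanoTime());
        System.out.println(files.isEmpty());
    }
}
```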
[GitHub] spark issue #16870: [SPARK-19496][SQL]to_date udf to return null when input ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16870 **[Test build #72799 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72799/testReport)** for PR 16870 at commit [`3b1cfd4`](https://github.com/apache/spark/commit/3b1cfd41ba6171633a85f42482391c1c7d25182e).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16891: [SPARK-19318][SQL] Fix to treat JDBC connection properti...
Github user sureshthalamati commented on the issue: https://github.com/apache/spark/pull/16891 Thank you for reviewing the PR @cloud-fan. Addressed the review comments; please let me know if it requires any further changes.
[GitHub] spark issue #16868: [SPARK-19115] [SQL] Supporting Create External Table Lik...
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/16868 Very thoughtful consideration. Thanks for your explanation and suggestion! @tejasapatil what do you think? @gatorsmile @cloud-fan
[GitHub] spark issue #16870: [SPARK-19496][SQL]to_date udf to return null when input ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16870 Merged build finished. Test PASSed.
[GitHub] spark issue #16870: [SPARK-19496][SQL]to_date udf to return null when input ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16870 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72798/ Test PASSed.
[GitHub] spark issue #16870: [SPARK-19496][SQL]to_date udf to return null when input ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16870 **[Test build #72798 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72798/testReport)** for PR 16870 at commit [`7238e94`](https://github.com/apache/spark/commit/7238e94ac762f03eca3f67d50acf090bb2cc9cf9).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #16891: [SPARK-19318][SQL] Fix to treat JDBC connection p...
Github user sureshthalamati commented on a diff in the pull request: https://github.com/apache/spark/pull/16891#discussion_r100737587

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCWriteSuite.scala ---

```
@@ -75,7 +75,7 @@ class JDBCWriteSuite extends SharedSQLContext with BeforeAndAfter {
       s"""
         |CREATE OR REPLACE TEMPORARY VIEW PEOPLE1
         |USING org.apache.spark.sql.jdbc
-        |OPTIONS (url '$url1', dbtable 'TEST.PEOPLE1', user 'testUser', password 'testPass')
+        |OPTIONS (url '$url1', dbTable 'TEST.PEOPLE1', user 'testUser', password 'testPass')
```

--- End diff --

Yes, they should be case-insensitive. This is just an additional case-sensitivity test case. While testing my fix I did not find a test in the write suite that checks case-insensitivity of data source options during insert, so I flipped it to `dbTable` to make sure case-insensitivity is not broken in this case.
[GitHub] spark issue #16891: [SPARK-19318][SQL] Fix to treat JDBC connection properti...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16891 **[Test build #72807 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72807/testReport)** for PR 16891 at commit [`a156074`](https://github.com/apache/spark/commit/a1560742f2196ba04c14ad50e955bdcc839c4ad8).
[GitHub] spark pull request #16891: [SPARK-19318][SQL] Fix to treat JDBC connection p...
Github user sureshthalamati commented on a diff in the pull request: https://github.com/apache/spark/pull/16891#discussion_r100737505

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/CaseInsensitiveMap.scala ---

```
@@ -23,16 +23,30 @@ package org.apache.spark.sql.catalyst.util

 class CaseInsensitiveMap(map: Map[String, String]) extends Map[String, String]
```

--- End diff --

Good question. For some reason I was hung up on making only the case-sensitive key available to the caller. Changed the code to expose the original map; it made the code simpler. Thank you very much for the suggestion.
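The design point discussed above, case-insensitive lookup while still exposing the caller's original map with its original key casing, can be sketched as follows. This is a hypothetical illustration in plain Java, not Spark's actual `CaseInsensitiveMap` implementation:

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class CaseInsensitiveMapSketch {
    private final Map<String, String> original;  // caller's map, keys untouched
    private final Map<String, String> lowerCased = new HashMap<>();

    public CaseInsensitiveMapSketch(Map<String, String> original) {
        this.original = original;
        // Index every entry under a lower-cased key for lookups.
        for (Map.Entry<String, String> e : original.entrySet()) {
            lowerCased.put(e.getKey().toLowerCase(Locale.ROOT), e.getValue());
        }
    }

    // Lookup ignores key case: "dbTable" and "dbtable" hit the same entry.
    public String get(String key) {
        return lowerCased.get(key.toLowerCase(Locale.ROOT));
    }

    // Expose the original map so consumers that are case-sensitive, such as
    // JDBC connection properties (the subject of SPARK-19318), see the keys
    // exactly as the user wrote them.
    public Map<String, String> originalMap() {
        return original;
    }
}
```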
[GitHub] spark pull request #16891: [SPARK-19318][SQL] Fix to treat JDBC connection p...
Github user sureshthalamati commented on a diff in the pull request: https://github.com/apache/spark/pull/16891#discussion_r100737377

--- Diff: external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala ---

```
@@ -149,4 +155,29 @@ class OracleIntegrationSuite extends DockerJDBCIntegrationSuite with SharedSQLCo
     assert(values.getDate(9).equals(dateVal))
     assert(values.getTimestamp(10).equals(timestampVal))
   }
+
+  test("SPARK-19318: connection property keys should be case-sensitive") {
+    sql(
+      s"""
+         |CREATE TEMPORARY TABLE datetime
```

--- End diff --

done.
[GitHub] spark pull request #16891: [SPARK-19318][SQL] Fix to treat JDBC connection p...
Github user sureshthalamati commented on a diff in the pull request: https://github.com/apache/spark/pull/16891#discussion_r100737351

--- Diff: external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala ---

```
@@ -62,6 +62,12 @@ class OracleIntegrationSuite extends DockerJDBCIntegrationSuite with SharedSQLCo
   }

   override def dataPreparation(conn: Connection): Unit = {
+    conn.prepareStatement("CREATE TABLE datetime (id NUMBER(10), d DATE, t TIMESTAMP)")
+      .executeUpdate()
+    conn.prepareStatement("INSERT INTO datetime VALUES (" +
+      "1, {d '1991-11-09'}, {ts '1996-01-01 01:23:45'})").executeUpdate()
+    conn.prepareStatement("CREATE TABLE datetime1 (id NUMBER(10), d DATE, t TIMESTAMP)")
```

--- End diff --

Thank you for reviewing the patch. I think cleanup is not required; these tables are not persistent across test runs and are cleaned up when the docker container is removed at the end of the test. I did not notice any existing setup in `afterAll()` to do cleanup after the test. I also moved creation of the temporary views up to the same place, to keep them together, and possibly any future tests can use these tables as well.
[GitHub] spark pull request #16891: [SPARK-19318][SQL] Fix to treat JDBC connection p...
Github user sureshthalamati commented on a diff in the pull request: https://github.com/apache/spark/pull/16891#discussion_r100737390

--- Diff: external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala ---

```
@@ -149,4 +155,29 @@ class OracleIntegrationSuite extends DockerJDBCIntegrationSuite with SharedSQLCo
     assert(values.getDate(9).equals(dateVal))
     assert(values.getTimestamp(10).equals(timestampVal))
   }
+
+  test("SPARK-19318: connection property keys should be case-sensitive") {
+    sql(
+      s"""
+         |CREATE TEMPORARY TABLE datetime
+         |USING org.apache.spark.sql.jdbc
+         |OPTIONS (url '$jdbcUrl', dbTable 'datetime', oracle.jdbc.mapDateToTimestamp 'false')
+       """.stripMargin.replaceAll("\n", " "))
+    val row = sql("SELECT * FROM datetime where id = 1").collect()(0)
```

--- End diff --

done.
[GitHub] spark issue #16908: [SPARK-19574][ML][Documentation] Fix Liquid Exception: S...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16908 Merged build finished. Test PASSed.
[GitHub] spark issue #16908: [SPARK-19574][ML][Documentation] Fix Liquid Exception: S...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16908 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72805/ Test PASSed.
[GitHub] spark issue #16908: [SPARK-19574][ML][Documentation] Fix Liquid Exception: S...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16908 **[Test build #72805 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72805/testReport)** for PR 16908 at commit [`b97b49b`](https://github.com/apache/spark/commit/b97b49b11f3c6113b5b9491e5469ca7a011beac6).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16868: [SPARK-19115] [SQL] Supporting Create External Table Lik...
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/16868 Very careful consideration. Thanks for your explanation and suggestion! What do you think? @gatorsmile @cloud-fan
[GitHub] spark issue #16672: [SPARK-19329][SQL]Reading from or writing to a datasourc...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16672 Could you move the test cases to `DDLSuite.scala`? This is not Hive-specific. Thanks!
[GitHub] spark issue #16689: [SPARK-19342][SPARKR] bug fixed in collect method for co...
Github user titicaca commented on the issue: https://github.com/apache/spark/pull/16689 Yes. The JIRA id is SPARK-19342. Thank you for the help and advice :)
[GitHub] spark issue #16909: [SPARK-13450] Introduce ExternalAppendOnlyUnsafeRowArray...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16909 **[Test build #72806 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72806/testReport)** for PR 16909 at commit [`e9cdd30`](https://github.com/apache/spark/commit/e9cdd30252bce12d34f52cc31f95adb271ef2209).
[GitHub] spark pull request #16672: [SPARK-19329][SQL]Reading from or writing to a da...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16672#discussion_r100735110

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala ---

```
@@ -1431,4 +1432,133 @@ class HiveDDLSuite
+
+  test("insert data to a data source table which has a not existed location should succeed") {
+    withTable("t") {
+      withTempDir { dir =>
+        spark.sql(
+          s"""CREATE TABLE t(a string, b int)
+             |USING parquet
+             |OPTIONS(path "file:${dir.getCanonicalPath}")
+           """.stripMargin)
+        var table = spark.sessionState.catalog.getTableMetadata(TableIdentifier("t"))
```

--- End diff --

Another general comment. Please avoid using `var`, if possible.
[GitHub] spark issue #16909: [SPARK-13450] Introduce UnsafeRowExternalArray. Change S...
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/16909 @rxin : can you please recommend someone who could review this PR?
[GitHub] spark pull request #16672: [SPARK-19329][SQL]Reading from or writing to a da...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16672#discussion_r100735007

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala ---

```
@@ -1431,4 +1432,133 @@ class HiveDDLSuite
+
+  test("insert data to a data source table which has a not existed location should succeed") {
+    withTable("t") {
+      withTempDir { dir =>
+        spark.sql(
+          s"""CREATE TABLE t(a string, b int)
+             |USING parquet
+             |OPTIONS(path "file:${dir.getCanonicalPath}")
+           """.stripMargin)
```

--- End diff --

A general comment about the style. We prefer the following indentation style:

```Scala
sql(
  """
    |SELECT '1' AS part, key, value FROM VALUES
    |(1, "one"), (2, "two"), (3, null) AS data(key, value)
  """.stripMargin)
```
[GitHub] spark issue #16909: [SPARK-13450] Introduce UnsafeRowExternalArray. Change S...
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/16909 ok to test
[GitHub] spark pull request #16909: [SPARK-13450] Introduce UnsafeRowExternalArray. C...
GitHub user tejasapatil opened a pull request: https://github.com/apache/spark/pull/16909

[SPARK-13450] Introduce UnsafeRowExternalArray. Change SortMergeJoin and WindowExec to use it

## What issue does this PR address?

Jira: https://issues.apache.org/jira/browse/SPARK-13450

In `SortMergeJoinExec`, rows of the right relation having the same value for a join key are buffered in-memory. In case of skew, this causes OOMs (see comments in SPARK-13450 for more details). A heap dump from a failed job confirms this: https://issues.apache.org/jira/secure/attachment/12846382/heap-dump-analysis.png. While it's possible to increase the heap size as a workaround, Spark should be resilient to such issues since skew can happen arbitrarily.

## Change proposed in this pull request

- Introduces `ExternalAppendOnlyUnsafeRowArray`
  - It holds `UnsafeRow`s in-memory up to a certain threshold.
  - After the threshold is hit, it switches to `UnsafeExternalSorter`, which enables spilling of the rows to disk. It does NOT sort the data.
  - Allows iterating the array multiple times. However, any alteration to the array (using `add` or `clear`) will invalidate the existing iterator(s).
- `WindowExec` was already using `UnsafeExternalSorter` to support spilling. Changed it to use the new array.
- Changed `SortMergeJoinExec` to use the new array implementation.
  - NOTE: I have not changed FULL OUTER JOIN to use this new array implementation. Changing that will need more surgery and I would rather put up a separate PR for that once this gets in.

Note for reviewers: the diff can be divided into 3 (or more) parts. My motive behind having all the changes in a single PR was to demonstrate that the API is sane and supports 2 use cases. If reviewing the whole thing as 3 separate PRs would help, I am happy to make the split.

## How was this patch tested?

Unit testing
- Added unit tests for `ExternalAppendOnlyUnsafeRowArray` to validate all its APIs and access patterns.
- Added unit tests for `SortMergeExec` — with and without spill, for inner join, left outer join, and right outer join — to confirm that the spill threshold config behaves as expected and the output is correct.
- This PR touches the scanning logic in `SortMergeExec` for _all_ joins (except FULL OUTER JOIN). However, I expect existing test cases to cover that there is no regression in correctness.
- Added unit tests for `WindowExec` to check the behavior of spilling and correctness of results.

Stress testing
- Confirmed that the OOM is gone by running against a production job which used to OOM.
- Since I cannot share details about the prod workload externally, I created synthetic data to mimic the issue, and ran it before and after the fix to demonstrate the issue and query success with this PR.

Generating the synthetic data:

```
./bin/spark-shell --driver-memory=6G

import org.apache.spark.sql._
val hc = SparkSession.builder.master("local").getOrCreate()

hc.sql("DROP TABLE IF EXISTS spark_13450_large_table").collect
hc.sql("DROP TABLE IF EXISTS spark_13450_one_row_table").collect

val df1 = (0 until 1).map(i => ("10", "100", i.toString, (i * 2).toString)).toDF("i", "j", "str1", "str2")
df1.write.format("org.apache.spark.sql.hive.orc.OrcFileFormat").bucketBy(100, "i", "j").sortBy("i", "j").saveAsTable("spark_13450_one_row_table")

val df2 = (0 until 300).map(i => ("10", "100", i.toString, (i * 2).toString)).toDF("i", "j", "str1", "str2")
df2.write.format("org.apache.spark.sql.hive.orc.OrcFileFormat").bucketBy(100, "i", "j").sortBy("i", "j").saveAsTable("spark_13450_large_table")
```

Ran this against trunk vs. a local build with this PR. The OOM repros with trunk, and with the fix this query runs fine:

```
./bin/spark-shell --driver-java-options="-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/spark.driver.heapdump.hprof"

import org.apache.spark.sql._
val hc = SparkSession.builder.master("local").getOrCreate()

hc.sql("SET spark.sql.autoBroadcastJoinThreshold=1")
hc.sql("SET spark.sql.sortMergeJoinExec.buffer.spill.threshold=1")

hc.sql("DROP TABLE IF EXISTS spark_13450_result").collect
hc.sql("""
  CREATE TABLE spark_13450_result
  AS
  SELECT
    a.i AS a_i, a.j AS a_j, a.str1 AS a_str1, a.str2 AS a_str2,
    b.i AS b_i, b.j AS b_j, b.str1 AS b_str1, b.str2 AS b_str2
  FROM spark_13450_one_row_table a
  JOIN spark_13450_large_table b ON a.i=b.i AND a.j=b.j
""")
```

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tejasapatil/spark SPARK-13450_smb_buffer_oom

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16909.patch
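The buffering contract described above — in-memory up to a threshold, multiple iterations allowed, iterators invalidated on mutation — can be sketched as follows. This is a hypothetical simplification: the real `ExternalAppendOnlyUnsafeRowArray` stores `UnsafeRow`s and spills through `UnsafeExternalSorter` once the threshold is hit, which is elided here.

```scala
// Minimal sketch of the proposed array's contract (not the real implementation).
class AppendOnlyRowArray[T](spillThreshold: Int) {
  private val inMemory = scala.collection.mutable.ArrayBuffer.empty[T]
  private var modCount = 0L // bumped on add/clear to invalidate outstanding iterators

  def add(row: T): Unit = {
    // Real impl: once inMemory.length exceeds spillThreshold, move rows
    // into an UnsafeExternalSorter so they can spill to disk (unsorted).
    inMemory += row
    modCount += 1
  }

  def clear(): Unit = { inMemory.clear(); modCount += 1 }

  // Can be called repeatedly; each iterator is tied to the current modCount.
  def generateIterator(): Iterator[T] = {
    val expected = modCount
    new Iterator[T] {
      private var i = 0
      def hasNext: Boolean = {
        require(modCount == expected, "array was modified; iterator is invalid")
        i < inMemory.length
      }
      def next(): T = { val r = inMemory(i); i += 1; r }
    }
  }
}
```

A usage consequence worth noting: because `add` invalidates live iterators, callers such as `SortMergeJoinExec` must finish (or regenerate) their scan of the buffered right-side rows before buffering the next group.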
[GitHub] spark pull request #16672: [SPARK-19329][SQL]Reading from or writing to a da...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16672#discussion_r100734735

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala ---

```
@@ -1431,4 +1432,133 @@ class HiveDDLSuite
     }
   }
 }
+
+  test("insert data to a data source table which has a not existed location should succeed") {
+    withTable("t") {
+      withTempDir { dir =>
+        spark.sql(
+          s"""CREATE TABLE t(a string, b int)
+             |USING parquet
+             |OPTIONS(path "file:${dir.getCanonicalPath}")
+           """.stripMargin)
+        var table = spark.sessionState.catalog.getTableMetadata(TableIdentifier("t"))
+        val expectedPath = s"file:${dir.getAbsolutePath.stripSuffix("/")}"
+        assert(table.location.stripSuffix("/") == expectedPath)
+
+        dir.delete
+        assert(!new File(table.location).exists())
+        spark.sql("INSERT INTO TABLE t SELECT 'c', 1")
+        checkAnswer(spark.table("t"), Row("c", 1) :: Nil)
+
+        Utils.deleteRecursively(dir)
+        assert(!new File(table.location).exists())
+        spark.sql("INSERT OVERWRITE TABLE t SELECT 'c', 1")
+        checkAnswer(spark.table("t"), Row("c", 1) :: Nil)
+
+        var newDir = dir.getAbsolutePath.stripSuffix("/") + "/x"
+        spark.sql(s"ALTER TABLE t SET LOCATION '$newDir'")
+        spark.sessionState.catalog.refreshTable(TableIdentifier("t"))
+
+        table = spark.sessionState.catalog.getTableMetadata(TableIdentifier("t"))
+        assert(table.location == newDir)
+        assert(!new File(newDir).exists())
+
+        spark.sql("INSERT INTO TABLE t SELECT 'c', 1")
+        checkAnswer(spark.table("t"), Row("c", 1) :: Nil)
+      }
+    }
+  }
+
+  test("insert into a data source table with no existed partition location should succeed") {
+    withTable("t") {
+      withTempDir { dir =>
+        spark.sql(
+          s"""CREATE TABLE t(a int, b int, c int, d int)
+             |USING parquet
+             |PARTITIONED BY(a, b)
+             |LOCATION "file:${dir.getCanonicalPath}"
+           """.stripMargin)
+        var table = spark.sessionState.catalog.getTableMetadata(TableIdentifier("t"))
+        val expectedPath = s"file:${dir.getAbsolutePath.stripSuffix("/")}"
+        assert(table.location.stripSuffix("/") == expectedPath)
+
+        spark.sql("INSERT INTO TABLE t PARTITION(a=1, b=2) SELECT 3, 4")
+        checkAnswer(spark.table("t"), Row(3, 4, 1, 2) :: Nil)
+
+        val partLoc = new File(s"${dir.getAbsolutePath}/a=1")
```

--- End diff --

A general comment about the test cases. Can you please check whether the directory exists after the insert? It can help others confirm the path is correct

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16908: [SPARK-19574][ML][Documentation] Fix Liquid Exception: S...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16908 **[Test build #72805 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72805/testReport)** for PR 16908 at commit [`b97b49b`](https://github.com/apache/spark/commit/b97b49b11f3c6113b5b9491e5469ca7a011beac6).
[GitHub] spark issue #16908: [SPARK-19574][ML][Documentation] Fix Liquid Exception: S...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16908 cc @srowen @anshbansal
[GitHub] spark pull request #16908: [SPARK-19574][ML][Documentation] Fix Liquid Excep...
GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/16908

[SPARK-19574][ML][Documentation] Fix Liquid Exception: Start indices amount is not equal to end indices amount

### What changes were proposed in this pull request?

```
Liquid Exception: Start indices amount is not equal to end indices amount, see /Users/xiao/IdeaProjects/sparkDelivery/docs/../examples/src/main/java/org/apache/spark/examples/ml/JavaTokenizerExample.java. in ml-features.md
```

So far, the build is broken after merging https://github.com/apache/spark/pull/16789. This PR is to fix it.

## How was this patch tested?

Manual

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark docMLFix

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16908.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16908

commit b97b49b11f3c6113b5b9491e5469ca7a011beac6
Author: Xiao Li
Date: 2017-02-13T07:00:05Z

    fix.
[GitHub] spark pull request #16902: [SPARK-19564][SPARK-19559][SS][KAFKA] KafkaOffset...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16902
[GitHub] spark issue #16902: [SPARK-19564][SPARK-19559][SS][KAFKA] KafkaOffsetReader'...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/16902 Good catch. LGTM. Thanks! Merging to master and 2.1.
[GitHub] spark pull request #16789: [SPARK-19444][ML][Documentation] Fix imports not ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16789#discussion_r100733678

--- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaTokenizerExample.java ---

```
@@ -35,13 +35,11 @@
 import org.apache.spark.sql.types.Metadata;
 import org.apache.spark.sql.types.StructField;
 import org.apache.spark.sql.types.StructType;
-// $example off$
-// $example on:untyped_ops$
 // col("...") is preferable to df.col("...")
 import static org.apache.spark.sql.functions.callUDF;
 import static org.apache.spark.sql.functions.col;
-// $example off:untyped_ops$
+// $example off
```

--- End diff --

It misses `$`
[GitHub] spark issue #16672: [SPARK-19329][SQL]Reading from or writing to a datasourc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16672 **[Test build #72804 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72804/testReport)** for PR 16672 at commit [`334e89f`](https://github.com/apache/spark/commit/334e89fe7258ab6a6773d534bee469cda7cd6d0c).
[GitHub] spark pull request #16777: [SPARK-19435][SQL] Type coercion between ArrayTyp...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16777#discussion_r100730262

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala ---

```
@@ -116,48 +114,66 @@ object TypeCoercion {
    * i.e. the main difference with [[findTightestCommonType]] is that here we allow some
    * loss of precision when widening decimal and double, and promotion to string.
    */
-  private def findWiderTypeForTwo(t1: DataType, t2: DataType): Option[DataType] = (t1, t2) match {
-    case (t1: DecimalType, t2: DecimalType) =>
-      Some(DecimalPrecision.widerDecimalType(t1, t2))
-    case (t: IntegralType, d: DecimalType) =>
-      Some(DecimalPrecision.widerDecimalType(DecimalType.forType(t), d))
-    case (d: DecimalType, t: IntegralType) =>
-      Some(DecimalPrecision.widerDecimalType(DecimalType.forType(t), d))
-    case (_: FractionalType, _: DecimalType) | (_: DecimalType, _: FractionalType) =>
-      Some(DoubleType)
-    case _ =>
-      findTightestCommonTypeToString(t1, t2)
+  def findWiderTypeForTwo(t1: DataType, t2: DataType): Option[DataType] = {
+    findTightestCommonType(t1, t2)
```

--- End diff --

I see. Thank you for catching it.
[GitHub] spark issue #16750: [SPARK-18937][SQL] Timezone support in CSV/JSON parsing
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16750 **[Test build #72803 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72803/testReport)** for PR 16750 at commit [`a455f4f`](https://github.com/apache/spark/commit/a455f4f900939aa961f9cc1e652c60c9d8d5c523).
[GitHub] spark issue #16776: [SPARK-19436][SQL] Add missing tests for approxQuantile
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/16776 https://issues.apache.org/jira/browse/SPARK-19573 has been created to track the issue of inconsistent NA-dropping.
[GitHub] spark pull request #16750: [SPARK-18937][SQL] Timezone support in CSV/JSON p...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/16750#discussion_r100729875

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala ---

```
@@ -859,6 +859,48 @@ class CSVSuite extends QueryTest with SharedSQLContext with SQLTestUtils {
     }
   }

+  test("Write timestamps correctly with timestampFormat option and timeZone option") {
+    withTempDir { dir =>
+      // With dateFormat option and timeZone option.
+      val timestampsWithFormatPath = s"${dir.getCanonicalPath}/timestampsWithFormat.csv"
+      val timestampsWithFormat = spark.read
+        .format("csv")
+        .option("header", "true")
+        .option("inferSchema", "true")
+        .option("timestampFormat", "dd/MM/yyyy HH:mm")
+        .load(testFile(datesFile))
+      timestampsWithFormat.write
+        .format("csv")
+        .option("header", "true")
+        .option("timestampFormat", "yyyy/MM/dd HH:mm")
+        .option("timeZone", "GMT")
+        .save(timestampsWithFormatPath)
+
+      // This will load back the timestamps as string.
+      val stringTimestampsWithFormat = spark.read
+        .format("csv")
+        .option("header", "true")
+        .option("inferSchema", "false")
```

--- End diff --

The schema will be `StringType` for all columns. ([CSVInferSchema.scala#L68](https://github.com/ueshin/apache-spark/blob/ffc4912e17cc900fc9d7ceefd0f66461109728e9/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala#L68))
[GitHub] spark pull request #16750: [SPARK-18937][SQL] Timezone support in CSV/JSON p...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/16750#discussion_r100729866

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala ---

```
@@ -58,13 +59,15 @@ private[sql] class JSONOptions(
   private val parseMode = parameters.getOrElse("mode", "PERMISSIVE")
   val columnNameOfCorruptRecord = parameters.get("columnNameOfCorruptRecord")

+  val timeZone: TimeZone = TimeZone.getTimeZone(parameters.getOrElse("timeZone", defaultTimeZoneId))
+
   // Uses `FastDateFormat` which can be direct replacement for `SimpleDateFormat` and thread-safe.
   val dateFormat: FastDateFormat =
     FastDateFormat.getInstance(parameters.getOrElse("dateFormat", "yyyy-MM-dd"), Locale.US)
```

--- End diff --

That is a combination of the `dateFormat` and `DateTimeUtils.millisToDays()` (see [JacksonParser.scala#L251](https://github.com/ueshin/apache-spark/blob/ffc4912e17cc900fc9d7ceefd0f66461109728e9/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala#L251) or [UnivocityParser.scala#L137](https://github.com/ueshin/apache-spark/blob/ffc4912e17cc900fc9d7ceefd0f66461109728e9/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala#L137)). If both timezones of the `dateFormat` and `DateTimeUtils.millisToDays()` are the same, the days will be calculated correctly. Here the `dateFormat` uses the default timezone to parse, and `DateTimeUtils.millisToDays()` also uses the default timezone to calculate days.
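To see why the two timezones must agree, here is a minimal sketch of the millis-to-days conversion. It is an assumption that this mirrors `DateTimeUtils.millisToDays` (shift epoch millis by the zone offset, then count whole days); the point it illustrates is that the same instant can land on different calendar days in different zones.

```scala
import java.util.TimeZone

// Sketch: shift epoch millis into the given zone's local time,
// then count whole days since 1970-01-01.
def millisToDays(millis: Long, tz: TimeZone): Int =
  Math.floor((millis + tz.getOffset(millis)) / 86400000d).toInt

// 2017-02-13 00:00:00 UTC
val millis = 1486944000000L
val utcDays = millisToDays(millis, TimeZone.getTimeZone("GMT"))   // 17210
val pstDays = millisToDays(millis, TimeZone.getTimeZone("GMT-8")) // 17209
// pstDays == utcDays - 1: if the parser's zone and the day-counting zone
// differ, a parsed date can shift by a day, hence ueshin's point that
// both sides must use the same (default) timezone.
```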
[GitHub] spark issue #16777: [SPARK-19435][SQL] Type coercion between ArrayTypes
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16777

@gatorsmile, Can we make this merged and then add test cases for them separately? It seems the results are the same. I ran two tests as below:

```scala
val integralTypes = IndexedSeq(ByteType, ShortType, IntegerType, LongType)
val decimals = (-38 to 38).flatMap { p =>
  (-38 to 38).flatMap(s => allCatch opt DecimalType(p, s))
}
assert(decimals.nonEmpty)

integralTypes.foreach { it =>
  test(s"$it test") {
    decimals.foreach { d =>
      // From TypeCoercion.findWiderTypeForTwo
      val maybeType1 = (d, it) match {
        case (d: DecimalType, t: IntegralType) =>
          Some(DecimalPrecision.widerDecimalType(DecimalType.forType(t), d))
        case _ => None
      }

      // From TypeCoercion.findTightestCommonType
      val maybeType2 = (d, it) match {
        case (t1: DecimalType, t2: IntegralType) if t1.isWiderThan(t2) => Some(t1)
        case _ => None
      }

      if (maybeType2.isDefined) {
        val t1 = maybeType1.get
        val t2 = maybeType2.get
        assert(t1 == t2)
      }
    }
  }
}
```

```scala
val integralTypes = IndexedSeq(ByteType, ShortType, IntegerType, LongType)
val decimals = (-38 to 38).flatMap { p =>
  (-38 to 38).flatMap(s => allCatch opt DecimalType(p, s))
}
assert(decimals.nonEmpty)

integralTypes.foreach { it =>
  test(s"$it test") {
    val widenDecimals = decimals.flatMap { d =>
      // From TypeCoercion.findWiderTypeForTwo
      (d, it) match {
        case (d: DecimalType, t: IntegralType) =>
          Some(DecimalPrecision.widerDecimalType(DecimalType.forType(t), d))
        case _ => None
      }
    }.toSet

    val tightDecimals = decimals.flatMap { d =>
      // From TypeCoercion.findTightestCommonType
      (d, it) match {
        case (t1: DecimalType, t2: IntegralType) if t1.isWiderThan(t2) => Some(t1)
        case _ => None
      }
    }.toSet

    assert(widenDecimals.nonEmpty)
    assert(tightDecimals.nonEmpty)
    assert(tightDecimals.subsetOf(widenDecimals))
  }
}
```
[GitHub] spark issue #16776: [SPARK-19436][SQL] Add missing tests for approxQuantile
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16776 **[Test build #72802 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72802/testReport)** for PR 16776 at commit [`4b7ad19`](https://github.com/apache/spark/commit/4b7ad193729d3829d3222d4cb44c6aea9c557d77).
[GitHub] spark issue #16907: [SPARK-19582][SPARKR] Allow to disable hive in sparkR sh...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16907 **[Test build #72801 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72801/testReport)** for PR 16907 at commit [`8329be6`](https://github.com/apache/spark/commit/8329be6dce176022d08bb3109dc994434bf7c84a).
[GitHub] spark pull request #16776: [SPARK-19436][SQL] Add missing tests for approxQu...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16776#discussion_r100727960

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala ---

```
@@ -58,49 +58,54 @@ final class DataFrameStatFunctions private[sql](df: DataFrame) {
    * @param probabilities a list of quantile probabilities
    *   Each number must belong to [0, 1].
    *   For example 0 is the minimum, 0.5 is the median, 1 is the maximum.
-   * @param relativeError The relative target precision to achieve (greater or equal to 0).
+   * @param relativeError The relative target precision to achieve (greater than or equal to 0).
    *   If set to zero, the exact quantiles are computed, which could be very expensive.
    *   Note that values greater than 1 are accepted but give the same result as 1.
    * @return the approximate quantiles at the given probabilities
    *
-   * @note NaN values will be removed from the numerical column before calculation
+   * @note null and NaN values will be removed from the numerical column before calculation. If
+   *   the dataframe is empty or all rows contain null or NaN, null is returned.
    *
    * @since 2.0.0
    */
   def approxQuantile(
       col: String,
       probabilities: Array[Double],
       relativeError: Double): Array[Double] = {
-    StatFunctions.multipleApproxQuantiles(df.select(col).na.drop(),
-      Seq(col), probabilities, relativeError).head.toArray
+    val res = approxQuantile(Array(col), probabilities, relativeError)
+    Option(res).map(_.head).orNull
  }

  /**
   * Calculates the approximate quantiles of numerical columns of a DataFrame.
-   * @see [[DataFrameStatsFunctions.approxQuantile(col:Str* approxQuantile]] for
-   *   detailed description.
+   * @see `[[DataFrameStatsFunctions.approxQuantile(col:Str* approxQuantile]]` for detailed
```

--- End diff --

`DataFrameStatsFunctions.approxQuantile(col:Str* approxQuantile` -> `DataFrameStatsFunctions.approxQuantile(col:Str* approxQuantile)`
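For context, a hedged usage sketch of the API under discussion. It assumes a running `SparkSession` named `spark` and a hypothetical column `x`; the semantics follow the doc comment in the diff (null/NaN rows dropped first, `relativeError = 0` means exact but expensive).

```scala
// Sketch of DataFrameStatFunctions.approxQuantile usage (Spark 2.x API).
import org.apache.spark.sql.functions.col

val df = spark.range(0, 1000).select(col("id").cast("double").as("x"))

// Minimum, median, and maximum, within 1% relative error.
val quantiles: Array[Double] = df.stat.approxQuantile("x", Array(0.0, 0.5, 1.0), 0.01)

// Per the @note being added in this diff: if the dataframe is empty,
// or every row is null/NaN, the call returns null rather than an array.
```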
[GitHub] spark pull request #16907: [SPARK-19582][SPARKR] Allow to disable hive in sp...
GitHub user zjffdu opened a pull request: https://github.com/apache/spark/pull/16907

[SPARK-19582][SPARKR] Allow to disable hive in sparkR shell

## What changes were proposed in this pull request?

SPARK-15236 did this for the scala shell; this ticket is for the sparkR shell. This is not only for sparkR itself, but can also benefit downstream projects like livy which use shell.R for their interactive sessions. For now, livy has no control over whether hive is enabled or not.

## How was this patch tested?

Tested it manually: run `bin/sparkR --master local --conf spark.sql.catalogImplementation=in-memory` and verify hive is not enabled.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zjffdu/spark SPARK-19572

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16907.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16907

commit 8329be6dce176022d08bb3109dc994434bf7c84a
Author: Jeff Zhang
Date: 2017-02-13T05:52:22Z

    [SPARK-19582][SPARKR] Allow to disable hive in sparkR shell
[GitHub] spark pull request #16777: [SPARK-19435][SQL] Type coercion between ArrayTyp...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16777#discussion_r100727682

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala ---

```
@@ -116,48 +114,66 @@ object TypeCoercion {
    * i.e. the main difference with [[findTightestCommonType]] is that here we allow some
    * loss of precision when widening decimal and double, and promotion to string.
    */
-  private def findWiderTypeForTwo(t1: DataType, t2: DataType): Option[DataType] = (t1, t2) match {
-    case (t1: DecimalType, t2: DecimalType) =>
-      Some(DecimalPrecision.widerDecimalType(t1, t2))
-    case (t: IntegralType, d: DecimalType) =>
-      Some(DecimalPrecision.widerDecimalType(DecimalType.forType(t), d))
-    case (d: DecimalType, t: IntegralType) =>
-      Some(DecimalPrecision.widerDecimalType(DecimalType.forType(t), d))
-    case (_: FractionalType, _: DecimalType) | (_: DecimalType, _: FractionalType) =>
-      Some(DoubleType)
-    case _ =>
-      findTightestCommonTypeToString(t1, t2)
+  def findWiderTypeForTwo(t1: DataType, t2: DataType): Option[DataType] = {
+    findTightestCommonType(t1, t2)
```

--- End diff --

We should make them consistent. That is why I think it is right to make the change, even if it causes behavior changes.
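The widening these branches perform can be illustrated with a standalone sketch of the precision/scale arithmetic. This is an assumption-laden simplification of `DecimalPrecision.widerDecimalType` (the 38-digit cap mirrors Spark's maximum decimal precision); `Dec` is a hypothetical stand-in for `DecimalType`.

```scala
// Hypothetical stand-in for DecimalType: just precision and scale.
case class Dec(precision: Int, scale: Int)

// The wider type keeps the larger scale (fractional digits) and the larger
// integer range (precision - scale), capped at 38 digits total.
def widerDec(d1: Dec, d2: Dec): Dec = {
  val scale = math.max(d1.scale, d2.scale)
  val range = math.max(d1.precision - d1.scale, d2.precision - d2.scale)
  Dec(math.min(range + scale, 38), scale)
}

// IntegerType corresponds to Dec(10, 0) via DecimalType.forType, so widening
// it against Dec(10, 2) needs 10 integer digits plus 2 fractional ones:
// widerDec(Dec(10, 0), Dec(10, 2)) == Dec(12, 2)
```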
[GitHub] spark issue #16868: [SPARK-19115] [SQL] Supporting Create External Table Lik...
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/16868

>> we don't need to do check whether the targetTable.storage.locationUri is the same with sourceTable.storage.locationUri

We should not do that check for external tables. But continue doing that for other types of tables.
[GitHub] spark issue #16878: [SPARK-19539][SQL] Block duplicate temp table during cre...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16878 **[Test build #72800 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72800/testReport)** for PR 16878 at commit [`f7253c5`](https://github.com/apache/spark/commit/f7253c578d0a7c712bd1e42d46362ab377d93923).
[GitHub] spark pull request #16672: [SPARK-19329][SQL]Reading from or writing to a ta...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/16672#discussion_r100725345

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---

```
@@ -754,6 +754,8 @@ case class AlterTableSetLocationCommand(
       // No partition spec is specified, so we set the location for the table itself
       catalog.alterTable(table.withNewStorage(locationUri = Some(location)))
     }
+
+    catalog.refreshTable(table.identifier)
```

--- End diff --

sorry, the test case hit the bug, so I fixed it here. I will work around the bug by clearing the cache.
[GitHub] spark issue #16870: [SPARK-19496][SQL]to_date udf to return null when input ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16870 **[Test build #72799 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72799/testReport)** for PR 16870 at commit [`3b1cfd4`](https://github.com/apache/spark/commit/3b1cfd41ba6171633a85f42482391c1c7d25182e).
[GitHub] spark issue #16672: [SPARK-19329][SQL]Reading from or writing to a table wit...
Github user windpiger commented on the issue: https://github.com/apache/spark/pull/16672 ok, let me create a new pr for hive serde tables, and continue to finish this pr~
[GitHub] spark issue #16870: [SPARK-19496][SQL]to_date udf to return null when input ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16870 **[Test build #72798 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72798/testReport)** for PR 16870 at commit [`7238e94`](https://github.com/apache/spark/commit/7238e94ac762f03eca3f67d50acf090bb2cc9cf9).
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16620 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72797/ Test PASSed.
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16620 Merged build finished. Test PASSed.
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16620 **[Test build #72797 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72797/testReport)** for PR 16620 at commit [`46ef5a3`](https://github.com/apache/spark/commit/46ef5a369902ce2ca8c0dfde64b973647f5fffeb). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #16870: [SPARK-19496][SQL]to_date udf to return null when...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/16870#discussion_r100723941 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DateFunctionsSuite.scala --- @@ -500,6 +527,23 @@ class DateFunctionsSuite extends QueryTest with SharedSQLContext { Row(date1.getTime / 1000L), Row(date2.getTime / 1000L))) checkAnswer(df.selectExpr(s"to_unix_timestamp(s, '$fmt')"), Seq( Row(ts1.getTime / 1000L), Row(ts2.getTime / 1000L))) + +val x1 = "2015-07-24 10:00:00" +val x2 = "2015-25-07 02:02:02" +val x3 = "2015-07-24 25:02:02" +val x4 = "2015-24-07 26:02:02" +val ts3 = Timestamp.valueOf("2015-07-24 02:25:02") +val ts4 = Timestamp.valueOf("2015-07-24 00:10:00") + +val df1 = Seq(x1, x2, x3, x4).toDF("x") +checkAnswer(df1.selectExpr("to_unix_timestamp(x)"), Seq( + Row(ts1.getTime / 1000L), Row(null), Row(null), Row(null))) --- End diff -- the same as above.
[GitHub] spark pull request #16870: [SPARK-19496][SQL]to_date udf to return null when...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/16870#discussion_r100723919 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DateFunctionsSuite.scala --- @@ -477,6 +483,27 @@ class DateFunctionsSuite extends QueryTest with SharedSQLContext { checkAnswer(df.selectExpr(s"unix_timestamp(s, '$fmt')"), Seq( Row(ts1.getTime / 1000L), Row(ts2.getTime / 1000L))) +val x1 = "2015-07-24 10:00:00" +val x2 = "2015-25-07 02:02:02" +val x3 = "2015-07-24 25:02:02" +val x4 = "2015-24-07 26:02:02" +val ts3 = Timestamp.valueOf("2015-07-24 02:25:02") +val ts4 = Timestamp.valueOf("2015-07-24 00:10:00") + +val df1 = Seq(x1, x2, x3, x4).toDF("x") +checkAnswer(df1.select(unix_timestamp(col("x"))), Seq( + Row(ts1.getTime / 1000L), Row(null), Row(null), Row(null))) --- End diff -- Yes, it is ts1; the timestamp of `x1` is `ts1`.
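The behavior the test above exercises — returning null instead of a nonsense value when the input string has out-of-range fields like month 25 or hour 26 — can be sketched outside Spark with a strict parser. This is an illustrative standalone helper, not Spark's implementation; the name `toUnixTimestampOpt` is made up here, and `None` plays the role of SQL `null`:

```scala
import java.time.{LocalDateTime, ZoneOffset}
import java.time.format.{DateTimeFormatter, DateTimeParseException}

object StrictTimestamp {
  private val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")

  // Parse with java.time, which range-checks fields (month 1-12, hour 0-23),
  // and map any parse failure to None rather than propagating an exception.
  def toUnixTimestampOpt(s: String): Option[Long] =
    try Some(LocalDateTime.parse(s, fmt).toEpochSecond(ZoneOffset.UTC))
    catch { case _: DateTimeParseException => None }
}
```

With this sketch, `"2015-07-24 10:00:00"` parses to a defined epoch value, while inputs such as `"2015-25-07 02:02:02"` (month 25) or `"2015-07-24 25:02:02"` (hour 25) fail the field range checks and come back as `None`.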
[GitHub] spark issue #16777: [SPARK-19435][SQL] Type coercion between ArrayTypes
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16777 I see what you mean. The code paths are now different. Let me try to investigate it and split them.
[GitHub] spark pull request #16777: [SPARK-19435][SQL] Type coercion between ArrayTyp...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16777#discussion_r100723132 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala --- @@ -116,48 +114,66 @@ object TypeCoercion { * i.e. the main difference with [[findTightestCommonType]] is that here we allow some * loss of precision when widening decimal and double, and promotion to string. */ - private def findWiderTypeForTwo(t1: DataType, t2: DataType): Option[DataType] = (t1, t2) match { -case (t1: DecimalType, t2: DecimalType) => - Some(DecimalPrecision.widerDecimalType(t1, t2)) -case (t: IntegralType, d: DecimalType) => - Some(DecimalPrecision.widerDecimalType(DecimalType.forType(t), d)) -case (d: DecimalType, t: IntegralType) => - Some(DecimalPrecision.widerDecimalType(DecimalType.forType(t), d)) -case (_: FractionalType, _: DecimalType) | (_: DecimalType, _: FractionalType) => - Some(DoubleType) -case _ => - findTightestCommonTypeToString(t1, t2) + def findWiderTypeForTwo(t1: DataType, t2: DataType): Option[DataType] = { +findTightestCommonType(t1, t2) --- End diff -- Aha, thank you for correcting me. I overlooked this, but the result should still be the same, shouldn't it? - `DecimalType.isWiderThan` ``` (p1 - s1) >= (p2 - s2) && s1 >= s2 ``` - `DecimalPrecision.widerDecimalType` ``` max(s1, s2) + max(p1-s1, p2-s2), max(s1, s2) ``` If the two disagree, then we were already applying different type coercion rules between `findWiderTypeWithoutStringPromotion` and `findWiderTypeForTwo`; I guess we should make them consistent, given https://github.com/apache/spark/pull/14439?
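The two decimal rules quoted in that comment can be checked with a small standalone sketch. The toy `Dec` class below is not Spark's `DecimalType` (it omits the `MAX_PRECISION` cap, among other things); it only mirrors the two formulas under discussion:

```scala
// Toy model of a decimal type: `precision` total digits, `scale` of them
// after the decimal point (so `precision - scale` integer digits).
case class Dec(precision: Int, scale: Int) {
  // Mirrors DecimalType.isWiderThan: this type can hold every value of `o`
  // exactly iff it has at least as many integer digits and fractional digits.
  def isWiderThan(o: Dec): Boolean =
    (precision - scale) >= (o.precision - o.scale) && scale >= o.scale
}

// Mirrors DecimalPrecision.widerDecimalType: take the larger scale and the
// larger integer-digit range, so neither operand loses digits (no MAX_PRECISION
// clamping here, unlike Spark).
def widerDecimalType(d1: Dec, d2: Dec): Dec = {
  val scale = math.max(d1.scale, d2.scale)
  val range = math.max(d1.precision - d1.scale, d2.precision - d2.scale)
  Dec(range + scale, scale)
}
```

For example, widening `Dec(10, 2)` and `Dec(5, 4)` gives `Dec(12, 4)`: 8 integer digits from the first operand plus 4 fractional digits from the second, and the result `isWiderThan` both inputs even though neither input is wider than the other.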
[GitHub] spark issue #16776: [SPARK-19436][SQL] Add missing tests for approxQuantile
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16776 Merged build finished. Test PASSed.
[GitHub] spark issue #16776: [SPARK-19436][SQL] Add missing tests for approxQuantile
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16776 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72795/ Test PASSed.
[GitHub] spark issue #16776: [SPARK-19436][SQL] Add missing tests for approxQuantile
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16776 **[Test build #72795 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72795/testReport)** for PR 16776 at commit [`c77755d`](https://github.com/apache/spark/commit/c77755d0a0ec386d76500eee8fbdb1156382de21). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16776: [SPARK-19436][SQL] Add missing tests for approxQuantile
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16776 Merged build finished. Test PASSed.
[GitHub] spark issue #16776: [SPARK-19436][SQL] Add missing tests for approxQuantile
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16776 **[Test build #72794 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72794/testReport)** for PR 16776 at commit [`a3171e4`](https://github.com/apache/spark/commit/a3171e4065afb26e95f1136f823e59a017a72b19). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16776: [SPARK-19436][SQL] Add missing tests for approxQuantile
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16776 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72794/ Test PASSed.
[GitHub] spark issue #16906: [SPARK-19570][PYSPARK] Allow to disable hive in pyspark ...
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/16906 Let me take a look tomorrow.
[GitHub] spark issue #16906: [SPARK-19570][PYSPARK] Allow to disable hive in pyspark ...
Github user zjffdu commented on the issue: https://github.com/apache/spark/pull/16906 @holdenk Please help review
[GitHub] spark issue #16868: [SPARK-19115] [SQL] Supporting Create External Table Lik...
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/16868 Do you mean that we don't need to check whether `targetTable.storage.locationUri` is the same as `sourceTable.storage.locationUri`? @tejasapatil
[GitHub] spark issue #16733: [SPARK-19392][SQL] Fix the bug that throws an exception ...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/16733 Okay, I'll close this and the JIRA. Thanks!
[GitHub] spark pull request #16733: [SPARK-19392][SQL] Fix the bug that throws an exc...
Github user maropu closed the pull request at: https://github.com/apache/spark/pull/16733
[GitHub] spark pull request #16777: [SPARK-19435][SQL] Type coercion between ArrayTyp...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16777#discussion_r100719964 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala --- @@ -116,48 +114,66 @@ object TypeCoercion { * i.e. the main difference with [[findTightestCommonType]] is that here we allow some * loss of precision when widening decimal and double, and promotion to string. */ - private def findWiderTypeForTwo(t1: DataType, t2: DataType): Option[DataType] = (t1, t2) match { -case (t1: DecimalType, t2: DecimalType) => - Some(DecimalPrecision.widerDecimalType(t1, t2)) -case (t: IntegralType, d: DecimalType) => - Some(DecimalPrecision.widerDecimalType(DecimalType.forType(t), d)) -case (d: DecimalType, t: IntegralType) => - Some(DecimalPrecision.widerDecimalType(DecimalType.forType(t), d)) -case (_: FractionalType, _: DecimalType) | (_: DecimalType, _: FractionalType) => - Some(DoubleType) -case _ => - findTightestCommonTypeToString(t1, t2) + def findWiderTypeForTwo(t1: DataType, t2: DataType): Option[DataType] = { +findTightestCommonType(t1, t2) --- End diff -- The original `findTightestCommonTypeToString` does not handle `DecimalType`. However, this PR calls `findTightestCommonType` first.
[GitHub] spark pull request #16777: [SPARK-19435][SQL] Type coercion between ArrayTyp...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16777#discussion_r100719832 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala --- @@ -116,48 +114,66 @@ object TypeCoercion { * i.e. the main difference with [[findTightestCommonType]] is that here we allow some * loss of precision when widening decimal and double, and promotion to string. */ - private def findWiderTypeForTwo(t1: DataType, t2: DataType): Option[DataType] = (t1, t2) match { -case (t1: DecimalType, t2: DecimalType) => - Some(DecimalPrecision.widerDecimalType(t1, t2)) -case (t: IntegralType, d: DecimalType) => - Some(DecimalPrecision.widerDecimalType(DecimalType.forType(t), d)) -case (d: DecimalType, t: IntegralType) => - Some(DecimalPrecision.widerDecimalType(DecimalType.forType(t), d)) -case (_: FractionalType, _: DecimalType) | (_: DecimalType, _: FractionalType) => - Some(DoubleType) -case _ => - findTightestCommonTypeToString(t1, t2) + def findWiderTypeForTwo(t1: DataType, t2: DataType): Option[DataType] = { +findTightestCommonType(t1, t2) --- End diff -- Is the result the same? See these cases in `findTightestCommonType`: https://github.com/HyukjinKwon/spark/blob/510a0eee43030abbf37ef922684e6165d6f1e1c8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala#L87-L90
[GitHub] spark issue #16777: [SPARK-19435][SQL] Type coercion between ArrayTypes
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16777 Yeah, the first PR is for refactoring and cleaning up `findWiderTypeForTwo`. We need to add test cases for the behavior changes, and we might also need to document this in the release notes, because it changes the output types. The second one is for type coercion between `ArrayType`s.
[GitHub] spark issue #16733: [SPARK-19392][SQL] Fix the bug that throws an exception ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16733 I prefer closing it now. If users hit this again, we can revisit it. Thanks!
[GitHub] spark issue #16906: [SPARK-19570][PYSPARK] Allow to disable hive in pyspark ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16906 Merged build finished. Test PASSed.
[GitHub] spark issue #16868: [SPARK-19115] [SQL] Supporting Create External Table Lik...
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/16868 There are two main uses of EXTERNAL tables I am aware of: 1. Ingest data from non-Hive locations into Hive tables. This can be covered by adding a test case that reads from an external table created using the command this PR enables. 2. Create a logical "pointer" to an existing Hive table / partition (without creating multiple copies of the underlying data). Testing whether the destination table can have the same location as the source table will cover this. I don't think Spark's interpretation of external tables is different from Hive's, so it's OK to support both. BTW: if you are supporting the 1st use case, one can mimic the behavior of the 2nd use case by creating an external table with a fake location and later issuing an `ALTER TABLE SET LOCATION` command to make it point to an existing table's location. There is really no mechanism in Spark to guard against EXTERNAL tables pointing at an existing table / partition, so both use cases were already possible in Spark.
[GitHub] spark issue #16906: [SPARK-19570][PYSPARK] Allow to disable hive in pyspark ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16906 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72796/ Test PASSed.
[GitHub] spark issue #16906: [SPARK-19570][PYSPARK] Allow to disable hive in pyspark ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16906 **[Test build #72796 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72796/testReport)** for PR 16906 at commit [`431bcf8`](https://github.com/apache/spark/commit/431bcf8d332afe9d971b1f44a51e5dd2ca32ff81). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #16777: [SPARK-19435][SQL] Type coercion between ArrayTyp...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16777#discussion_r100716751 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala --- @@ -116,48 +114,66 @@ object TypeCoercion { * i.e. the main difference with [[findTightestCommonType]] is that here we allow some * loss of precision when widening decimal and double, and promotion to string. */ - private def findWiderTypeForTwo(t1: DataType, t2: DataType): Option[DataType] = (t1, t2) match { -case (t1: DecimalType, t2: DecimalType) => - Some(DecimalPrecision.widerDecimalType(t1, t2)) -case (t: IntegralType, d: DecimalType) => - Some(DecimalPrecision.widerDecimalType(DecimalType.forType(t), d)) -case (d: DecimalType, t: IntegralType) => - Some(DecimalPrecision.widerDecimalType(DecimalType.forType(t), d)) -case (_: FractionalType, _: DecimalType) | (_: DecimalType, _: FractionalType) => - Some(DoubleType) -case _ => - findTightestCommonTypeToString(t1, t2) + def findWiderTypeForTwo(t1: DataType, t2: DataType): Option[DataType] = { +findTightestCommonType(t1, t2) --- End diff -- @cloud-fan refactored this logic recently, and I believe he didn't miss this part.
[GitHub] spark issue #16777: [SPARK-19435][SQL] Type coercion between ArrayTypes
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16777 Do you mean two PRs: one for cleaning up the logic here, and one for the support of array type coercion?
[GitHub] spark issue #16868: [SPARK-19115] [SQL] Supporting Create External Table Lik...
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/16868 Regarding @tejasapatil's comment: do we need to behave exactly the same as Hive? @gatorsmile
[GitHub] spark pull request #16777: [SPARK-19435][SQL] Type coercion between ArrayTyp...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16777#discussion_r100716252 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala --- @@ -116,48 +114,66 @@ object TypeCoercion { * i.e. the main difference with [[findTightestCommonType]] is that here we allow some * loss of precision when widening decimal and double, and promotion to string. */ - private def findWiderTypeForTwo(t1: DataType, t2: DataType): Option[DataType] = (t1, t2) match { -case (t1: DecimalType, t2: DecimalType) => - Some(DecimalPrecision.widerDecimalType(t1, t2)) -case (t: IntegralType, d: DecimalType) => - Some(DecimalPrecision.widerDecimalType(DecimalType.forType(t), d)) -case (d: DecimalType, t: IntegralType) => - Some(DecimalPrecision.widerDecimalType(DecimalType.forType(t), d)) -case (_: FractionalType, _: DecimalType) | (_: DecimalType, _: FractionalType) => - Some(DoubleType) -case _ => - findTightestCommonTypeToString(t1, t2) + def findWiderTypeForTwo(t1: DataType, t2: DataType): Option[DataType] = { +findTightestCommonType(t1, t2) --- End diff -- Yes, it is true that the type dispatch order was changed, but `findTightestCommonType` does not take care of `DecimalType`, so the results would be the same.
[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16620 **[Test build #72797 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72797/testReport)** for PR 16620 at commit [`46ef5a3`](https://github.com/apache/spark/commit/46ef5a369902ce2ca8c0dfde64b973647f5fffeb).
[GitHub] spark issue #16777: [SPARK-19435][SQL] Type coercion between ArrayTypes
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16777 I think we need to separate these changes from the support of `Type coercion between ArrayTypes`. Could you submit a separate PR first? We might need extra test cases for this change.
[GitHub] spark issue #16733: [SPARK-19392][SQL] Fix the bug that throws an exception ...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/16733 Yea, I think we do not need to handle this. Either way, it'd be better just to add a check for the exception in tests:
```
intercept[NoSuchElementException] {
  assert(oracleDialect.getCatalystType(java.sql.Types.NUMERIC, "numeric", 0, metadata1) ==
    Some(DecimalType(DecimalType.MAX_PRECISION, 10)))
}
```
Anyway, I follow the committer's decision.
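The `intercept` pattern suggested above (run some code and assert that a specific exception type is thrown) can be mirrored outside ScalaTest. A minimal, hypothetical Python sketch — the `get_catalyst_type` function and its failure mode are illustrative stand-ins for the Oracle dialect lookup, not the real Spark API:

```python
def assert_raises(exc_type, fn):
    """Minimal analogue of ScalaTest's intercept: run fn, verify it raises
    exc_type, and return the caught exception for further inspection."""
    try:
        fn()
    except exc_type as e:
        return e
    raise AssertionError("expected %s to be raised" % exc_type.__name__)

def get_catalyst_type(sql_type, name, size, metadata):
    # Illustrative stand-in: a dialect with no mapping for (NUMERIC, size 0),
    # so the lookup fails, akin to a NoSuchElementException in Scala.
    mapping = {}
    return mapping[(sql_type, size)]  # raises KeyError

err = assert_raises(KeyError, lambda: get_catalyst_type(2, "numeric", 0, {}))
```

Like `intercept`, `assert_raises` fails the test both when the wrong exception is raised and when no exception is raised at all.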
[GitHub] spark pull request #16800: [SPARK-19456][SparkR]:Add LinearSVC R API
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/16800#discussion_r100715216 --- Diff: R/pkg/R/mllib_classification.R --- @@ -39,6 +46,116 @@ setClass("MultilayerPerceptronClassificationModel", representation(jobj = "jobj" #' @note NaiveBayesModel since 2.0.0 setClass("NaiveBayesModel", representation(jobj = "jobj")) +#' linear SVM Model +#' +#' Fits a linear SVM model against a SparkDataFrame. It is a binary classifier, similar to svm in the glmnet package. +#' Users can print, make predictions on the produced model and save the model to the input path. +#' +#' @param data SparkDataFrame for training. +#' @param formula A symbolic description of the model to be fitted. Currently only a few formula +#'operators are supported, including '~', '.', ':', '+', and '-'. +#' @param regParam The regularization parameter. +#' @param maxIter Maximum iteration number. +#' @param tol Convergence tolerance of iterations. +#' @param standardization Whether to standardize the training features before fitting the model. The coefficients +#'of models will always be returned on the original scale, so it will be transparent for +#'users. Note that with/without standardization, the models should always converge +#'to the same solution when no regularization is applied. Default is TRUE, same as glmnet. +#' @param threshold The threshold in binary classification, in range [0, 1]. +#' @param weightCol The weight column name. +#' @param ... additional arguments passed to the method. --- End diff -- I don't think that would hurt. We have expert params in tree models.
[GitHub] spark pull request #16777: [SPARK-19435][SQL] Type coercion between ArrayTyp...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16777#discussion_r100715180 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala --- @@ -116,48 +114,66 @@ object TypeCoercion { * i.e. the main difference with [[findTightestCommonType]] is that here we allow some * loss of precision when widening decimal and double, and promotion to string. */ - private def findWiderTypeForTwo(t1: DataType, t2: DataType): Option[DataType] = (t1, t2) match { -case (t1: DecimalType, t2: DecimalType) => - Some(DecimalPrecision.widerDecimalType(t1, t2)) -case (t: IntegralType, d: DecimalType) => - Some(DecimalPrecision.widerDecimalType(DecimalType.forType(t), d)) -case (d: DecimalType, t: IntegralType) => - Some(DecimalPrecision.widerDecimalType(DecimalType.forType(t), d)) -case (_: FractionalType, _: DecimalType) | (_: DecimalType, _: FractionalType) => - Some(DoubleType) -case _ => - findTightestCommonTypeToString(t1, t2) + def findWiderTypeForTwo(t1: DataType, t2: DataType): Option[DataType] = { +findTightestCommonType(t1, t2) --- End diff -- Previously, `findWiderTypeForDecimal` was applied before `findTightestCommonTypeToString`. Thus, the results could be different. cc @cloud-fan You changed the order. I am not sure whether this should be documented in the release note.
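The point HyukjinKwon makes in reply (see the earlier comment on this thread) is that reordering the dispatch is benign because the "tightest" rule knows nothing about decimals. A toy Python model of the two orderings — the type names and rules here are deliberately simplified stand-ins, not Spark's actual type lattice:

```python
def tightest(t1, t2):
    """Toy 'tightest common type': promotes within the integral family only;
    deliberately has no decimal handling, like findTightestCommonType."""
    order = ["int", "long"]
    if t1 in order and t2 in order:
        return order[max(order.index(t1), order.index(t2))]
    return None

def wider_decimal(t1, t2):
    """Toy decimal-widening rule: applies only when a decimal is involved."""
    if "decimal" in (t1, t2):
        return "decimal"
    return None

def wider_old(t1, t2):   # old order: decimal rules first, then tightest
    return wider_decimal(t1, t2) or tightest(t1, t2)

def wider_new(t1, t2):   # new order: tightest first, then decimal rules
    return tightest(t1, t2) or wider_decimal(t1, t2)

pairs = [("int", "long"), ("decimal", "int"), ("decimal", "decimal"), ("int", "int")]
assert all(wider_old(a, b) == wider_new(a, b) for a, b in pairs)
```

Because `tightest` returns `None` whenever a decimal is involved, the fallback always reaches the decimal rules, so the two orderings agree on every pair in this model.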
[GitHub] spark issue #16476: [SPARK-19084][SQL] Implement expression field
Github user gczsjdy commented on the issue: https://github.com/apache/spark/pull/16476 @gatorsmile Hi, this patch has passed all tests; is there any code I still need to modify? Thank you for working on this.
[GitHub] spark issue #16906: [SPARK-19570][PYSPARK] Allow to disable hive in pyspark ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16906 **[Test build #72796 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72796/testReport)** for PR 16906 at commit [`431bcf8`](https://github.com/apache/spark/commit/431bcf8d332afe9d971b1f44a51e5dd2ca32ff81).
[GitHub] spark pull request #16906: [SPARK-19570][PYSPARK] Allow to disable hive in p...
GitHub user zjffdu opened a pull request: https://github.com/apache/spark/pull/16906 [SPARK-19570][PYSPARK] Allow to disable hive in pyspark shell ## What changes were proposed in this pull request? SPARK-15236 did this for the Scala shell; this ticket is for the pyspark shell. This is not only for pyspark itself, but can also benefit downstream projects like Livy, which use shell.py for their interactive sessions. For now, Livy has no control over whether Hive is enabled or not. ## How was this patch tested? I didn't find a way to add a test for it, so I tested it manually: run `bin/pyspark --master local --conf spark.sql.catalogImplementation=in-memory` and verify that Hive is not enabled. You can merge this pull request into a Git repository by running: $ git pull https://github.com/zjffdu/spark SPARK-19570 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16906.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16906 commit 431bcf8d332afe9d971b1f44a51e5dd2ca32ff81 Author: Jeff Zhang Date: 2017-02-13T02:03:40Z [SPARK-19570][PYSPARK] Allow to disable hive in pyspark shell
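The decision the PR description revolves around can be sketched as a small conf lookup. This is an illustrative sketch only — the function name and the default are assumptions about how a shell startup script might branch, not the actual shell.py code:

```python
def catalog_implementation(conf):
    """Pick the catalog for the session from spark.sql.catalogImplementation.
    In a Hive-enabled build the default is assumed to be "hive"; users can
    override it with --conf, e.g. to "in-memory" to disable Hive."""
    return conf.get("spark.sql.catalogImplementation", "hive")

# Mirrors the manual test in the PR description:
#   bin/pyspark --conf spark.sql.catalogImplementation=in-memory
conf = {"spark.sql.catalogImplementation": "in-memory"}
assert catalog_implementation(conf) == "in-memory"
assert catalog_implementation({}) == "hive"
```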
[GitHub] spark issue #16776: [SPARK-19436][SQL] Add missing tests for approxQuantile
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16776 **[Test build #72795 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72795/testReport)** for PR 16776 at commit [`c77755d`](https://github.com/apache/spark/commit/c77755d0a0ec386d76500eee8fbdb1156382de21).
[GitHub] spark issue #16868: [SPARK-19115] [SQL] Supporting Create External Table Lik...
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/16868 I think @tejasapatil's suggestion is reasonable. Because the location is specified by users, sourceTable.storage.locationUri and targetTable.storage.locationUri can be the same or different. Do we need to match Hive's behavior exactly?
[GitHub] spark issue #16776: [SPARK-19436][SQL] Add missing tests for approxQuantile
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16776 **[Test build #72794 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72794/testReport)** for PR 16776 at commit [`a3171e4`](https://github.com/apache/spark/commit/a3171e4065afb26e95f1136f823e59a017a72b19).
[GitHub] spark issue #16733: [SPARK-19392][SQL] Fix the bug that throws an exception ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16733 I think we can close this PR, @maropu?
[GitHub] spark issue #16868: [SPARK-19115] [SQL] Supporting Create External Table Lik...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16868 Please add a test case based on what @tejasapatil suggested. Thanks!
[GitHub] spark pull request #16776: [SPARK-19436][SQL] Add missing tests for approxQu...
Github user zhengruifeng commented on a diff in the pull request: https://github.com/apache/spark/pull/16776#discussion_r100713371 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala --- @@ -63,44 +63,49 @@ final class DataFrameStatFunctions private[sql](df: DataFrame) { * Note that values greater than 1 are accepted but give the same result as 1. * @return the approximate quantiles at the given probabilities * - * @note NaN values will be removed from the numerical column before calculation + * @note null and NaN values will be removed from the numerical column before calculation * * @since 2.0.0 */ def approxQuantile( col: String, probabilities: Array[Double], relativeError: Double): Array[Double] = { -StatFunctions.multipleApproxQuantiles(df.select(col).na.drop(), - Seq(col), probabilities, relativeError).head.toArray +val res = approxQuantile(Array(col), probabilities, relativeError) +if (res != null) { + res.head +} else { + null +} } /** * Calculates the approximate quantiles of numerical columns of a DataFrame. - * @see [[DataFrameStatsFunctions.approxQuantile(col:Str* approxQuantile]] for - * detailed description. + * @see `DataFrameStatsFunctions.approxQuantile` for detailed description. --- End diff -- @jkbradley Do you mean `@see [[DataFrameStatsFunctions.approxQuantile(col:Str* approxQuantile]])`? I am not sure whether it works for Java docs. @HyukjinKwon Could you help review this? 
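The behavior under review — drop null and NaN values first, return null when nothing is left — can be illustrated with a small pure-Python stand-in. This is an exact-quantile sketch of the semantics only; Spark's `approxQuantile` actually uses an approximate sketch governed by `relativeError`:

```python
import math

def approx_quantile(values, probabilities):
    """Illustrative stand-in for DataFrameStatFunctions.approxQuantile:
    drop None/NaN, then take exact quantiles from the sorted remainder.
    Returns None when no valid values remain, mirroring the null result."""
    clean = sorted(v for v in values if v is not None and not math.isnan(v))
    if not clean:
        return None
    # Index each probability into the sorted values, clamped to the last element.
    return [clean[min(int(p * len(clean)), len(clean) - 1)] for p in probabilities]

vals = [1.0, None, float("nan"), 2.0, 3.0, 4.0]
assert approx_quantile(vals, [0.5]) == [3.0]
assert approx_quantile([None, float("nan")], [0.5]) is None
```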
[GitHub] spark issue #16870: [SPARK-19496][SQL]to_date udf to return null when input ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16870 Could you also add one more case for verifying `to_date` on "2016-02-29" and "2017-02-29"?
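The two suggested inputs probe leap-year handling: 2016-02-29 is a real date, 2017-02-29 is not, so the patched `to_date` should return the date for the first and null for the second. A minimal Python illustration of that contract (an analogue of the semantics, not Spark's implementation):

```python
from datetime import datetime

def to_date(s, fmt="%Y-%m-%d"):
    """Illustrative analogue of the patched to_date: return None (null)
    instead of failing when the input is not a valid date."""
    try:
        return datetime.strptime(s, fmt).date()
    except ValueError:
        return None

assert to_date("2016-02-29") is not None  # 2016 is a leap year
assert to_date("2017-02-29") is None      # invalid date -> null
```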
[GitHub] spark issue #16868: [SPARK-19115] [SQL] Supporting Create External Table Lik...
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/16868 I think there is no need to do this validation, because the location is specified by users, so targetTable.storage.locationUri and sourceTable.storage.locationUri can be the same or different. @tejasapatil
[GitHub] spark pull request #16870: [SPARK-19496][SQL]to_date udf to return null when...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16870#discussion_r100713312 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DateFunctionsSuite.scala --- @@ -500,6 +527,23 @@ class DateFunctionsSuite extends QueryTest with SharedSQLContext { Row(date1.getTime / 1000L), Row(date2.getTime / 1000L))) checkAnswer(df.selectExpr(s"to_unix_timestamp(s, '$fmt')"), Seq( Row(ts1.getTime / 1000L), Row(ts2.getTime / 1000L))) + +val x1 = "2015-07-24 10:00:00" +val x2 = "2015-25-07 02:02:02" +val x3 = "2015-07-24 25:02:02" +val x4 = "2015-24-07 26:02:02" +val ts3 = Timestamp.valueOf("2015-07-24 02:25:02") +val ts4 = Timestamp.valueOf("2015-07-24 00:10:00") + +val df1 = Seq(x1, x2, x3, x4).toDF("x") +checkAnswer(df1.selectExpr("to_unix_timestamp(x)"), Seq( + Row(ts1.getTime / 1000L), Row(null), Row(null), Row(null))) --- End diff -- The same issue here