[GitHub] spark issue #20626: [SPARK-23447][SQL] Cleanup codegen template for Literal

2018-02-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20626
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/930/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20625: [SPARK-23446][PYTHON] Explicitly check supported types i...

2018-02-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20625
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/931/
Test PASSed.


---




[GitHub] spark issue #20626: [SPARK-23447][SQL] Cleanup codegen template for Literal

2018-02-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20626
  
**[Test build #87503 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87503/testReport)**
 for PR 20626 at commit 
[`68edf0f`](https://github.com/apache/spark/commit/68edf0f3463daed3bb7042becb333788b22b23b0).


---




[GitHub] spark issue #20057: [SPARK-22880][SQL] Add cascadeTruncate option to JDBC da...

2018-02-16 Thread Fokko
Github user Fokko commented on the issue:

https://github.com/apache/spark/pull/20057
  
Hi @gatorsmile, thanks for putting it to the test. The main reasons why I 
personally dislike Sqoop are:

- **Legacy.** The old MapReduce should be buried in the coming years. As 
a data engineering consultant, I see more people questioning the whole Hadoop 
stack. With Sqoop you still need to run MapReduce tasks, and this isn't easy 
on other platforms like Kubernetes.
- **Stability.** I see Sqoop jobs fail quite often, and there isn't a nice 
way of retrying them atomically. For example, when running a Sqoop job on 
Airflow, we cannot simply retry the operation. When we import data from an 
RDBMS to HDFS, we have to make sure that the target directory of the previous 
run has been deleted.

This is also where Spark JDBC comes in. For example, in the future we would 
like to delete single partitions, but this is WIP. Maybe @danielvdende can 
elaborate a bit on their use case.
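
The retry problem described above can be sketched roughly as follows. This is only an illustration, assuming a local directory stands in for the HDFS target; `idempotent_import` and `write_rows` are hypothetical names and not part of Sqoop, Airflow, or Spark.

```python
import shutil
import tempfile
from pathlib import Path


def idempotent_import(target_dir, write_rows):
    """Clear any output left by a previous (possibly failed) run, then write."""
    target = Path(target_dir)
    if target.exists():
        # Without this step, a retry would fail or mix old and new files.
        shutil.rmtree(target)
    target.mkdir(parents=True)
    write_rows(target)


def write_rows(target):
    # Hypothetical stand-in for the real import job.
    (target / "part-00000").write_text("id,name\n1,a\n")


base = Path(tempfile.mkdtemp())
out = base / "import_target"
idempotent_import(out, write_rows)
idempotent_import(out, write_rows)  # a retry leaves the same single file
files = sorted(p.name for p in out.iterdir())
```

Because the cleanup and the write live in one step, the whole step can be retried without a separate "delete the old directory" task.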


---




[GitHub] spark issue #20057: [SPARK-22880][SQL] Add cascadeTruncate option to JDBC da...

2018-02-16 Thread danielvdende
Github user danielvdende commented on the issue:

https://github.com/apache/spark/pull/20057
  
Hmm, now it fails the OrcQuerySuite. This PR doesn't touch any of the ORC 
implementation in Spark. Could this be a flaky test, @gatorsmile?
```org.scalatest.exceptions.TestFailedDueToTimeoutException: The code 
passed to eventually never returned normally. Attempted 12 times over 
10.15875468798 seconds. Last failure message: There are 1 possibly leaked 
file streams..```


---




[GitHub] spark issue #20625: [SPARK-23446][PYTHON] Explicitly check supported types i...

2018-02-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20625
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/932/
Test PASSed.


---




[GitHub] spark issue #20625: [SPARK-23446][PYTHON] Explicitly check supported types i...

2018-02-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20625
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20625: [SPARK-23446][PYTHON] Explicitly check supported types i...

2018-02-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20625
  
**[Test build #87504 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87504/testReport)**
 for PR 20625 at commit 
[`4e5708c`](https://github.com/apache/spark/commit/4e5708ca01f048f2408ded0b039ae724b806977c).


---




[GitHub] spark issue #20625: [SPARK-23446][PYTHON] Explicitly check supported types i...

2018-02-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20625
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87502/
Test FAILed.


---




[GitHub] spark issue #20625: [SPARK-23446][PYTHON] Explicitly check supported types i...

2018-02-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20625
  
**[Test build #87502 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87502/testReport)**
 for PR 20625 at commit 
[`c79c6df`](https://github.com/apache/spark/commit/c79c6df7284b9717fe4e4c26090dcb51bf7712da).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20626: [SPARK-23447][SQL] Cleanup codegen template for Literal

2018-02-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20626
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87503/
Test FAILed.


---




[GitHub] spark issue #20626: [SPARK-23447][SQL] Cleanup codegen template for Literal

2018-02-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20626
  
**[Test build #87503 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87503/testReport)**
 for PR 20626 at commit 
[`68edf0f`](https://github.com/apache/spark/commit/68edf0f3463daed3bb7042becb333788b22b23b0).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20625: [SPARK-23446][PYTHON] Explicitly check supported types i...

2018-02-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20625
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20626: [SPARK-23447][SQL] Cleanup codegen template for Literal

2018-02-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20626
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20057: [SPARK-22880][SQL] Add cascadeTruncate option to JDBC da...

2018-02-16 Thread danielvdende
Github user danielvdende commented on the issue:

https://github.com/apache/spark/pull/20057
  
Hi guys, @Fokko @gatorsmile, I completely agree with what @Fokko mentioned. 
Our main reasons for wanting to get away from Sqoop are also stability and 
getting rid of MapReduce in preparation for our move to Kubernetes (or 
something similar). We've also seen Spark to be much faster than Sqoop. In 
terms of why we need the feature in this PR: we have some tables in PostgreSQL 
that have foreign keys linking them. We have also specified a schema for these 
tables. If we use the drop-and-recreate option, Spark will determine the 
schema, overriding our PostgreSQL schema. Obviously, these should match up, but 
I personally don't like that Spark can do this (and that you can't explicitly 
tell it not to). 

Because of this behaviour, we currently require 2 tasks in Airflow (as 
@Fokko mentioned) to ensure the tables are truncated, but the schema stays in 
place. This PR would enable us to specify in a single, idempotent (Airflow) 
task that we want to truncate the table before putting new data in there. The 
cascade lets us do this without breaking foreign key relations and causing 
errors.

To be clear, this therefore isn't emulating a Sqoop feature (as a Sqoop 
task isn't idempotent), but is in fact improving on what Sqoop offers.
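
To make the difference concrete, here is a small sketch of the SQL such a write might issue. `build_truncate_sql` is a hypothetical helper for illustration only, not Spark's implementation; the `TRUNCATE TABLE ... CASCADE` form is PostgreSQL syntax, and the table name is made up.

```python
def build_truncate_sql(table, cascade=False):
    """Return a TRUNCATE statement, optionally cascading to dependent tables."""
    statement = f"TRUNCATE TABLE {table}"
    if cascade:
        # In PostgreSQL, CASCADE also truncates tables that reference
        # this one through foreign keys.
        statement += " CASCADE"
    return statement


plain = build_truncate_sql("orders")
cascading = build_truncate_sql("orders", cascade=True)
```

Truncating keeps the table definition in place, which is why the schema declared in PostgreSQL survives, unlike with drop-and-recreate.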


---




[GitHub] spark issue #20625: [SPARK-23446][PYTHON] Explicitly check supported types i...

2018-02-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20625
  
**[Test build #87504 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87504/testReport)**
 for PR 20625 at commit 
[`4e5708c`](https://github.com/apache/spark/commit/4e5708ca01f048f2408ded0b039ae724b806977c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20625: [SPARK-23446][PYTHON] Explicitly check supported types i...

2018-02-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20625
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87504/
Test PASSed.


---




[GitHub] spark issue #20625: [SPARK-23446][PYTHON] Explicitly check supported types i...

2018-02-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20625
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #20621: [SPARK-23436][SQL] Infer partition as Date only i...

2018-02-16 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20621#discussion_r168697351
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala
 ---
@@ -407,6 +407,29 @@ object PartitioningUtils {
   Literal(bigDecimal)
 }
 
+val dateTry = Try {
+  // try and parse the date, if no exception occurs this is a 
candidate to be resolved as
+  // DateType
+  DateTimeUtils.getThreadLocalDateFormat.parse(raw)
--- End diff --

Actually, all of `DateFormat`'s `parse` methods allow extra characters after a 
valid date: 
(https://docs.oracle.com/javase/7/docs/api/java/text/DateFormat.html#parse(java.lang.String)).
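
As a rough analogy only (Python's `datetime.strptime` is strict, unlike the Java method linked above), the difference between parsing a date from the start of a string and requiring the whole string to be a date can be sketched like this. `parse_prefix` and `parse_strict` are hypothetical names, and the sample value is made up.

```python
import re
from datetime import date

DATE_PATTERN = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def parse_prefix(raw):
    """Parse a date from the start of the string, ignoring trailing text."""
    m = DATE_PATTERN.match(raw)
    return date(*map(int, m.groups())) if m else None


def parse_strict(raw):
    """Parse a date only if the whole string is a date."""
    m = DATE_PATTERN.fullmatch(raw)
    return date(*map(int, m.groups())) if m else None


lenient = parse_prefix("2018-01-01-04")
strict = parse_strict("2018-01-01-04")
```

The prefix parser accepts the value even though it is not a clean date, which is why a successful parse alone is not enough to treat a partition value as a date.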


---




[GitHub] spark pull request #20621: [SPARK-23436][SQL] Infer partition as Date only i...

2018-02-16 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20621#discussion_r168697699
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala
 ---
@@ -407,6 +407,29 @@ object PartitioningUtils {
   Literal(bigDecimal)
 }
 
+val dateTry = Try {
+  // try and parse the date, if no exception occurs this is a 
candidate to be resolved as
+  // DateType
+  DateTimeUtils.getThreadLocalDateFormat.parse(raw)
+  // SPARK-23436: Casting the string to date may still return null if 
a bad Date is provided.
+  // We need to check that we can cast the raw string since we later 
can use Cast to get
+  // the partition values with the right DataType (see
+  // 
org.apache.spark.sql.execution.datasources.PartitioningAwareFileIndex.inferPartitioning)
+  val dateOption = Option(Cast(Literal(raw), DateType).eval())
--- End diff --

Sure, aren't these comments enough? Could you please suggest how you would 
like to improve them, i.e. what is missing or unclear? 
Thanks.


---




[GitHub] spark issue #20568: [SPARK-23381][CORE] Murmur3 hash generates a different v...

2018-02-16 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/20568
  
retest this please.


---




[GitHub] spark pull request #20568: [SPARK-23381][CORE] Murmur3 hash generates a diff...

2018-02-16 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20568#discussion_r168698344
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/feature/FeatureHasher.scala ---
@@ -218,4 +221,32 @@ object FeatureHasher extends 
DefaultParamsReadable[FeatureHasher] {
 
   @Since("2.3.0")
   override def load(path: String): FeatureHasher = super.load(path)
+
+  private val seed = OldHashingTF.seed
+
+  /**
+   * Calculate a hash code value for the term object using
+   * Austin Appleby's MurmurHash 3 algorithm (MurmurHash3_x86_32).
+   * This is the default hash algorithm used from Spark 2.0 onwards.
+   * Use hashUnsafeBytes2 to match the original algorithm with the value.
+   * See SPARK-23381.
+   */
+  @Since("2.3.0")
+  def murmur3Hash(term: Any): Int = {
--- End diff --

Maybe `private[feature]`?


---




[GitHub] spark issue #20464: [SPARK-23291][SQL][R] R's substr should not reduce start...

2018-02-16 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/20464
  
@felixcheung Thanks!


---




[GitHub] spark issue #20626: [SPARK-23447][SQL] Cleanup codegen template for Literal

2018-02-16 Thread rednaxelafx
Github user rednaxelafx commented on the issue:

https://github.com/apache/spark/pull/20626
  
jenkins retest this please


---




[GitHub] spark issue #20626: [SPARK-23447][SQL] Cleanup codegen template for Literal

2018-02-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20626
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20626: [SPARK-23447][SQL] Cleanup codegen template for Literal

2018-02-16 Thread rednaxelafx
Github user rednaxelafx commented on the issue:

https://github.com/apache/spark/pull/20626
  
So I was able to find quite a few cases where the `DUMMY` placeholder 
caught uses of the `value` field outside of appropriate null-checked regions. 
I'll check the individual cases and then update this PR.


---




[GitHub] spark issue #20626: [SPARK-23447][SQL] Cleanup codegen template for Literal

2018-02-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20626
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/933/
Test PASSed.


---




[GitHub] spark issue #20626: [SPARK-23447][SQL] Cleanup codegen template for Literal

2018-02-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20626
  
**[Test build #87505 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87505/testReport)**
 for PR 20626 at commit 
[`68edf0f`](https://github.com/apache/spark/commit/68edf0f3463daed3bb7042becb333788b22b23b0).


---




[GitHub] spark pull request #20621: [SPARK-23436][SQL] Infer partition as Date only i...

2018-02-16 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20621#discussion_r168700662
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala
 ---
@@ -407,6 +407,29 @@ object PartitioningUtils {
   Literal(bigDecimal)
 }
 
+val dateTry = Try {
+  // try and parse the date, if no exception occurs this is a 
candidate to be resolved as
+  // DateType
+  DateTimeUtils.getThreadLocalDateFormat.parse(raw)
+  // SPARK-23436: Casting the string to date may still return null if 
a bad Date is provided.
+  // We need to check that we can cast the raw string since we later 
can use Cast to get
+  // the partition values with the right DataType (see
+  // 
org.apache.spark.sql.execution.datasources.PartitioningAwareFileIndex.inferPartitioning)
+  val dateOption = Option(Cast(Literal(raw), DateType).eval())
--- End diff --

I mean .. simply like:

```
// Disallow date type if the cast returned null blah blah
require(dateOption.isDefined)
```

nothing special. I am fine with not adding it too.



---




[GitHub] spark issue #20626: [SPARK-23447][SQL] Cleanup codegen template for Literal

2018-02-16 Thread rednaxelafx
Github user rednaxelafx commented on the issue:

https://github.com/apache/spark/pull/20626
  
cc @cloud-fan @hvanhovell 
Note: this is for master and branch-2.3 post 2.3.0 release.


---




[GitHub] spark issue #20626: [SPARK-23447][SQL] Cleanup codegen template for Literal

2018-02-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20626
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/934/
Test PASSed.


---




[GitHub] spark issue #20626: [SPARK-23447][SQL] Cleanup codegen template for Literal

2018-02-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20626
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20626: [SPARK-23447][SQL] Cleanup codegen template for Literal

2018-02-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20626
  
**[Test build #87506 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87506/testReport)**
 for PR 20626 at commit 
[`d709e24`](https://github.com/apache/spark/commit/d709e246d99c0d821238afda1b203b9880eb1ed1).


---




[GitHub] spark pull request #20627: [SPARK-23217][ML][PYTHON] Add distanceMeasure par...

2018-02-16 Thread mgaido91
GitHub user mgaido91 opened a pull request:

https://github.com/apache/spark/pull/20627

[SPARK-23217][ML][PYTHON] Add distanceMeasure param to ClusteringEvaluator 
Python API

## What changes were proposed in this pull request?

The PR adds the `distanceMeasure` param to ClusteringEvaluator in the 
Python API. This allows the user to specify `cosine` as distance measure in 
addition to the default `squaredEuclidean`.
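
For readers unfamiliar with the two measures, the following plain-Python sketch shows how they differ. These are textbook definitions written for illustration, not the Spark implementation, and the function names are assumptions.

```python
import math


def squared_euclidean(a, b):
    """Sum of squared coordinate differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine_distance(a, b):
    """One minus the cosine of the angle between the two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Two vectors pointing in the same direction but with different lengths.
u, v = [1.0, 0.0], [2.0, 0.0]
euclid = squared_euclidean(u, v)
cosine = cosine_distance(u, v)
```

The vectors are far apart by squared Euclidean distance yet identical by cosine distance, so the choice of measure can change how well separated clusters appear.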

## How was this patch tested?

added UT


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mgaido91/spark SPARK-23217_python

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20627.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20627


commit 8fe8efaaf0202f804e80b36ec11b43d5aa34d511
Author: Marco Gaido 
Date:   2018-02-16T09:24:45Z

[SPARK-23217][ML][PYTHON] Add distanceMeasure param to ClusteringEvaluator 
Python API




---




[GitHub] spark issue #20627: [SPARK-23217][ML][PYTHON] Add distanceMeasure param to C...

2018-02-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20627
  
**[Test build #87507 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87507/testReport)**
 for PR 20627 at commit 
[`8fe8efa`](https://github.com/apache/spark/commit/8fe8efaaf0202f804e80b36ec11b43d5aa34d511).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20627: [SPARK-23217][ML][PYTHON] Add distanceMeasure param to C...

2018-02-16 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/20627
  
cc @srowen @BryanCutler 


---




[GitHub] spark issue #20626: [SPARK-23447][SQL] Cleanup codegen template for Literal

2018-02-16 Thread rednaxelafx
Github user rednaxelafx commented on the issue:

https://github.com/apache/spark/pull/20626
  
Ah...I see, there are more places where they're statically referencing some 
variable but dynamically those variables would always be null. I'll update the 
PR later to fix those places as well.


---




[GitHub] spark issue #20626: [SPARK-23447][SQL] Cleanup codegen template for Literal

2018-02-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20626
  
**[Test build #87505 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87505/testReport)**
 for PR 20626 at commit 
[`68edf0f`](https://github.com/apache/spark/commit/68edf0f3463daed3bb7042becb333788b22b23b0).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20626: [SPARK-23447][SQL] Cleanup codegen template for Literal

2018-02-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20626
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20626: [SPARK-23447][SQL] Cleanup codegen template for Literal

2018-02-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20626
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87505/
Test FAILed.


---




[GitHub] spark issue #20627: [SPARK-23217][ML][PYTHON] Add distanceMeasure param to C...

2018-02-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20627
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/935/
Test PASSed.


---




[GitHub] spark issue #20627: [SPARK-23217][ML][PYTHON] Add distanceMeasure param to C...

2018-02-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20627
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20621: [SPARK-23436][SQL] Infer partition as Date only if it ca...

2018-02-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20621
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/936/
Test PASSed.


---




[GitHub] spark issue #20621: [SPARK-23436][SQL] Infer partition as Date only if it ca...

2018-02-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20621
  
**[Test build #87508 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87508/testReport)**
 for PR 20621 at commit 
[`6274537`](https://github.com/apache/spark/commit/6274537139b2282ac5f9ded605037f63c7bee2f9).


---




[GitHub] spark issue #20621: [SPARK-23436][SQL] Infer partition as Date only if it ca...

2018-02-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20621
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #20625: [SPARK-23446][PYTHON] Explicitly check supported ...

2018-02-16 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/20625#discussion_r168709505
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -2000,10 +2001,12 @@ def toPandas(self):
 return _check_dataframe_localize_timestamps(pdf, 
timezone)
 else:
 return pd.DataFrame.from_records([], 
columns=self.columns)
-except ImportError as e:
-msg = "note: pyarrow must be installed and available on 
calling Python process " \
-  "if using spark.sql.execution.arrow.enabled=true"
-raise ImportError("%s\n%s" % (_exception_message(e), msg))
+except Exception as e:
+msg = (
+"Note: toPandas attempted Arrow optimization because "
+"'spark.sql.execution.arrow.enabled' is set to true. 
Please set it to false "
+"to disable this.")
--- End diff --

Hmm, this says why it's trying Arrow and how to turn it off, but doesn't 
say why I have to turn it off. Perhaps say something like "pyarrow is not 
found" (if that is the cause and we know it)?
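
One way to address this, sketched under assumptions: keep the original error text and add a note that names the likely cause. `arrow_hint` is a hypothetical helper, and the added wording is illustrative rather than what the PR ends up using.

```python
def arrow_hint(error):
    """Build a message that says what failed and how to opt out."""
    if isinstance(error, ImportError):
        cause = "pyarrow could not be imported"
    else:
        cause = "the Arrow conversion failed"
    return (
        f"{error}\n"
        f"Note: toPandas attempted Arrow optimization because "
        f"'spark.sql.execution.arrow.enabled' is set to true, but {cause}. "
        f"Please set it to false to disable this."
    )


message = arrow_hint(ImportError("No module named 'pyarrow'"))
```

Distinguishing `ImportError` from other exceptions lets the note say "pyarrow is not found" only when that is actually known to be the cause.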


---




[GitHub] spark pull request #20619: [SPARK-23390][SQL] Register task completion liste...

2018-02-16 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/20619#discussion_r168709918
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala
 ---
@@ -395,16 +395,19 @@ class ParquetFileFormat
 
ParquetInputFormat.setFilterPredicate(hadoopAttemptContext.getConfiguration, 
pushed.get)
   }
   val taskContext = Option(TaskContext.get())
-  val parquetReader = if (enableVectorizedReader) {
+  val iter = if (enableVectorizedReader) {
 val vectorizedReader = new VectorizedParquetRecordReader(
   convertTz.orNull, enableOffHeapColumnVector && 
taskContext.isDefined, capacity)
+val recordReaderIterator = new 
RecordReaderIterator(vectorizedReader)
+// Register a task completion lister before `initalization`.
--- End diff --

could `new VectorizedParquetRecordReader` or `new RecordReaderIterator` 
fail?


---




[GitHub] spark issue #20568: [SPARK-23381][CORE] Murmur3 hash generates a different v...

2018-02-16 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/20568
  
Jenkins, retest this please.


---




[GitHub] spark issue #20568: [SPARK-23381][CORE] Murmur3 hash generates a different v...

2018-02-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20568
  
**[Test build #87509 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87509/testReport)**
 for PR 20568 at commit 
[`c20cd97`](https://github.com/apache/spark/commit/c20cd97d7ce5690993b4490bb7cca955e7703d90).


---




[GitHub] spark pull request #20619: [SPARK-23390][SQL] Register task completion liste...

2018-02-16 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/20619#discussion_r168711619
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala ---
@@ -395,16 +395,19 @@ class ParquetFileFormat
 
         ParquetInputFormat.setFilterPredicate(hadoopAttemptContext.getConfiguration, pushed.get)
       }
       val taskContext = Option(TaskContext.get())
-      val parquetReader = if (enableVectorizedReader) {
+      val iter = if (enableVectorizedReader) {
         val vectorizedReader = new VectorizedParquetRecordReader(
           convertTz.orNull, enableOffHeapColumnVector && taskContext.isDefined, capacity)
+        val recordReaderIterator = new RecordReaderIterator(vectorizedReader)
+        // Register a task completion lister before `initalization`.
--- End diff --

Those constructors didn't look heavy to me.


---




[GitHub] spark issue #20627: [SPARK-23217][ML][PYTHON] Add distanceMeasure param to C...

2018-02-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20627
  
**[Test build #87507 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87507/testReport)**
 for PR 20627 at commit 
[`8fe8efa`](https://github.com/apache/spark/commit/8fe8efaaf0202f804e80b36ec11b43d5aa34d511).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20627: [SPARK-23217][ML][PYTHON] Add distanceMeasure param to C...

2018-02-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20627
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20627: [SPARK-23217][ML][PYTHON] Add distanceMeasure param to C...

2018-02-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20627
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87507/
Test PASSed.


---




[GitHub] spark pull request #20619: [SPARK-23390][SQL] Register task completion liste...

2018-02-16 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/20619#discussion_r168714722
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala ---
@@ -395,16 +395,19 @@ class ParquetFileFormat
 
         ParquetInputFormat.setFilterPredicate(hadoopAttemptContext.getConfiguration, pushed.get)
       }
       val taskContext = Option(TaskContext.get())
-      val parquetReader = if (enableVectorizedReader) {
+      val iter = if (enableVectorizedReader) {
         val vectorizedReader = new VectorizedParquetRecordReader(
           convertTz.orNull, enableOffHeapColumnVector && taskContext.isDefined, capacity)
+        val recordReaderIterator = new RecordReaderIterator(vectorizedReader)
+        // Register a task completion lister before `initalization`.
--- End diff --

ok


---




[GitHub] spark pull request #20625: [SPARK-23446][PYTHON] Explicitly check supported ...

2018-02-16 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20625#discussion_r168718112
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -2000,10 +2001,12 @@ def toPandas(self):
                     return _check_dataframe_localize_timestamps(pdf, timezone)
                 else:
                     return pd.DataFrame.from_records([], columns=self.columns)
-            except ImportError as e:
-                msg = "note: pyarrow must be installed and available on calling Python process " \
-                      "if using spark.sql.execution.arrow.enabled=true"
-                raise ImportError("%s\n%s" % (_exception_message(e), msg))
+            except Exception as e:
+                msg = (
+                    "Note: toPandas attempted Arrow optimization because "
+                    "'spark.sql.execution.arrow.enabled' is set to true. Please set it to false "
+                    "to disable this.")
--- End diff --

Oh, that should be part of the original message. For example, I don't have 
PyArrow installed for `pypy` on my local machine, so it shows an error like:

```
RuntimeError: PyArrow >= 0.8.0 must be installed; however, it was not found.
Note: toPandas attempted Arrow optimization because 
'spark.sql.execution.arrow.enabled' is set to true. Please set it to false to 
disable this.
```
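
A minimal sketch of how a two-part message like the one above can be produced, assuming the original error text is simply prepended to the note. The wrapper name `run_with_arrow_hint` is hypothetical, and `str(e)` stands in for the `_exception_message(e)` helper used in the diff:

```python
ARROW_HINT = (
    "Note: toPandas attempted Arrow optimization because "
    "'spark.sql.execution.arrow.enabled' is set to true. Please set it to false "
    "to disable this.")


def run_with_arrow_hint(convert):
    """Call `convert`; on any failure, re-raise with the configuration hint appended."""
    try:
        return convert()
    except Exception as e:
        # The original message comes first, then the note, separated by a newline.
        raise RuntimeError("%s\n%s" % (e, ARROW_HINT))
```

Calling it with a function that raises `ImportError("PyArrow >= 0.8.0 must be installed; however, it was not found.")` yields a `RuntimeError` whose text has the same two-part shape as the output quoted above.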



---




[GitHub] spark issue #20626: [SPARK-23447][SQL] Cleanup codegen template for Literal

2018-02-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20626
  
**[Test build #87506 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87506/testReport)**
 for PR 20626 at commit 
[`d709e24`](https://github.com/apache/spark/commit/d709e246d99c0d821238afda1b203b9880eb1ed1).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20626: [SPARK-23447][SQL] Cleanup codegen template for Literal

2018-02-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20626
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87506/
Test FAILed.


---




[GitHub] spark issue #20626: [SPARK-23447][SQL] Cleanup codegen template for Literal

2018-02-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20626
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20626: [SPARK-23447][SQL] Cleanup codegen template for Literal

2018-02-16 Thread hvanhovell
Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/20626
  
You are going to need to 'type' null values for this to work; I think casting 
would be enough.


---




[GitHub] spark issue #20621: [SPARK-23436][SQL] Infer partition as Date only if it ca...

2018-02-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20621
  
**[Test build #87508 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87508/testReport)**
 for PR 20621 at commit 
[`6274537`](https://github.com/apache/spark/commit/6274537139b2282ac5f9ded605037f63c7bee2f9).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20621: [SPARK-23436][SQL] Infer partition as Date only if it ca...

2018-02-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20621
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20621: [SPARK-23436][SQL] Infer partition as Date only if it ca...

2018-02-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20621
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87508/
Test PASSed.


---




[GitHub] spark issue #20568: [SPARK-23381][CORE] Murmur3 hash generates a different v...

2018-02-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20568
  
**[Test build #87509 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87509/testReport)**
 for PR 20568 at commit 
[`c20cd97`](https://github.com/apache/spark/commit/c20cd97d7ce5690993b4490bb7cca955e7703d90).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20568: [SPARK-23381][CORE] Murmur3 hash generates a different v...

2018-02-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20568
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20568: [SPARK-23381][CORE] Murmur3 hash generates a different v...

2018-02-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20568
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87509/
Test FAILed.


---




[GitHub] spark issue #20568: [SPARK-23381][CORE] Murmur3 hash generates a different v...

2018-02-16 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/20568
  
Jenkins, retest this please.


---




[GitHub] spark pull request #20628: Preserve extraJavaOptions ordering

2018-02-16 Thread andrusha
GitHub user andrusha opened a pull request:

https://github.com/apache/spark/pull/20628

Preserve extraJavaOptions ordering

For some JVM options, like `-XX:+UnlockExperimentalVMOptions`, ordering 
matters.

## What changes were proposed in this pull request?

Keep the original `extraJavaOptions` ordering when passing the options through 
environment variables inside the Docker container.

## How was this patch tested?

Ran the base branch a couple of times and checked the startup command in the 
logs; the ordering differed every time. After adding sorting, the ordering 
consistently matched what the user had in `extraJavaOptions`.
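
A rough sketch of the idea in Python, not the actual container entrypoint: it assumes each option is passed in its own environment variable whose name ends in a numeric index (the `SPARK_JAVA_OPT_` prefix and the function name are assumptions for illustration), and restores the user's order by sorting on that index numerically:

```python
import re


def ordered_java_options(environ, prefix="SPARK_JAVA_OPT_"):
    """Return option values ordered by the numeric suffix of their variable names."""
    pattern = re.compile(re.escape(prefix) + r"(\d+)$")
    indexed = []
    for name, value in environ.items():
        match = pattern.match(name)
        if match:
            indexed.append((int(match.group(1)), value))
    # Sort numerically so that index 10 comes after index 2, not before it.
    indexed.sort(key=lambda pair: pair[0])
    return [value for _, value in indexed]
```

A plain lexicographic sort of the variable names would place `..._10` before `..._2`, which is why the sketch compares the suffix as an integer.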

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/andrusha/spark patch-2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20628.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20628


commit 6759e9e9f9075427b87fe5071e803c60d7521629
Author: Andrew Korzhuev 
Date:   2018-02-16T14:24:48Z

Preserve extraJavaOptions ordering

For some JVM options, like `-XX:+UnlockExperimentalVMOptions` ordering is 
necessary.




---




[GitHub] spark issue #20628: [SPARK-23449][K8S] Preserve extraJavaOptions ordering

2018-02-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20628
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #20619: [SPARK-23390][SQL] Register task completion listerners f...

2018-02-16 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20619
  
can we provide a manual test like the OOM one in your ORC PR?


---




[GitHub] spark pull request #20629: [SPARK-23451][ML] Deprecate KMeans.computeCost

2018-02-16 Thread mgaido91
GitHub user mgaido91 opened a pull request:

https://github.com/apache/spark/pull/20629

[SPARK-23451][ML] Deprecate KMeans.computeCost

## What changes were proposed in this pull request?

Deprecate `KMeans.computeCost`, which was introduced as a temporary fix and is 
no longer needed now that `ClusteringEvaluator` has been introduced.

## How was this patch tested?

manual test (deprecation warning displayed)
Scala
```
...
scala> model.computeCost(dataset)
warning: there was one deprecation warning; re-run with -deprecation for 
details
res1: Double = 0.0
```

Python
```
>>> import warnings
>>> warnings.simplefilter('always', DeprecationWarning)
...
>>> model.computeCost(df)
/Users/mgaido/apache/spark/python/pyspark/ml/clustering.py:330: 
DeprecationWarning: Deprecated in 2.4.0. It will be removed in 3.0.0. Use 
ClusteringEvaluator instead.
  " instead.", DeprecationWarning)
```
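
For readers unfamiliar with how such a warning is emitted and observed, here is a minimal self-contained sketch. `KMeansModelSketch` is a hypothetical stand-in rather than the PySpark class, and the returned cost is a placeholder:

```python
import warnings


class KMeansModelSketch:
    """Hypothetical model used only to illustrate a deprecation warning."""

    def computeCost(self, dataset):
        warnings.warn(
            "Deprecated in 2.4.0. It will be removed in 3.0.0. "
            "Use ClusteringEvaluator instead.",
            DeprecationWarning)
        return 0.0  # placeholder value, not a real cost


def call_and_collect_warnings(model, dataset):
    """Call computeCost and return (result, list of warning messages)."""
    with warnings.catch_warnings(record=True) as caught:
        # DeprecationWarning is often hidden by default, hence 'always'.
        warnings.simplefilter("always", DeprecationWarning)
        result = model.computeCost(dataset)
    return result, [str(w.message) for w in caught]
```

This mirrors the manual check above: without `simplefilter('always', DeprecationWarning)` the warning may not be displayed.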


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mgaido91/spark SPARK-23451

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20629.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20629


commit 2f79bb2d5c7e29e85a4a7abe63254d392a49fe53
Author: Marco Gaido 
Date:   2018-02-16T16:03:09Z

[SPARK-23451][ML] Deprecate KMeans.computeCost




---




[GitHub] spark issue #20629: [SPARK-23451][ML] Deprecate KMeans.computeCost

2018-02-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20629
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20629: [SPARK-23451][ML] Deprecate KMeans.computeCost

2018-02-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20629
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/937/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20629: [SPARK-23451][ML] Deprecate KMeans.computeCost

2018-02-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20629
  
**[Test build #87510 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87510/testReport)**
 for PR 20629 at commit 
[`2f79bb2`](https://github.com/apache/spark/commit/2f79bb2d5c7e29e85a4a7abe63254d392a49fe53).


---




[GitHub] spark issue #20057: [SPARK-22880][SQL] Add cascadeTruncate option to JDBC da...

2018-02-16 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20057
  
Our overwrite semantics are confusing to most users. We need to correct them in 
the next release, i.e., Spark 2.4.

Even if we try our best to keep the schema of the original table, the 
actual CREATE TABLE statements still carry a lot of vendor-specific information. 
It is hard for us to rebuild all of it. I can understand your use case for truncate.

I am sorry this will not be part of the Spark 2.3 release. We will include it 
in the next release. You can still make the change in your forked Spark.

Please feel free to let us know if you find anything that we should do in 
Spark SQL JDBC to match the corresponding features in Sqoop. Thanks!


---




[GitHub] spark issue #20057: [SPARK-22880][SQL] Add cascadeTruncate option to JDBC da...

2018-02-16 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20057
  
This test is flaky; your changes did not break any test case. I will 
review your PR after the 2.3 release. Thanks again!

cc @dongjoon-hyun Do you want to take a look at this?


---




[GitHub] spark pull request #20511: [SPARK-23340][SQL] Upgrade Apache ORC to 1.4.3

2018-02-16 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20511#discussion_r168817045
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcSourceSuite.scala ---
@@ -160,6 +160,15 @@ abstract class OrcSuite extends OrcTest with BeforeAndAfterAll {
       }
     }
   }
+
+  test("SPARK-23340 Empty float/double array columns raise EOFException") {
+    Seq(Seq(Array.empty[Float]).toDF(), Seq(Array.empty[Double]).toDF()).foreach { df =>
+      withTempPath { path =>

We have three ORC readers, right? We need to check all of them, including the 
vectorized reader, even if they do not support it.


---




[GitHub] spark issue #20621: [SPARK-23436][SQL] Infer partition as Date only if it ca...

2018-02-16 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20621
  
This is a blocker-level regression.


---




[GitHub] spark issue #20387: [SPARK-23203][SQL]: DataSourceV2: Use immutable logical ...

2018-02-16 Thread rdblue
Github user rdblue commented on the issue:

https://github.com/apache/spark/pull/20387
  
@cloud-fan, is there anything else that needs to be updated, or is this 
ready to be merged?


---




[GitHub] spark issue #20621: [SPARK-23436][SQL] Infer partition as Date only if it ca...

2018-02-16 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20621
  
It sounds like Spark 2.2 already has this bug. This causes an incorrect 
result.


---




[GitHub] spark issue #20567: [SPARK-23380][PYTHON] Make toPandas fallback to non-Arro...

2018-02-16 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20567
  
Thanks! Happy Lunar New Year!


---




[GitHub] spark issue #20625: [SPARK-23446][PYTHON] Explicitly check supported types i...

2018-02-16 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20625
  
Thanks for the fast fix! We need to merge it to SPARK-2.3.0 before RC4. 
Will merge it now. We can improve the fix later if anybody has better ideas. 

Thanks! Merged to master/2.3

Happy Lunar New Year!


---




[GitHub] spark issue #20424: [Spark-23240][python] Better error message when extraneo...

2018-02-16 Thread squito
Github user squito commented on the issue:

https://github.com/apache/spark/pull/20424
  
still lgtm, thanks


---




[GitHub] spark issue #20611: [SPARK-23425][SQL]When wild card is been used in load co...

2018-02-16 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20611
  
ok to test


---




[GitHub] spark pull request #20625: [SPARK-23446][PYTHON] Explicitly check supported ...

2018-02-16 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20625


---




[GitHub] spark pull request #20625: [SPARK-23446][PYTHON] Explicitly check supported ...

2018-02-16 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request:

https://github.com/apache/spark/pull/20625#discussion_r168823994
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -2000,10 +2001,12 @@ def toPandas(self):
                     return _check_dataframe_localize_timestamps(pdf, timezone)
                 else:
                     return pd.DataFrame.from_records([], columns=self.columns)
-            except ImportError as e:
-                msg = "note: pyarrow must be installed and available on calling Python process " \
-                      "if using spark.sql.execution.arrow.enabled=true"
-                raise ImportError("%s\n%s" % (_exception_message(e), msg))
+            except Exception as e:
+                msg = (
+                    "Note: toPandas attempted Arrow optimization because "
+                    "'spark.sql.execution.arrow.enabled' is set to true. Please set it to false "
+                    "to disable this.")
+                raise RuntimeError("%s\n%s" % (_exception_message(e), msg))
--- End diff --

Should the same type of error be raised instead of `RuntimeError`?
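
One way this suggestion could be sketched, purely as an illustration and not as what the PR does: rebuild the exception with its original type when that type accepts a single message argument, and fall back to `RuntimeError` otherwise. The helper name `with_hint` is hypothetical:

```python
def with_hint(e, hint):
    """Return an exception carrying the original message plus `hint`.

    The original exception type is kept when it can be constructed from a
    single message argument; otherwise a RuntimeError is returned.
    """
    message = "%s\n%s" % (e, hint)
    try:
        return type(e)(message)
    except Exception:
        # Some exception types need extra constructor arguments.
        return RuntimeError(message)
```

A caller would then write `raise with_hint(e, msg)` inside the `except` block. The fallback branch shows why a single fixed type such as `RuntimeError` is the simpler choice.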


---




[GitHub] spark issue #20611: [SPARK-23425][SQL]When wild card is been used in load co...

2018-02-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20611
  
**[Test build #87511 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87511/testReport)**
 for PR 20611 at commit 
[`af17f65`](https://github.com/apache/spark/commit/af17f65d2d60b69fe0c4addff5299153d4af37c0).


---




[GitHub] spark issue #20568: [SPARK-23381][CORE] Murmur3 hash generates a different v...

2018-02-16 Thread hvanhovell
Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/20568
  
@mrkm4ntr this is a legitimate failure. Can you fix the Python tests?


---




[GitHub] spark issue #20621: [SPARK-23436][SQL] Infer partition as Date only if it ca...

2018-02-16 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/20621
  
@gatorsmile thanks for checking. Yes, Spark 2.2 is affected too, so I am 
not sure whether this should be considered a blocker regression. But, I think 
we should fix it as soon as possible, nonetheless.


---




[GitHub] spark issue #20611: [SPARK-23425][SQL]When wild card is been used in load co...

2018-02-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20611
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20611: [SPARK-23425][SQL]When wild card is been used in load co...

2018-02-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20611
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87511/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20611: [SPARK-23425][SQL]When wild card is been used in load co...

2018-02-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20611
  
**[Test build #87511 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87511/testReport)**
 for PR 20611 at commit 
[`af17f65`](https://github.com/apache/spark/commit/af17f65d2d60b69fe0c4addff5299153d4af37c0).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #20625: [SPARK-23446][PYTHON] Explicitly check supported ...

2018-02-16 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20625#discussion_r168825608
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -2000,10 +2001,12 @@ def toPandas(self):
                     return _check_dataframe_localize_timestamps(pdf, timezone)
                 else:
                     return pd.DataFrame.from_records([], columns=self.columns)
-            except ImportError as e:
-                msg = "note: pyarrow must be installed and available on calling Python process " \
-                      "if using spark.sql.execution.arrow.enabled=true"
-                raise ImportError("%s\n%s" % (_exception_message(e), msg))
+            except Exception as e:
+                msg = (
+                    "Note: toPandas attempted Arrow optimization because "
+                    "'spark.sql.execution.arrow.enabled' is set to true. Please set it to false "
+                    "to disable this.")
+                raise RuntimeError("%s\n%s" % (_exception_message(e), msg))
--- End diff --

Yup, please open a PR if you have a better idea.


---




[GitHub] spark issue #20625: [SPARK-23446][PYTHON] Explicitly check supported types i...

2018-02-16 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20625
  
This was the smallest and safest fix I could come up with. Thank you sincerely 
for merging it, @gatorsmile. This was my last concern about PyArrow and 
Pandas.

To be clear, I don't mind at all if anyone opens another PR with a better 
idea.


---




[GitHub] spark issue #20625: [SPARK-23446][PYTHON] Explicitly check supported types i...

2018-02-16 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/20625
  
I think `RuntimeError` is fine for now, and we can improve this later with 
logic to fall back too. Best not to try and get too clever so close to the 
release :)  Thanks for catching this and for the quick fix, @HyukjinKwon!
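
A rough sketch of what such fallback logic might look like, under the assumption that a non-Arrow conversion path is always available. The function and parameter names are hypothetical, not PySpark's:

```python
import warnings


def to_pandas_with_fallback(arrow_path, plain_path, arrow_enabled=True):
    """Try the Arrow-based conversion first; fall back to the plain one on failure."""
    if arrow_enabled:
        try:
            return arrow_path()
        except Exception as e:
            # Tell the user why the optimized path was skipped.
            warnings.warn(
                "Arrow optimization failed (%s); "
                "falling back to the non-Arrow path." % e)
    return plain_path()
```

The trade-off is that a silent fallback can hide a misconfiguration, so a warning is emitted instead of discarding the error.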


---




[GitHub] spark issue #20625: [SPARK-23446][PYTHON] Explicitly check supported types i...

2018-02-16 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20625
  
Thank you @BryanCutler!


---




[GitHub] spark issue #20629: [SPARK-23451][ML] Deprecate KMeans.computeCost

2018-02-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20629
  
**[Test build #87510 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87510/testReport)**
 for PR 20629 at commit 
[`2f79bb2`](https://github.com/apache/spark/commit/2f79bb2d5c7e29e85a4a7abe63254d392a49fe53).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20629: [SPARK-23451][ML] Deprecate KMeans.computeCost

2018-02-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20629
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87510/
Test PASSed.


---




[GitHub] spark issue #20629: [SPARK-23451][ML] Deprecate KMeans.computeCost

2018-02-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20629
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20424: [Spark-23240][python] Better error message when extraneo...

2018-02-16 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20424
  
Thanks @squito. Will merge this one in a few days.


---



