[GitHub] spark issue #18339: [SPARK-21094][PYTHON] Add popen_kwargs to launch_gateway

2017-11-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18339
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83998/


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18339: [SPARK-21094][PYTHON] Add popen_kwargs to launch_gateway

2017-11-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18339
  
**[Test build #83998 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83998/testReport)**
 for PR 18339 at commit 
[`3ece21f`](https://github.com/apache/spark/commit/3ece21f5fd99e12a34616ffe90e34025ea3e3ee7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #18339: [SPARK-21094][PYTHON] Add popen_kwargs to launch_gateway

2017-11-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18339
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19498: [SPARK-17756][PYTHON][STREAMING] Workaround to avoid ret...

2017-11-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19498
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #19774: [SPARK-22475][SQL] show histogram in DESC COLUMN ...

2017-11-18 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/19774#discussion_r151855965
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -689,6 +689,11 @@ case class DescribeColumnCommand(
       buffer += Row("distinct_count", cs.map(_.distinctCount.toString).getOrElse("NULL"))
       buffer += Row("avg_col_len", cs.map(_.avgLen.toString).getOrElse("NULL"))
       buffer += Row("max_col_len", cs.map(_.maxLen.toString).getOrElse("NULL"))
+      buffer ++= cs.flatMap(_.histogram.map { hist =>
+        val header = Row("histogram", s"height: ${hist.height}, num_of_bins: ${hist.bins.length}")
+        Seq(header) ++ hist.bins.map(bin =>
+          Row("", s"lower_bound: ${bin.lo}, upper_bound: ${bin.hi}, distinct_count: ${bin.ndv}"))
+      }).getOrElse(Seq(Row("histogram", "NULL")))
--- End diff --

Some comments or cleanup here would be nice, though.
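To make the row-building logic in this diff easier to follow, here is a minimal Python sketch (not Spark's Scala code) that names each step explicitly. `HistogramBin`, `Histogram` and `histogram_rows` are hypothetical names; the `height`, `bins`, `lo`, `hi` and `ndv` fields simply mirror what the diff reads, and a missing histogram is modelled as `None`.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class HistogramBin:
    # Hypothetical stand-in for a histogram bin: bounds and distinct-value count.
    lo: float
    hi: float
    ndv: int


@dataclass
class Histogram:
    # Hypothetical stand-in: common bin height plus the list of bins.
    height: float
    bins: List[HistogramBin]


def histogram_rows(hist: Optional[Histogram]) -> List[Tuple[str, str]]:
    """Build (info_name, info_value) rows the way the diff does."""
    # No column stats or no histogram: emit a single NULL row.
    if hist is None:
        return [("histogram", "NULL")]
    # One header row summarising the histogram as a whole...
    header = ("histogram", f"height: {hist.height}, num_of_bins: {len(hist.bins)}")
    # ...followed by one row per bin, with an empty name column.
    bin_rows = [
        ("", f"lower_bound: {b.lo}, upper_bound: {b.hi}, distinct_count: {b.ndv}")
        for b in hist.bins
    ]
    return [header] + bin_rows


rows = histogram_rows(Histogram(height=2.5, bins=[HistogramBin(0.0, 10.0, 3)]))
for name, value in rows:
    print(f"{name:<10} {value}")
```

Splitting the header and the per-bin rows into separately named values is one possible cleanup; the Scala version could do the same with two local `val`s and a short comment each.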


---




[GitHub] spark issue #19498: [SPARK-17756][PYTHON][STREAMING] Workaround to avoid ret...

2017-11-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19498
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83997/


---




[GitHub] spark issue #19498: [SPARK-17756][PYTHON][STREAMING] Workaround to avoid ret...

2017-11-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19498
  
**[Test build #83997 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83997/testReport)**
 for PR 19498 at commit 
[`174ec21`](https://github.com/apache/spark/commit/174ec2139a7e0af049e2954494525fd3fff145e2).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19498: [SPARK-17756][PYTHON][STREAMING] Workaround to avoid ret...

2017-11-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19498
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19498: [SPARK-17756][PYTHON][STREAMING] Workaround to avoid ret...

2017-11-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19498
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83996/


---




[GitHub] spark issue #19498: [SPARK-17756][PYTHON][STREAMING] Workaround to avoid ret...

2017-11-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19498
  
**[Test build #83996 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83996/testReport)**
 for PR 19498 at commit 
[`dc8446a`](https://github.com/apache/spark/commit/dc8446ad99b2ad315ee93f854d98e3c25aa42ccf).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19715: [SPARK-22397][ML]add multiple columns support to Quantil...

2017-11-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19715
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83994/


---




[GitHub] spark issue #19715: [SPARK-22397][ML]add multiple columns support to Quantil...

2017-11-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19715
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19715: [SPARK-22397][ML]add multiple columns support to Quantil...

2017-11-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19715
  
**[Test build #83994 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83994/testReport)**
 for PR 19715 at commit 
[`5038e21`](https://github.com/apache/spark/commit/5038e21e9f3d0c80f71308f2fc9167e4a7749e82).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #19683: [SPARK-21657][SQL] optimize explode quadratic mem...

2017-11-18 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19683#discussion_r151855849
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/GenerateExec.scala ---
@@ -59,15 +61,23 @@ case class GenerateExec(
     generator: Generator,
     join: Boolean,
     outer: Boolean,
+    omitGeneratorChild: Boolean,
     generatorOutput: Seq[Attribute],
     child: SparkPlan)
   extends UnaryExecNode with CodegenSupport {
 
+  private def projectedChildOutput = generator match {
+    case g: UnaryExpression if omitGeneratorChild =>
+      (child.output diff Seq(g.child))
+    case _ =>
+      child.output
+  }
+
   override def output: Seq[Attribute] = {
     if (join) {
-      child.output ++ generatorOutput
-    } else {
-      generatorOutput
+      projectedChildOutput ++ generatorOutput
+      } else {
--- End diff --

nit: do we need to update the indentation?
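The effect of `projectedChildOutput` on `output` can be sketched in Python, with columns modelled as plain strings. This is an illustrative analogue, not Spark code: `projected_child_output` and `generate_output` are hypothetical names, and `generator_child=None` stands in for the case where the generator is not a `UnaryExpression`.

```python
from typing import List, Optional, Sequence


def projected_child_output(child_output: Sequence[str],
                           generator_child: Optional[str],
                           omit_generator_child: bool) -> List[str]:
    """Child columns to keep, optionally dropping the generator's input column."""
    # generator_child is None when the generator is not a unary expression,
    # which corresponds to the `case _ => child.output` branch in the diff.
    if omit_generator_child and generator_child is not None:
        remaining = list(child_output)
        # Scala's `diff` removes one occurrence per element of the other sequence.
        if generator_child in remaining:
            remaining.remove(generator_child)
        return remaining
    return list(child_output)


def generate_output(child_output: Sequence[str],
                    generator_output: Sequence[str],
                    join: bool,
                    generator_child: Optional[str] = None,
                    omit_generator_child: bool = False) -> List[str]:
    """Mirror of `output`: kept child columns first, then generated columns."""
    if join:
        return (projected_child_output(child_output, generator_child, omit_generator_child)
                + list(generator_output))
    return list(generator_output)


# explode(arr) over (id, arr): with the flag set, `arr` is not carried along.
print(generate_output(["id", "arr"], ["col"], join=True,
                      generator_child="arr", omit_generator_child=True))
```

Dropping the exploded array column from each generated row is what avoids copying it once per output element, which is the quadratic memory cost the PR title refers to.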


---




[GitHub] spark issue #19370: [SPARK-22495] Fix setup of SPARK_HOME variable on Window...

2017-11-18 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19370
  
Will take a final look tomorrow.


---




[GitHub] spark issue #18277: [SPARK-20947][PYTHON] Fix encoding/decoding error in pip...

2017-11-18 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18277
  
It seems okay at a glance. Let me take a closer look soon if I get to it first.


---




[GitHub] spark issue #18339: [SPARK-21094][PYTHON] Add popen_kwargs to launch_gateway

2017-11-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18339
  
**[Test build #83998 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83998/testReport)**
 for PR 18339 at commit 
[`3ece21f`](https://github.com/apache/spark/commit/3ece21f5fd99e12a34616ffe90e34025ea3e3ee7).


---




[GitHub] spark issue #18339: [SPARK-21094][PYTHON] Add popen_kwargs to launch_gateway

2017-11-18 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18339
  
@holdenk, I am okay with going ahead if you think it's fine.


---




[GitHub] spark issue #18339: [SPARK-21094][PYTHON] Add popen_kwargs to launch_gateway

2017-11-18 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18339
  
retest this please


---




[GitHub] spark issue #19498: [SPARK-17756][PYTHON][STREAMING] Workaround to avoid ret...

2017-11-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19498
  
**[Test build #83997 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83997/testReport)**
 for PR 19498 at commit 
[`174ec21`](https://github.com/apache/spark/commit/174ec2139a7e0af049e2954494525fd3fff145e2).


---




[GitHub] spark issue #19498: [SPARK-17756][PYTHON][STREAMING] Workaround to avoid ret...

2017-11-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19498
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19498: [SPARK-17756][PYTHON][STREAMING] Workaround to avoid ret...

2017-11-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19498
  
**[Test build #83996 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83996/testReport)**
 for PR 19498 at commit 
[`dc8446a`](https://github.com/apache/spark/commit/dc8446ad99b2ad315ee93f854d98e3c25aa42ccf).


---




[GitHub] spark issue #19498: [SPARK-17756][PYTHON][STREAMING] Workaround to avoid ret...

2017-11-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19498
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83995/


---




[GitHub] spark issue #19498: [SPARK-17756][PYTHON][STREAMING] Workaround to avoid ret...

2017-11-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19498
  
**[Test build #83995 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83995/testReport)**
 for PR 19498 at commit 
[`c45b701`](https://github.com/apache/spark/commit/c45b7016d8c446d023a3ca415c15d26298e61c5a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19752: [SPARK-22520][SQL] Support code generation for large Cas...

2017-11-18 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/19752
  
Sure, it may overlap with #18641. I will review this after #18641 to avoid a conflict.


---




[GitHub] spark issue #19498: [SPARK-17756][PYTHON][STREAMING] Workaround to avoid ret...

2017-11-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19498
  
**[Test build #83995 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83995/testReport)**
 for PR 19498 at commit 
[`c45b701`](https://github.com/apache/spark/commit/c45b7016d8c446d023a3ca415c15d26298e61c5a).


---




[GitHub] spark pull request #19498: [SPARK-17756][PYTHON][STREAMING] Workaround to av...

2017-11-18 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/19498#discussion_r151855253
  
--- Diff: python/pyspark/streaming/util.py ---
@@ -64,7 +64,11 @@ def call(self, milliseconds, jrdds):
             t = datetime.fromtimestamp(milliseconds / 1000.0)
             r = self.func(t, *rdds)
             if r:
-                return r._jrdd
+                # Here, we work around to ensure `_jrdd` is `JavaRDD` by wrapping it by `map`.
+                # org.apache.spark.streaming.api.python.PythonTransformFunction requires to return
+                # `JavaRDD`; however, this could be `JavaPairRDD` by some APIs, for example, `zip`.
+                # See SPARK-17756.
+                return r.map(lambda x: x)._jrdd
--- End diff --

Thanks for the review, @holdenk. Let me push the change.


---




[GitHub] spark issue #19715: [SPARK-22397][ML]add multiple columns support to Quantil...

2017-11-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19715
  
**[Test build #83994 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83994/testReport)**
 for PR 19715 at commit 
[`5038e21`](https://github.com/apache/spark/commit/5038e21e9f3d0c80f71308f2fc9167e4a7749e82).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19782: [SPARK-22554][PYTHON] Add a config to control if PySpark...

2017-11-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19782
  
**[Test build #83993 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83993/testReport)**
 for PR 19782 at commit 
[`f41698e`](https://github.com/apache/spark/commit/f41698e330c517830a90309a022b072ea6406dcb).


---




[GitHub] spark issue #19782: [SPARK-22554][PYTHON] Add a config to control if PySpark...

2017-11-18 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19782
  
cc @ueshin, could you take a look please when you have some time?


---




[GitHub] spark issue #19782: [SPARK-22554][PYTHON] Add a config to control if PySpark...

2017-11-18 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19782
  
This is also partly to allow running Python coverage without extra code changes. I know a hacky way to run it (see https://github.com/apache/spark/pull/19630#issuecomment-345490662 and https://github.com/apache/spark/pull/19630#issuecomment-345171997).

Now, we can do, for example, the following:

```
pip install coverage
# Build Spark (http://spark.apache.org/docs/latest/building-spark.html)
rm python/lib/pyspark.zip
rm -fr .coverage
rm -fr coverage_html

echo "spark.python.use.daemon false" >> conf/spark-defaults.conf

echo '#!/usr/bin/env bash
coverage run -p "$@"' > coverage_python
chmod 755 coverage_python

# Run actual Python tests
PATH=`pwd`:$PATH PYSPARK_PYTHON=coverage_python SPARK_TESTING=1 bin/pyspark pyspark.sql.tests VectorizedUDFTests

rm conf/spark-defaults.conf

coverage combine
coverage html -d coverage_html -i
open coverage_html
# Open up index.html in your browser.
```


---




[GitHub] spark pull request #19782: [SPARK-22554][PYTHON] Add a config to control if ...

2017-11-18 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request:

https://github.com/apache/spark/pull/19782

[SPARK-22554][PYTHON] Add a config to control if PySpark should use daemon or not

## What changes were proposed in this pull request?

This PR proposes to add a flag to control whether PySpark should use the daemon or not.

Actually, SparkR already has a flag for useDaemon:

https://github.com/apache/spark/blob/478fbc866fbfdb4439788583281863ecea14e8af/core/src/main/scala/org/apache/spark/api/r/RRunner.scala#L362

It'd be great to have this flag in PySpark too. It makes it easier to test Windows-specific issues.
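The decision such a flag controls can be sketched in Python. This is an illustrative sketch, not the PR's implementation: the Windows check mirrors the existing `!System.getProperty("os.name").startsWith("Windows")` logic in `PythonWorkerFactory`, the key `spark.python.use.daemon` is the one used in the coverage script in this thread, and defaulting it to `true` (keeping today's behaviour) is an assumption. `parse_spark_defaults` and `should_use_daemon` are hypothetical helpers, and the parser only handles the whitespace-separated `key value` form.

```python
from typing import Dict


def parse_spark_defaults(text: str) -> Dict[str, str]:
    """Parse whitespace-separated `key value` lines, as in conf/spark-defaults.conf."""
    conf = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        # Split on the first run of whitespace; the value may itself contain spaces.
        parts = line.split(None, 1)
        conf[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return conf


def should_use_daemon(conf: Dict[str, str], os_name: str) -> bool:
    """Use the fork-based daemon only if the flag allows it and we are not on Windows."""
    # Assumption: the flag defaults to true so existing behaviour is unchanged.
    enabled = conf.get("spark.python.use.daemon", "true").strip().lower() == "true"
    # The daemon relies on UNIX signals, so Windows always launches workers directly.
    return enabled and not os_name.startswith("Windows")


conf = parse_spark_defaults("spark.python.use.daemon false\n")
print(should_use_daemon(conf, "Linux"))
```

With the flag off, a UNIX machine takes the same direct-worker path Windows does, which is what makes Windows-specific worker issues reproducible elsewhere.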

## How was this patch tested?

Manually tested.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HyukjinKwon/spark use-daemon-flag

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19782.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19782


commit f41698e330c517830a90309a022b072ea6406dcb
Author: hyukjinkwon 
Date:   2017-11-19T05:10:19Z

Add a config to control if PySpark should use daemon or not




---




[GitHub] spark pull request #19525: [SPARK-22289] [ML] Add JSON support for Matrix pa...

2017-11-18 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request:

https://github.com/apache/spark/pull/19525#discussion_r151854676
  
--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -476,6 +476,10 @@ class DenseMatrix @Since("2.0.0") (
 @Since("2.0.0")
 object DenseMatrix {
 
+  @Since("2.3.0")
+  private[ml] def unapply(dm: DenseMatrix): Option[(Int, Int, Array[Double], Boolean)] =
--- End diff --

@yanboliang any suggestion?
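An extractor like the `unapply` in this diff lets a JSON writer pattern-match a `DenseMatrix` into its `(numRows, numCols, values, isTransposed)` fields. The Python analogue below is a minimal sketch of that round trip. The `DenseMatrix` dataclass, the helper functions, and the JSON key names are hypothetical and are not Spark's actual JSON format for matrix params.

```python
import json
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DenseMatrix:
    # Hypothetical Python stand-in for ml.linalg.DenseMatrix (column-major values).
    num_rows: int
    num_cols: int
    values: List[float]
    is_transposed: bool = False


def unapply(dm: DenseMatrix) -> Tuple[int, int, List[float], bool]:
    """Analogue of the proposed extractor: expose the four constructor fields."""
    return dm.num_rows, dm.num_cols, dm.values, dm.is_transposed


def matrix_to_json(dm: DenseMatrix) -> str:
    num_rows, num_cols, values, is_transposed = unapply(dm)
    # Key names are illustrative only.
    return json.dumps({"numRows": num_rows, "numCols": num_cols,
                       "values": values, "isTransposed": is_transposed})


def matrix_from_json(s: str) -> DenseMatrix:
    d = json.loads(s)
    return DenseMatrix(d["numRows"], d["numCols"], list(d["values"]), d["isTransposed"])


m = DenseMatrix(2, 2, [1.0, 2.0, 3.0, 4.0])
print(matrix_to_json(m))
```

Keeping the extractor `private[ml]`, as the diff does, exposes the fields to the ML param JSON code without adding a public pattern-matching API.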


---




[GitHub] spark issue #19630: [SPARK-22409] Introduce function type argument in pandas...

2017-11-18 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19630
  
Actually, R has a flag for `useDaemon`:


https://github.com/apache/spark/blob/478fbc866fbfdb4439788583281863ecea14e8af/core/src/main/scala/org/apache/spark/api/r/RRunner.scala#L362

It'd be great to have this flag in PySpark too. It would also make it easier to test Windows-specific issues.


---




[GitHub] spark issue #19630: [SPARK-22409] Introduce function type argument in pandas...

2017-11-18 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19630
  
OK, my approach was as follows, with this diff applied:

```diff
--- a/core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala
+++ b/core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala
@@ -38,7 +38,7 @@ private[spark] class PythonWorkerFactory(pythonExec: String, envVars: Map[String
   // (pyspark/daemon.py) and tell it to fork new workers for our tasks. This daemon currently
   // only works on UNIX-based systems now because it uses signals for child management, so we can
   // also fall back to launching workers (pyspark/worker.py) directly.
-  val useDaemon = !System.getProperty("os.name").startsWith("Windows")
+  val useDaemon = false
 
   var daemon: Process = null
   val daemonHost = InetAddress.getByAddress(Array(127, 0, 0, 1))
```

```bash
pip install coverage
# Build Spark (http://spark.apache.org/docs/latest/building-spark.html)
rm python/lib/pyspark.zip
rm -fr .coverage
rm -fr coverage_html
echo '#!/usr/bin/env bash
coverage run -p "$@"' > coverage_python
chmod 755 coverage_python
PATH=`pwd`:$PATH PYSPARK_PYTHON=coverage_python SPARK_TESTING=1 bin/pyspark pyspark.sql.tests VectorizedUDFTests
coverage combine
coverage html -d coverage_html -i
open coverage_html
# Open up index.html in your browser.
```


---




[GitHub] spark issue #19627: [SPARK-21088][ML] CrossValidator, TrainValidationSplit s...

2017-11-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19627
  
**[Test build #83992 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83992/testReport)**
 for PR 19627 at commit 
[`ae082f5`](https://github.com/apache/spark/commit/ae082f564ff2c23c976201ccf91a7dcd6726e4c9).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class CrossValidator(Estimator, ValidatorParams, HasParallelism, HasCollectSubModels,`


---




[GitHub] spark issue #19627: [SPARK-21088][ML] CrossValidator, TrainValidationSplit s...

2017-11-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19627
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83992/


---




[GitHub] spark issue #19627: [SPARK-21088][ML] CrossValidator, TrainValidationSplit s...

2017-11-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19627
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19781: [SPARK-22445][SQL][FOLLOW-UP] Respect children's needCop...

2017-11-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19781
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83990/


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19781: [SPARK-22445][SQL][FOLLOW-UP] Respect children's needCop...

2017-11-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19781
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19781: [SPARK-22445][SQL][FOLLOW-UP] Respect children's needCop...

2017-11-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19781
  
**[Test build #83990 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83990/testReport)**
 for PR 19781 at commit 
[`9797041`](https://github.com/apache/spark/commit/9797041aa9138386f26d1f6c259da302f918ab5d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #17436: [SPARK-20101][SQL] Use OffHeapColumnVector when "spark.m...

2017-11-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17436
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83989/


---




[GitHub] spark issue #17436: [SPARK-20101][SQL] Use OffHeapColumnVector when "spark.m...

2017-11-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17436
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #17436: [SPARK-20101][SQL] Use OffHeapColumnVector when "spark.m...

2017-11-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17436
  
**[Test build #83989 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83989/testReport)**
 for PR 17436 at commit 
[`b2c5b2e`](https://github.com/apache/spark/commit/b2c5b2ef0a36a2cc4085856970ddad490e526924).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19627: [SPARK-21088][ML] CrossValidator, TrainValidationSplit s...

2017-11-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19627
  
**[Test build #83992 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83992/testReport)**
 for PR 19627 at commit 
[`ae082f5`](https://github.com/apache/spark/commit/ae082f564ff2c23c976201ccf91a7dcd6726e4c9).


---




[GitHub] spark issue #19627: [SPARK-21088][ML] CrossValidator, TrainValidationSplit s...

2017-11-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19627
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83991/


---




[GitHub] spark issue #19627: [SPARK-21088][ML] CrossValidator, TrainValidationSplit s...

2017-11-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19627
  
**[Test build #83991 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83991/testReport)**
 for PR 19627 at commit 
[`758bc24`](https://github.com/apache/spark/commit/758bc24e8328e8b496b28c7cd8b8183458f18953).
 * This patch **fails Python style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19627: [SPARK-21088][ML] CrossValidator, TrainValidationSplit s...

2017-11-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19627
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19627: [SPARK-21088][ML] CrossValidator, TrainValidationSplit s...

2017-11-18 Thread WeichenXu123
Github user WeichenXu123 commented on the issue:

https://github.com/apache/spark/pull/19627
  
@holdenk Found the reason: there is an empty file in the directory. :)


---




[GitHub] spark issue #19627: [SPARK-21088][ML] CrossValidator, TrainValidationSplit s...

2017-11-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19627
  
**[Test build #83991 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83991/testReport)**
 for PR 19627 at commit 
[`758bc24`](https://github.com/apache/spark/commit/758bc24e8328e8b496b28c7cd8b8183458f18953).


---




[GitHub] spark pull request #19767: [SPARK-22543][SQL] fix java 64kb compile error fo...

2017-11-18 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19767#discussion_r151852231
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala ---
@@ -64,52 +64,22 @@ case class If(predicate: Expression, trueValue: Expression, falseValue: Expressi
     val trueEval = trueValue.genCode(ctx)
     val falseEval = falseValue.genCode(ctx)
 
-    // place generated code of condition, true value and false value in separate methods if
-    // their code combined is large
-    val combinedLength = condEval.code.length + trueEval.code.length + falseEval.code.length
--- End diff --

Actually, I think this removed part is orthogonal to what this PR does. Even 
if the condition, true, and false expressions are each under the threshold 
individually, their combination can still exceed it.

This PR deals with oversized generated code in deeply nested expressions, not 
with an oversized combination of code from the children.
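To make the distinction concrete, here is a minimal Python sketch. The `THRESHOLD` value of 1024 and the function `split_decisions` are illustrative assumptions, not Spark's codegen API; the point is only that three children can each pass a per-expression size check while their combined code still exceeds the limit:

```python
# Hypothetical size limit (in characters of generated code), for illustration only.
THRESHOLD = 1024


def split_decisions(cond_code, true_code, false_code, threshold=THRESHOLD):
    """Return (per_child_check, combined_check) for an If-like expression."""
    children = (cond_code, true_code, false_code)
    # Check each child on its own, as a per-expression rule would.
    per_child = any(len(code) > threshold for code in children)
    # Check the sum, as the removed combinedLength logic did.
    combined = sum(len(code) for code in children) > threshold
    return per_child, combined
```

With three 600-character snippets, `split_decisions` returns `(False, True)`: no single child trips the per-expression check, but the combined check does.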


---




[GitHub] spark issue #19082: [SPARK-21870][SQL] Split aggregation code into small fun...

2017-11-18 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/19082
  
ping


---




[GitHub] spark issue #19781: [SPARK-22445][SQL][FOLLOW-UP] Respect children's needCop...

2017-11-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19781
  
**[Test build #83990 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83990/testReport)**
 for PR 19781 at commit 
[`9797041`](https://github.com/apache/spark/commit/9797041aa9138386f26d1f6c259da302f918ab5d).


---




[GitHub] spark issue #19781: [SPARK-22445][SQL][FOLLOW-UP] Respect children's needCop...

2017-11-18 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/19781
  
@cloud-fan WDYT?


---




[GitHub] spark pull request #19781: [SPARK-22445][SQL][FOLLOW-UP] Respect children's ...

2017-11-18 Thread maropu
GitHub user maropu opened a pull request:

https://github.com/apache/spark/pull/19781

[SPARK-22445][SQL][FOLLOW-UP] Respect children's needCopyResult in Sort, 
HashAggregate, and BroadcastHashJoin

## What changes were proposed in this pull request?
I found that #19656 introduced a bug; for example, it changed the result set of 
`q6` in TPC-DS:
- w/o pr19658
```
+-+---+
|state|cnt|
+-+---+
|   MA| 10|
|   AK| 10|
|   AZ| 11|
|   ME| 13|
|   VT| 14|
|   NV| 15|
|   NH| 16|
|   UT| 17|
|   NJ| 21|
|   MD| 22|
|   WY| 25|
|   NM| 26|
|   OR| 31|
|   WA| 36|
|   ND| 38|
|   ID| 39|
|   SC| 45|
|   WV| 50|
|   FL| 51|
|   OK| 53|
|   MT| 53|
|   CO| 57|
|   AR| 58|
|   NY| 58|
|   PA| 62|
|   AL| 63|
|   LA| 63|
|   SD| 70|
|   WI| 80|
| null| 81|
|   MI| 82|
|   NC| 82|
|   MS| 83|
|   CA| 84|
|   MN| 85|
|   MO| 88|
|   IL| 95|
|   IA|102|
|   TN|102|
|   IN|103|
|   KY|104|
|   NE|113|
|   OH|114|
|   VA|130|
|   KS|139|
|   GA|168|
|   TX|216|
+-+---+
```
- w/   pr19658
```
+-+---+
|state|cnt|
+-+---+
|   RI| 14|
|   AK| 16|
|   FL| 20|
|   NJ| 21|
|   NM| 21|
|   NV| 22|
|   MA| 22|
|   MD| 22|
|   UT| 22|
|   AZ| 25|
|   SC| 28|
|   AL| 36|
|   MT| 36|
|   WA| 39|
|   ND| 41|
|   MI| 44|
|   AR| 45|
|   OR| 47|
|   OK| 52|
|   PA| 53|
|   LA| 55|
|   CO| 55|
|   NY| 64|
|   WV| 66|
|   SD| 72|
|   MS| 73|
|   NC| 79|
|   IN| 82|
| null| 85|
|   ID| 88|
|   MN| 91|
|   WI| 95|
|   IL| 96|
|   MO| 97|
|   CA|109|
|   CA|109|
|   TN|114|
|   NE|115|
|   KY|128|
|   OH|131|
|   IA|156|
|   TX|160|
|   VA|182|
|   KS|211|
|   GA|230|
+-+---+
```
This PR restores the original `CodegenContext.copyResult` logic in some 
plans (Sort, HashAggregate, and BroadcastHashJoin).
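The hazard behind `copyResult` is, roughly, that generated code writes its output into one reusable row buffer, so a parent that holds on to rows (such as a sort buffer) must receive copies. A minimal Python analogy, not Spark's codegen (`MutableRow`, `produce`, and `collect` are hypothetical names), shows how skipping the copy silently changes buffered results:

```python
class MutableRow:
    """One reusable row buffer, like an operator that overwrites its output row."""

    def __init__(self):
        self.values = None

    def update(self, values):
        self.values = list(values)

    def copy(self):
        row = MutableRow()
        row.values = list(self.values)
        return row


def produce(records, consumer, copy_result):
    """Emit every record through one shared buffer; copy only when asked to."""
    buffer = MutableRow()
    for rec in records:
        buffer.update(rec)
        consumer(buffer.copy() if copy_result else buffer)


def collect(records, copy_result):
    # A consumer that keeps references to rows, like a sort buffer.
    buffered = []
    produce(records, buffered.append, copy_result)
    return [row.values for row in buffered]
```

Without copying, every buffered entry aliases the same buffer and ends up holding the last record, so `collect([("MA", 10), ("AK", 10)], False)` yields `[["AK", 10], ["AK", 10]]`.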

## How was this patch tested?
Existing tests


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/maropu/spark SPARK-22445-bugfix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19781.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19781


commit 9797041aa9138386f26d1f6c259da302f918ab5d
Author: Takeshi Yamamuro 
Date:   2017-11-19T00:12:46Z

bugfix




---




[GitHub] spark issue #19767: [SPARK-22543][SQL] fix java 64kb compile error for deepl...

2017-11-18 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/19767
  
Should this go to 2.2?


---




[GitHub] spark issue #17436: [SPARK-20101][SQL] Use OffHeapColumnVector when "spark.m...

2017-11-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17436
  
**[Test build #83989 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83989/testReport)**
 for PR 17436 at commit 
[`b2c5b2e`](https://github.com/apache/spark/commit/b2c5b2ef0a36a2cc4085856970ddad490e526924).


---




[GitHub] spark pull request #19728: [SPARK-22498][SQL] Fix 64KB JVM bytecode limit pr...

2017-11-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19728


---




[GitHub] spark issue #19728: [SPARK-22498][SQL] Fix 64KB JVM bytecode limit problem w...

2017-11-18 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/19728
  
thanks, merging to master/2.2!


---




[GitHub] spark issue #19780: [SPARK-22551][SQL][WIP] Prevent possible 64kb compile er...

2017-11-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19780
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19780: [SPARK-22551][SQL][WIP] Prevent possible 64kb compile er...

2017-11-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19780
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83988/
Test PASSed.


---




[GitHub] spark issue #19780: [SPARK-22551][SQL][WIP] Prevent possible 64kb compile er...

2017-11-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19780
  
**[Test build #83988 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83988/testReport)**
 for PR 19780 at commit 
[`dc49b6e`](https://github.com/apache/spark/commit/dc49b6e1c884bce164e08bb3f63cbdec86541c75).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19780: [SPARK-22551][SQL][WIP] Prevent possible 64kb compile er...

2017-11-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19780
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19780: [SPARK-22551][SQL][WIP] Prevent possible 64kb compile er...

2017-11-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19780
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83987/
Test PASSed.


---




[GitHub] spark issue #19780: [SPARK-22551][SQL][WIP] Prevent possible 64kb compile er...

2017-11-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19780
  
**[Test build #83987 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83987/testReport)**
 for PR 19780 at commit 
[`e08259a`](https://github.com/apache/spark/commit/e08259a41d0f39c751858daba713b30e52a6c3a4).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #19767: [SPARK-22543][SQL] fix java 64kb compile error fo...

2017-11-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19767#discussion_r151842921
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala
 ---
@@ -105,6 +105,41 @@ abstract class Expression extends TreeNode[Expression] {
   val isNull = ctx.freshName("isNull")
   val value = ctx.freshName("value")
   val ve = doGenCode(ctx, ExprCode("", isNull, value))
+
+  // TODO: support whole stage codegen too
+  if (ve.code.trim.length > 1024 && ctx.INPUT_ROW != null && ctx.currentVars == null) {
+val setIsNull = if (ve.isNull != "false" && ve.isNull != "true") {
+  val globalIsNull = ctx.freshName("globalIsNull")
+  ctx.addMutableState("boolean", globalIsNull, s"$globalIsNull = false;")
+  val localIsNull = ve.isNull
+  ve.isNull = globalIsNull
+  s"$globalIsNull = $localIsNull;"
+} else {
+  ""
+}
+
+val setValue = {
+  val globalValue = ctx.freshName("globalValue")
+  ctx.addMutableState(
+ctx.javaType(dataType), globalValue, s"$globalValue = ${ctx.defaultValue(dataType)};")
+  val localValue = ve.value
+  ve.value = globalValue
+  s"$globalValue = $localValue;"
+}
+
+val funcName = ctx.freshName(nodeName)
+val funcFullName = ctx.addNewFunction(funcName,
+  s"""
+ |private void $funcName(InternalRow ${ctx.INPUT_ROW}) {
+ |  ${ve.code.trim}
+ |  $setValue
--- End diff --

Creating objects would add significant overhead. I think having a global 
boolean variable is better.
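A rough Python sketch of the pattern in this diff: a generator moves a large expression body into its own Java method and publishes the result through class-level fields, so no result object is allocated per row. The function `split_into_method`, the `<name>_globalIsNull`/`<name>_globalValue` field naming, and the parameter name `i` are all hypothetical, not Spark's `CodegenContext` API:

```python
def split_into_method(func_name, body_code, java_type, local_is_null, local_value):
    """Wrap `body_code` in a private Java method that writes results to fields.

    Returns (field_declarations, method_code, call_site, is_null_ref, value_ref).
    """
    fields, assignments = [], []
    is_null_ref = local_is_null
    # A literal "true"/"false" null flag needs no field; callers can use it directly.
    if local_is_null not in ("true", "false"):
        is_null_ref = f"{func_name}_globalIsNull"
        fields.append(f"private boolean {is_null_ref} = false;")
        assignments.append(f"{is_null_ref} = {local_is_null};")
    # The value always goes through a field, so the caller can read it after the call.
    value_ref = f"{func_name}_globalValue"
    fields.append(f"private {java_type} {value_ref};")
    assignments.append(f"{value_ref} = {local_value};")
    lines = [f"private void {func_name}(InternalRow i) {{", f"  {body_code}"]
    lines += [f"  {a}" for a in assignments]
    lines.append("}")
    return fields, "\n".join(lines), f"{func_name}(i);", is_null_ref, value_ref
```

The caller then reads `is_null_ref` and `value_ref` after emitting the call site, instead of unpacking a freshly allocated holder object.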


---




[GitHub] spark issue #17972: [SPARK-20723][ML]Add intermediate storage level to tree ...

2017-11-18 Thread phatak-dev
Github user phatak-dev commented on the issue:

https://github.com/apache/spark/pull/17972
  
@WeichenXu123 I resolved the merge conflicts. Can you trigger a Jenkins build?


---




[GitHub] spark issue #19780: [SPARK-22551][SQL][WIP] Prevent possible 64kb compile er...

2017-11-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19780
  
**[Test build #83988 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83988/testReport)**
 for PR 19780 at commit 
[`dc49b6e`](https://github.com/apache/spark/commit/dc49b6e1c884bce164e08bb3f63cbdec86541c75).


---




[GitHub] spark issue #19780: [SPARK-22551][SQL][WIP] Prevent possible 64kb compile er...

2017-11-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19780
  
**[Test build #83987 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83987/testReport)**
 for PR 19780 at commit 
[`e08259a`](https://github.com/apache/spark/commit/e08259a41d0f39c751858daba713b30e52a6c3a4).


---




[GitHub] spark pull request #19780: [SPARK-22551][SQL][WIP] Prevent possible 64kb com...

2017-11-18 Thread viirya
GitHub user viirya opened a pull request:

https://github.com/apache/spark/pull/19780

[SPARK-22551][SQL][WIP] Prevent possible 64kb compile error for common 
expression types

## What changes were proposed in this pull request?

For common expression types, such as BinaryExpression and 
TernaryExpression, the combined generated code of the children can become 
large. We should put that code into functions to prevent a possible 64KB 
compile error.
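One way to realize this is to pack the children's code snippets greedily into separate methods under a size budget, similar in spirit to Spark's existing `CodegenContext.splitExpressions` helper. The Python sketch below is an illustration under assumptions (the names, the `evalChildren` prefix, and the 1024-character budget are hypothetical), not this PR's implementation:

```python
def group_into_functions(snippets, max_chars=1024):
    """Greedily pack code snippets into groups whose total length stays within max_chars.

    A snippet longer than max_chars on its own still gets a group by itself.
    """
    groups, current, size = [], [], 0
    for code in snippets:
        # Start a new group if adding this snippet would overflow the budget.
        if current and size + len(code) > max_chars:
            groups.append(current)
            current, size = [], 0
        current.append(code)
        size += len(code)
    if current:
        groups.append(current)
    return groups


def emit_functions(snippets, prefix="evalChildren", max_chars=1024):
    """Render each group as a private Java method; return (methods, call_sites)."""
    methods, calls = [], []
    for idx, group in enumerate(group_into_functions(snippets, max_chars)):
        name = f"{prefix}_{idx}"
        body = "\n  ".join(group)
        methods.append(f"private void {name}(InternalRow i) {{\n  {body}\n}}")
        calls.append(f"{name}(i);")
    return methods, calls
```

This deliberately ignores how child results flow back to the caller; in generated code those values would have to travel through class fields or method parameters.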

## How was this patch tested?

Existing tests.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/viirya/spark-1 SPARK-22551

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19780.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19780


commit e08259a41d0f39c751858daba713b30e52a6c3a4
Author: Liang-Chi Hsieh 
Date:   2017-11-18T15:11:05Z

Put large generated codes of children expressions into functions.




---




[GitHub] spark issue #18906: [SPARK-21692][PYSPARK][SQL] Add nullability support to P...

2017-11-18 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/18906
  
So I think that, with the performance improvements coming to Python UDFs, 
annotating results as nullable or not could make sense (although I imagine 
we'd need to do something different for the vectorized UDFs if that isn't 
already handled).

Let's loop in @BryanCutler, but I think the performance improvements could 
be reasonable to think about for Spark 2.3+.


---




[GitHub] spark issue #18339: [SPARK-21094][PYTHON] Add popen_kwargs to launch_gateway

2017-11-18 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/18339
  
Jenkins OK to test.


---




[GitHub] spark issue #18982: [SPARK-21685][PYTHON][ML] PySpark Params isSet state sho...

2017-11-18 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/18982
  
Can we update this to master?


---




[GitHub] spark pull request #19498: [SPARK-17756][PYTHON][STREAMING] Workaround to av...

2017-11-18 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19498#discussion_r151839513
  
--- Diff: python/pyspark/streaming/util.py ---
@@ -64,7 +64,11 @@ def call(self, milliseconds, jrdds):
 t = datetime.fromtimestamp(milliseconds / 1000.0)
 r = self.func(t, *rdds)
 if r:
-return r._jrdd
+# Here, we work around to ensure `_jrdd` is `JavaRDD` by wrapping it by `map`.
+# org.apache.spark.streaming.api.python.PythonTransformFunction requires to return
+# `JavaRDD`; however, this could be `JavaPairRDD` by some APIs, for example, `zip`.
+# See SPARK-17756.
+return r.map(lambda x: x)._jrdd
--- End diff --

Personally, I think only applying the `map` when the result is not a 
`JavaRDD` would be a good incremental improvement (since otherwise the code 
path fails, right?).
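A minimal Python sketch of that conditional approach, using hypothetical stand-ins (`FakeJavaRDD`, `FakeRDD`) instead of real py4j handles so that it runs without a JVM; how to inspect the Java class of `_jrdd` in real PySpark is left open here:

```python
class FakeJavaRDD:
    """Hypothetical stand-in for a py4j handle; only the class name matters here."""

    def __init__(self, class_name):
        self.class_name = class_name


class FakeRDD:
    """Hypothetical stand-in for pyspark.RDD."""

    def __init__(self, jrdd_class):
        self._jrdd = FakeJavaRDD(jrdd_class)

    def map(self, f):
        # Models the behavior the diff relies on: after map, _jrdd is a JavaRDD.
        return FakeRDD("JavaRDD")


def to_java_rdd(r):
    """Return a JavaRDD handle, paying for the identity map only when needed."""
    if r._jrdd.class_name == "JavaRDD":
        return r._jrdd
    return r.map(lambda x: x)._jrdd
```

The identity `map` only runs for non-`JavaRDD` results such as those produced by `zip`, so the common path avoids the extra map.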


---




[GitHub] spark issue #18457: [SPARK-21241][MLlib]- Add setIntercept to StreamingLinea...

2017-11-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18457
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #18457: [SPARK-21241][MLlib]- Add setIntercept to StreamingLinea...

2017-11-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18457
  
**[Test build #83986 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83986/testReport)**
 for PR 18457 at commit 
[`544b4d0`](https://github.com/apache/spark/commit/544b4d0e0691ea2912cf214a8c296171d8fc2d2b).
 * This patch **fails Python style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #18457: [SPARK-21241][MLlib]- Add setIntercept to StreamingLinea...

2017-11-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18457
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83986/
Test FAILed.


---




[GitHub] spark issue #18277: [SPARK-20947][PYTHON] Fix encoding/decoding error in pip...

2017-11-18 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/18277
  
What do you think, @HyukjinKwon? I think this is probably a reasonable fix, 
but it might break code for people who have been depending on the bug.


---




[GitHub] spark issue #18457: [SPARK-21241][MLlib]- Add setIntercept to StreamingLinea...

2017-11-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18457
  
**[Test build #83986 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83986/testReport)**
 for PR 18457 at commit 
[`544b4d0`](https://github.com/apache/spark/commit/544b4d0e0691ea2912cf214a8c296171d8fc2d2b).


---




[GitHub] spark issue #15670: [SPARK-18161] [Python] Allow pickle to serialize >4 GB o...

2017-11-18 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/15670
  
Would you be OK with someone taking over this PR if you're busy?


---




[GitHub] spark issue #18457: [SPARK-21241][MLlib]- Add setIntercept to StreamingLinea...

2017-11-18 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/18457
  
err Jenkins test this please.


---




[GitHub] spark issue #18457: [SPARK-21241][MLlib]- Add setIntercept to StreamingLinea...

2017-11-18 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/18457
  
Jenkins, test this plase.


---




[GitHub] spark pull request #19643: [SPARK-11421][CORE][PYTHON][R] Added ability for ...

2017-11-18 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19643#discussion_r151839304
  
--- Diff: python/pyspark/context.py ---
@@ -860,6 +860,23 @@ def addPyFile(self, path):
 import importlib
 importlib.invalidate_caches()
 
+def addJar(self, path, addToCurrentClassLoader=False):
+"""
+Adds a JAR dependency for Spark tasks to be executed in the future.
+The `path` passed can be either a local file, a file in HDFS (or other Hadoop-supported
+filesystems), an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node.
+If `addToCurrentClassLoader` is true, add the jar to the current threads' class loader
+in the backing JVM. In general adding to the current threads' class loader will impact all
+other application threads unless they have explicitly changed their class loader.
--- End diff --

So we currently use `.. note:: DeveloperApi` to indicate it's a developer 
API (see ml/pipeline and friends for an example).
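For illustration, a docstring following that convention might look like the hypothetical stub below (the class name `SparkContextSketch` is invented, and where the note sits within the docstring is a judgment call rather than a documented rule):

```python
class SparkContextSketch:
    """Hypothetical stand-in for pyspark.SparkContext, for illustration only."""

    def addJar(self, path, addToCurrentClassLoader=False):
        """
        .. note:: DeveloperApi

        Adds a JAR dependency for Spark tasks to be executed in the future.
        If `addToCurrentClassLoader` is true, also add the jar to the current
        thread's class loader in the backing JVM.
        """
        # Placeholder body: the real method would call into the JVM.
        raise NotImplementedError("illustration only")
```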


---




[GitHub] spark issue #19627: [SPARK-21088][ML] CrossValidator, TrainValidationSplit s...

2017-11-18 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/19627
  
What happens when you run `check-license` locally? I agree it doesn't look 
like any of these changes would impact the license headers.


---




[GitHub] spark pull request #19774: [SPARK-22475][SQL] show histogram in DESC COLUMN ...

2017-11-18 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/19774#discussion_r151838625
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -689,6 +689,11 @@ case class DescribeColumnCommand(
   buffer += Row("distinct_count", cs.map(_.distinctCount.toString).getOrElse("NULL"))
   buffer += Row("avg_col_len", cs.map(_.avgLen.toString).getOrElse("NULL"))
   buffer += Row("max_col_len", cs.map(_.maxLen.toString).getOrElse("NULL"))
+  buffer ++= cs.flatMap(_.histogram.map { hist =>
+val header = Row("histogram", s"height: ${hist.height}, num_of_bins: ${hist.bins.length}")
+Seq(header) ++ hist.bins.map(bin =>
+  Row("", s"lower_bound: ${bin.lo}, upper_bound: ${bin.hi}, distinct_count: ${bin.ndv}"))
--- End diff --

@wzhfy I'd rather define a `val` with the comment being the name of the 
val. That would make it "compile-safe".


---




[GitHub] spark pull request #19407: [SPARK-21667][Streaming] ConsoleSink should not f...

2017-11-18 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/19407#discussion_r151838606
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala 
---
@@ -267,11 +267,12 @@ final class DataStreamWriter[T] private[sql](ds: Dataset[T]) {
 useTempCheckpointLocation = true,
 trigger = trigger)
 } else {
-  val (useTempCheckpointLocation, recoverFromCheckpointLocation) =
+  val recoverFromCheckpointLocation = true
+  val useTempCheckpointLocation =
 if (source == "console") {
-  (true, true)
+  true
 } else {
-  (false, true)
+  false
--- End diff --

Do we still need the `if` at all, since it just evaluates to 
`source == "console"`?
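Since each branch returns the condition's own value, the `if` collapses to a single comparison; in Scala that would be `val useTempCheckpointLocation = source == "console"`. A Python transliteration (the function names are hypothetical) that keeps both forms side by side so they can be checked for agreement:

```python
def use_temp_checkpoint_with_if(source):
    # Mirrors the diff: each branch returns the same value as the condition.
    if source == "console":
        return True
    else:
        return False


def use_temp_checkpoint(source):
    # The single expression suggested in the review.
    return source == "console"


def checkpoint_flags(source):
    """Return (useTempCheckpointLocation, recoverFromCheckpointLocation)."""
    # In the diff, recoverFromCheckpointLocation is now always true.
    return use_temp_checkpoint(source), True
```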


---




[GitHub] spark issue #19779: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Support wri...

2017-11-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19779
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83985/
Test PASSed.


---




[GitHub] spark issue #19779: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Support wri...

2017-11-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19779
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19779: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Support wri...

2017-11-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19779
  
**[Test build #83985 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83985/testReport)**
 for PR 19779 at commit 
[`a59bd09`](https://github.com/apache/spark/commit/a59bd093878cf7060781ad0628176ff9b3df63a1).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19779: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Support wri...

2017-11-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19779
  
**[Test build #83985 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83985/testReport)**
 for PR 19779 at commit 
[`a59bd09`](https://github.com/apache/spark/commit/a59bd093878cf7060781ad0628176ff9b3df63a1).


---




[GitHub] spark issue #19779: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Support wri...

2017-11-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19779
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83984/
Test FAILed.


---




[GitHub] spark issue #19779: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Support wri...

2017-11-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19779
  
**[Test build #83984 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83984/testReport)**
 for PR 19779 at commit 
[`034b246`](https://github.com/apache/spark/commit/034b2466d073c008b71eae072ee98353df56cbf2).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19779: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Support wri...

2017-11-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19779
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19779: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Support wri...

2017-11-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19779
  
**[Test build #83984 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83984/testReport)**
 for PR 19779 at commit 
[`034b246`](https://github.com/apache/spark/commit/034b2466d073c008b71eae072ee98353df56cbf2).


---




[GitHub] spark pull request #19779: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Supp...

2017-11-18 Thread vinodkc
GitHub user vinodkc opened a pull request:

https://github.com/apache/spark/pull/19779

[SPARK-17920][SPARK-19580][SPARK-19878][SQL] Support writing to Hive table 
which uses Avro schema url 'avro.schema.url'

## What changes were proposed in this pull request?
Support writing to a Hive table that uses an Avro schema URL ('avro.schema.url').
For example:
create external table avro_in (a string) stored as avro location '/avro-in/' tblproperties ('avro.schema.url'='/avro-schema/avro.avsc');

create external table avro_out (a string) stored as avro location '/avro-out/' tblproperties ('avro.schema.url'='/avro-schema/avro.avsc');

insert overwrite table avro_out select * from avro_in;  // fails with java.lang.NullPointerException

WARN AvroSerDe: Encountered exception determining schema. Returning signal schema to indicate problem
java.lang.NullPointerException
    at org.apache.hadoop.fs.FileSystem.getDefaultUri(FileSystem.java:182)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:174)

## Changes proposed in this fix
Currently a `null` value is passed to the serializer, which causes an NPE 
during the insert operation. This fix passes the Hadoop configuration object 
instead.

## How was this patch tested?
Added a new test case in VersionsSuite.
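Why a `null` configuration breaks only the URL-based setup can be sketched as follows. This is a hypothetical Python model, not Hive's `AvroSerDe` code: a schema given inline via `avro.schema.literal` needs no filesystem, while a schema at `avro.schema.url` has to be read through a filesystem built from the Hadoop configuration, which is where the stack trace above fails. The `read_file` callback is an assumed stand-in for that filesystem access:

```python
def resolve_avro_schema(conf, table_properties, read_file):
    """Hypothetical model of schema resolution for an Avro-backed Hive table."""
    url = table_properties.get("avro.schema.url")
    if url is None:
        # An inline schema is available without touching any filesystem.
        return table_properties.get("avro.schema.literal")
    if conf is None:
        # Models the NPE in FileSystem.getDefaultUri when no configuration is passed.
        raise ValueError("no Hadoop configuration; cannot resolve " + url)
    # With a configuration, the schema file can be read from its URL.
    return read_file(conf, url)
```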

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vinodkc/spark br_Fix_SPARK-17920

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19779.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19779


commit 034b2466d073c008b71eae072ee98353df56cbf2
Author: vinodkc 
Date:   2017-11-18T07:52:59Z

pass hadoopConfiguration to Serializer




---
