[GitHub] spark issue #20362: [SPARK-22886][ML][TESTS] ML test for structured streamin...

2018-01-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20362
  
Can one of the admins verify this patch?


---





[GitHub] spark issue #18906: [SPARK-21692][PYSPARK][SQL] Add nullability support to P...

2018-01-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18906
  
**[Test build #86526 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86526/testReport)** for PR 18906 at commit [`3b72f0d`](https://github.com/apache/spark/commit/3b72f0dc79e3abb8f9710801fb15e303d4786290).


---




[GitHub] spark pull request #20362: [SPARK-22886][ML][TESTS] ML test for structured s...

2018-01-23 Thread gaborgsomogyi
GitHub user gaborgsomogyi opened a pull request:

https://github.com/apache/spark/pull/20362

[SPARK-22886][ML][TESTS] ML test for structured streaming: ml.recomme…

## What changes were proposed in this pull request?

Converting spark.ml.recommendation tests to also check code with structured streaming, using the ML testing infrastructure implemented in SPARK-22882.

## How was this patch tested?

Automated: passes Jenkins.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gaborgsomogyi/spark SPARK-22886

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20362.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20362


commit 33654c93c2fe240eb0c6a6932353239ab84b0ce0
Author: Gabor Somogyi 
Date:   2018-01-18T20:27:08Z

[SPARK-22886][ML][TESTS] ML test for structured streaming: ml.recommendation




---




[GitHub] spark pull request #20091: [SPARK-22465][FOLLOWUP] Update the number of part...

2018-01-23 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20091


---




[GitHub] spark issue #19993: [SPARK-22799][ML] Bucketizer should throw exception if s...

2018-01-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19993
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86525/
Test PASSed.


---




[GitHub] spark issue #19993: [SPARK-22799][ML] Bucketizer should throw exception if s...

2018-01-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19993
  
**[Test build #86525 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86525/testReport)** for PR 19993 at commit [`ebc6d16`](https://github.com/apache/spark/commit/ebc6d16586318155180e37d7a1a199aa1a8b9cf2).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19993: [SPARK-22799][ML] Bucketizer should throw exception if s...

2018-01-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19993
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #20360: [SPARK-23177][SQL][PySpark] Extract zero-paramete...

2018-01-23 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20360#discussion_r163223074
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala ---
@@ -45,7 +45,8 @@ object ExtractPythonUDFFromAggregate extends Rule[LogicalPlan] {
 
   private def hasPythonUdfOverAggregate(expr: Expression, agg: Aggregate): Boolean = {
     expr.find {
-      e => PythonUDF.isScalarPythonUDF(e) && e.find(belongAggregate(_, agg)).isDefined
+      e => PythonUDF.isScalarPythonUDF(e) &&
+        (e.references.isEmpty || e.find(belongAggregate(_, agg)).isDefined)
--- End diff --

I just want to consider some literal inputs like `df2 = df.distinct().withColumn("a", f_udf(f.lit("2")))`.


---




[GitHub] spark issue #20360: [SPARK-23177][SQL][PySpark] Extract zero-parameter UDFs ...

2018-01-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20360
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20360: [SPARK-23177][SQL][PySpark] Extract zero-parameter UDFs ...

2018-01-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20360
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86523/
Test PASSed.


---




[GitHub] spark issue #20360: [SPARK-23177][SQL][PySpark] Extract zero-parameter UDFs ...

2018-01-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20360
  
**[Test build #86523 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86523/testReport)** for PR 20360 at commit [`5c3afbb`](https://github.com/apache/spark/commit/5c3afbbdf762411023b06348b2bfe3dbc2ff4287).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20224
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86521/
Test PASSed.


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20224
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20224
  
**[Test build #86521 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86521/testReport)** for PR 20224 at commit [`a7ceda2`](https://github.com/apache/spark/commit/a7ceda298f776bc195b0d2fbf447d886ca5af63e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `.doc(\"When true, embed the codegen stage ID into the class name of the generated class\")`
  * `  final class $className extends $`


---




[GitHub] spark pull request #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pys...

2018-01-23 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/13599#discussion_r163206632
  
--- Diff: core/src/main/scala/org/apache/spark/api/python/VirtualEnvFactory.scala ---
@@ -0,0 +1,151 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.api.python
+
+import java.io.File
+import java.util.{Map => JMap}
+import java.util.Arrays
+import java.util.concurrent.atomic.AtomicInteger
+
+import scala.collection.JavaConverters._
+
+import com.google.common.io.Files
+
+import org.apache.spark.SparkConf
+import org.apache.spark.internal.Logging
+
+
+private[spark] class VirtualEnvFactory(pythonExec: String, conf: SparkConf, isDriver: Boolean)
+  extends Logging {
+
+  private var virtualEnvType = conf.get("spark.pyspark.virtualenv.type", "native")
+  private var virtualEnvPath = conf.get("spark.pyspark.virtualenv.bin.path", "")
+  private var virtualEnvName: String = _
+  private var virtualPythonExec: String = _
+  private val VIRTUALENV_ID = new AtomicInteger()
+  private var isLauncher: Boolean = false
+
+  // used by launcher when user want to use virtualenv in pyspark shell. Launcher need this class
+  // to create virtualenv for driver.
+  def this(pythonExec: String, properties: JMap[String, String], isDriver: java.lang.Boolean) {
--- End diff --

I guess we can use `boolean.class` in java reflection.
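
    As an aside for readers, here is a minimal, self-contained Scala sketch of the point (the `Target` class is hypothetical, not from this PR): `classOf[Boolean]` in Scala already denotes the primitive `boolean.class` (`java.lang.Boolean.TYPE`), which is what reflection needs to match a constructor that takes a primitive `boolean`.

```scala
object ReflectionBooleanSketch {
  // Hypothetical stand-in for a class whose constructor takes a primitive boolean.
  class Target(flag: Boolean) {
    override def toString: String = s"Target($flag)"
  }

  def main(args: Array[String]): Unit = {
    // classOf[Boolean] is the primitive class, i.e. boolean.class in Java.
    println(classOf[Boolean] == java.lang.Boolean.TYPE)           // true
    println(classOf[java.lang.Boolean] == java.lang.Boolean.TYPE) // false
    // Reflection must match the erased signature exactly: Target(boolean).
    val ctor = classOf[Target].getConstructor(classOf[Boolean])
    println(ctor.newInstance(java.lang.Boolean.TRUE))             // boxed arg is unboxed on invoke
  }
}
```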


---




[GitHub] spark pull request #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pys...

2018-01-23 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/13599#discussion_r163203430
  
--- Diff: core/src/main/scala/org/apache/spark/api/python/VirtualEnvFactory.scala ---
@@ -0,0 +1,164 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.api.python
+
+import java.io.File
+import java.util.{Map => JMap}
+import java.util.Arrays
+import java.util.concurrent.atomic.AtomicInteger
+
+import scala.collection.JavaConverters._
+
+import com.google.common.io.Files
+
+import org.apache.spark.SparkConf
+import org.apache.spark.internal.Logging
+
+
+class VirtualEnvFactory(pythonExec: String, conf: SparkConf, isDriver: Boolean)
+  extends Logging {
+
+  private var virtualEnvType = conf.get("spark.pyspark.virtualenv.type", "native")
+  private var virtualEnvBinPath = conf.get("spark.pyspark.virtualenv.bin.path", "")
+  private var initPythonPackages = conf.getOption("spark.pyspark.virtualenv.packages")
+  private var virtualEnvName: String = _
+  private var virtualPythonExec: String = _
+  private val VIRTUALENV_ID = new AtomicInteger()
+  private var isLauncher: Boolean = false
+
+  // used by launcher when user want to use virtualenv in pyspark shell. Launcher need this class
+  // to create virtualenv for driver.
+  def this(pythonExec: String, properties: JMap[String, String], isDriver: java.lang.Boolean) {
+    this(pythonExec, new SparkConf().setAll(properties.asScala), isDriver)
+    this.isLauncher = true
+  }
+
+  /*
+   * Create virtualenv using native virtualenv or conda
+   *
+   */
+  def setupVirtualEnv(): String = {
+    /*
+     *
+     * Native Virtualenv:
+     *   -  Execute command: virtualenv -p  --no-site-packages 
+     *   -  Execute command: python -m pip --cache-dir  install -r 
+     *
+     * Conda
+     *   -  Execute command: conda create --prefix  --file  -y
+     *
+     */
+    logInfo("Start to setup virtualenv...")
+    logDebug("user.dir=" + System.getProperty("user.dir"))
+    logDebug("user.home=" + System.getProperty("user.home"))
+
+    require(virtualEnvType == "native" || virtualEnvType == "conda",
+      s"VirtualEnvType: $virtualEnvType is not supported." )
+    require(new File(virtualEnvBinPath).exists(),
+      s"VirtualEnvBinPath: $virtualEnvBinPath is not defined or doesn't exist.")
+    // Two scenarios of creating virtualenv:
+    // 1. created in yarn container. Yarn will clean it up after container is exited
+    // 2. created outside yarn container. Spark need to create temp directory and clean it after app
+    //    finish.
+    //  - driver of PySpark shell
+    //  - driver of yarn-client mode
+    if (isLauncher ||
+      (isDriver && conf.get("spark.submit.deployMode") == "client")) {
+      val virtualenv_basedir = Files.createTempDir()
--- End diff --

nit: `virtualenvBasedir`?


---




[GitHub] spark pull request #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pys...

2018-01-23 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/13599#discussion_r163202935
  
--- Diff: core/src/main/scala/org/apache/spark/api/python/VirtualEnvFactory.scala ---
@@ -0,0 +1,164 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.api.python
+
+import java.io.File
+import java.util.{Map => JMap}
+import java.util.Arrays
+import java.util.concurrent.atomic.AtomicInteger
+
+import scala.collection.JavaConverters._
+
+import com.google.common.io.Files
+
+import org.apache.spark.SparkConf
+import org.apache.spark.internal.Logging
+
+
+class VirtualEnvFactory(pythonExec: String, conf: SparkConf, isDriver: Boolean)
+  extends Logging {
+
+  private var virtualEnvType = conf.get("spark.pyspark.virtualenv.type", "native")
+  private var virtualEnvBinPath = conf.get("spark.pyspark.virtualenv.bin.path", "")
+  private var initPythonPackages = conf.getOption("spark.pyspark.virtualenv.packages")
+  private var virtualEnvName: String = _
+  private var virtualPythonExec: String = _
--- End diff --

Do we need to have these as instance variables?


---




[GitHub] spark pull request #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pys...

2018-01-23 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/13599#discussion_r163203798
  
--- Diff: core/src/main/scala/org/apache/spark/api/python/VirtualEnvFactory.scala ---
@@ -0,0 +1,164 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.api.python
+
+import java.io.File
+import java.util.{Map => JMap}
+import java.util.Arrays
+import java.util.concurrent.atomic.AtomicInteger
+
+import scala.collection.JavaConverters._
+
+import com.google.common.io.Files
+
+import org.apache.spark.SparkConf
+import org.apache.spark.internal.Logging
+
+
+class VirtualEnvFactory(pythonExec: String, conf: SparkConf, isDriver: Boolean)
+  extends Logging {
+
+  private var virtualEnvType = conf.get("spark.pyspark.virtualenv.type", "native")
+  private var virtualEnvBinPath = conf.get("spark.pyspark.virtualenv.bin.path", "")
+  private var initPythonPackages = conf.getOption("spark.pyspark.virtualenv.packages")
+  private var virtualEnvName: String = _
+  private var virtualPythonExec: String = _
+  private val VIRTUALENV_ID = new AtomicInteger()
+  private var isLauncher: Boolean = false
+
+  // used by launcher when user want to use virtualenv in pyspark shell. Launcher need this class
+  // to create virtualenv for driver.
+  def this(pythonExec: String, properties: JMap[String, String], isDriver: java.lang.Boolean) {
+    this(pythonExec, new SparkConf().setAll(properties.asScala), isDriver)
+    this.isLauncher = true
+  }
+
+  /*
+   * Create virtualenv using native virtualenv or conda
+   *
+   */
+  def setupVirtualEnv(): String = {
+    /*
+     *
+     * Native Virtualenv:
+     *   -  Execute command: virtualenv -p  --no-site-packages 
+     *   -  Execute command: python -m pip --cache-dir  install -r 
+     *
+     * Conda
+     *   -  Execute command: conda create --prefix  --file  -y
+     *
+     */
+    logInfo("Start to setup virtualenv...")
+    logDebug("user.dir=" + System.getProperty("user.dir"))
+    logDebug("user.home=" + System.getProperty("user.home"))
+
+    require(virtualEnvType == "native" || virtualEnvType == "conda",
+      s"VirtualEnvType: $virtualEnvType is not supported." )
+    require(new File(virtualEnvBinPath).exists(),
+      s"VirtualEnvBinPath: $virtualEnvBinPath is not defined or doesn't exist.")
+    // Two scenarios of creating virtualenv:
+    // 1. created in yarn container. Yarn will clean it up after container is exited
+    // 2. created outside yarn container. Spark need to create temp directory and clean it after app
+    //    finish.
+    //  - driver of PySpark shell
+    //  - driver of yarn-client mode
+    if (isLauncher ||
+      (isDriver && conf.get("spark.submit.deployMode") == "client")) {
+      val virtualenv_basedir = Files.createTempDir()
+      virtualenv_basedir.deleteOnExit()
+      virtualEnvName = virtualenv_basedir.getAbsolutePath
+    } else if (isDriver && conf.get("spark.submit.deployMode") == "cluster") {
+      virtualEnvName = "virtualenv_driver"
+    } else {
+      // use the working directory of Executor
+      virtualEnvName = "virtualenv_" + conf.getAppId + "_" + VIRTUALENV_ID.getAndIncrement()
+    }
+
+    // Use the absolute path of requirement file in the following cases
+    // 1. driver of pyspark shell
+    // 2. driver of yarn-client mode
+    // otherwise just use filename as it would be downloaded to the working directory of Executor
+    val pysparkRequirements =
+      if (isLauncher ||
+        (isDriver && conf.get("spark.submit.deployMode") == "client")) {
+        conf.getOption("spark.pyspark.virtualenv.requirements")
+      } else {
+        conf.getOption("spark.pyspark.virtualenv.requirements").map(_.split("/").last)
+      }
+
+    val createEnvCommand =

[GitHub] spark pull request #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pys...

2018-01-23 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/13599#discussion_r163201794
  
--- Diff: core/src/main/scala/org/apache/spark/api/python/VirtualEnvFactory.scala ---
@@ -0,0 +1,164 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.api.python
+
+import java.io.File
+import java.util.{Map => JMap}
+import java.util.Arrays
+import java.util.concurrent.atomic.AtomicInteger
+
+import scala.collection.JavaConverters._
+
+import com.google.common.io.Files
+
+import org.apache.spark.SparkConf
+import org.apache.spark.internal.Logging
+
+
+class VirtualEnvFactory(pythonExec: String, conf: SparkConf, isDriver: Boolean)
+  extends Logging {
+
+  private var virtualEnvType = conf.get("spark.pyspark.virtualenv.type", "native")
+  private var virtualEnvBinPath = conf.get("spark.pyspark.virtualenv.bin.path", "")
+  private var initPythonPackages = conf.getOption("spark.pyspark.virtualenv.packages")
--- End diff --

Use `val`s for these three variables.


---




[GitHub] spark pull request #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pys...

2018-01-23 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/13599#discussion_r163198358
  
--- Diff: python/pyspark/context.py ---
@@ -1023,6 +1032,42 @@ def getConf(self):
         conf.setAll(self._conf.getAll())
         return conf
 
+    def install_packages(self, packages):
+        """
+        install python packages on all executors and driver through pip. pip will be installed
--- End diff --

nit: `Install` instead of `install` at the beginning of the line.


---




[GitHub] spark pull request #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pys...

2018-01-23 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/13599#discussion_r163199689
  
--- Diff: launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java ---
@@ -299,20 +300,38 @@
     // 4. environment variable PYSPARK_PYTHON
     // 5. python
     List<String> pyargs = new ArrayList<>();
-    pyargs.add(firstNonEmpty(conf.get(SparkLauncher.PYSPARK_DRIVER_PYTHON),
+    String pythonExec = firstNonEmpty(conf.get(SparkLauncher.PYSPARK_DRIVER_PYTHON),
       conf.get(SparkLauncher.PYSPARK_PYTHON),
       System.getenv("PYSPARK_DRIVER_PYTHON"),
       System.getenv("PYSPARK_PYTHON"),
-      "python"));
-    String pyOpts = System.getenv("PYSPARK_DRIVER_PYTHON_OPTS");
-    if (conf.containsKey(SparkLauncher.PYSPARK_PYTHON)) {
-      // pass conf spark.pyspark.python to python by environment variable.
-      env.put("PYSPARK_PYTHON", conf.get(SparkLauncher.PYSPARK_PYTHON));
+      "python");
+    if (conf.getOrDefault("spark.pyspark.virtualenv.enabled", "false").equals("true")) {
+      try {
+        // setup virtualenv in launcher when virtualenv is enabled in pyspark shell
+        Class<?> virtualEnvClazz = getClass().forName("org.apache.spark.api.python.VirtualEnvFactory");
+        Object virtualEnv = virtualEnvClazz.getConstructor(String.class, Map.class, Boolean.class)
+          .newInstance(pythonExec, conf, true);
+        Method virtualEnvMethod = virtualEnvClazz.getMethod("setupVirtualEnv");
+        pythonExec = (String) virtualEnvMethod.invoke(virtualEnv);
+        pyargs.add(pythonExec);
+      } catch (Exception e) {
+        throw new IOException(e);
+      }
+    } else {
+      pyargs.add(firstNonEmpty(conf.get(SparkLauncher.PYSPARK_DRIVER_PYTHON),
+        conf.get(SparkLauncher.PYSPARK_PYTHON),
+        System.getenv("PYSPARK_DRIVER_PYTHON"),
+        System.getenv("PYSPARK_PYTHON"),
+        "python"));
--- End diff --

We can simplify as `pyargs.add(pythonExec);`?


---




[GitHub] spark pull request #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pys...

2018-01-23 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/13599#discussion_r163199169
  
--- Diff: python/pyspark/context.py ---
@@ -1023,6 +1032,42 @@ def getConf(self):
         conf.setAll(self._conf.getAll())
         return conf
 
+    def install_packages(self, packages):
+        """
+        install python packages on all executors and driver through pip. pip will be installed
+        by default no matter using native virtualenv or conda. So it is guaranteed that pip is
+        available if virtualenv is enabled.
+        :param packages: string for single package or a list of string for multiple packages
+        :param install_driver: whether to install packages in client
--- End diff --

What's `install_driver` parameter?


---




[GitHub] spark pull request #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pys...

2018-01-23 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/13599#discussion_r163199771
  
--- Diff: launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java ---
@@ -299,20 +300,38 @@
     // 4. environment variable PYSPARK_PYTHON
     // 5. python
     List<String> pyargs = new ArrayList<>();
-    pyargs.add(firstNonEmpty(conf.get(SparkLauncher.PYSPARK_DRIVER_PYTHON),
+    String pythonExec = firstNonEmpty(conf.get(SparkLauncher.PYSPARK_DRIVER_PYTHON),
       conf.get(SparkLauncher.PYSPARK_PYTHON),
       System.getenv("PYSPARK_DRIVER_PYTHON"),
       System.getenv("PYSPARK_PYTHON"),
-      "python"));
-    String pyOpts = System.getenv("PYSPARK_DRIVER_PYTHON_OPTS");
-    if (conf.containsKey(SparkLauncher.PYSPARK_PYTHON)) {
-      // pass conf spark.pyspark.python to python by environment variable.
-      env.put("PYSPARK_PYTHON", conf.get(SparkLauncher.PYSPARK_PYTHON));
+      "python");
+    if (conf.getOrDefault("spark.pyspark.virtualenv.enabled", "false").equals("true")) {
+      try {
+        // setup virtualenv in launcher when virtualenv is enabled in pyspark shell
+        Class<?> virtualEnvClazz = getClass().forName("org.apache.spark.api.python.VirtualEnvFactory");
--- End diff --

`Class.forName` instead of `getClass().forName`.


---




[GitHub] spark pull request #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pys...

2018-01-23 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/13599#discussion_r163194199
  
--- Diff: core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala ---
@@ -82,6 +90,12 @@ private[spark] class PythonWorkerFactory(pythonExec: String, envVars: Map[String
     envVars.getOrElse("PYTHONPATH", ""),
     sys.env.getOrElse("PYTHONPATH", ""))
 
+
+  if (virtualEnvEnabled) {
+    val virtualEnvFactory = new VirtualEnvFactory(pythonExec, conf, false)
+    virtualenvPythonExec = Some(virtualEnvFactory.setupVirtualEnv())
+  }
--- End diff --

I guess we don't prefer unnecessary `var`s.

How about the following with the diff above:

```scala
val virtualEnvEnabled = conf.getBoolean("spark.pyspark.virtualenv.enabled", false)
val virtualenvPythonExec = if (virtualEnvEnabled) {
  val virtualEnvFactory = new VirtualEnvFactory(pythonExec, conf, false)
  Some(virtualEnvFactory.setupVirtualEnv())
} else {
  None
}
```

Or maybe we can:

```scala
val virtualEnvEnabled = conf.getBoolean("spark.pyspark.virtualenv.enabled", false)
val virtualenvPythonExec = if (virtualEnvEnabled) {
  val virtualEnvFactory = new VirtualEnvFactory(pythonExec, conf, false)
  virtualEnvFactory.setupVirtualEnv()
} else {
  pythonExec
}
```

and use this directly.



---




[GitHub] spark pull request #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pys...

2018-01-23 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/13599#discussion_r163197957
  
--- Diff: python/pyspark/context.py ---
@@ -1023,6 +1032,42 @@ def getConf(self):
         conf.setAll(self._conf.getAll())
         return conf
 
+    def install_packages(self, packages):
+        """
+        install python packages on all executors and driver through pip. pip will be installed
+        by default no matter using native virtualenv or conda. So it is guaranteed that pip is
+        available if virtualenv is enabled.
+        :param packages: string for single package or a list of string for multiple packages
+        :param install_driver: whether to install packages in client
+        """
+        if self._conf.get("spark.pyspark.virtualenv.enabled") != "true":
+            raise RuntimeError("install_packages can only use called when "
+                               "spark.pyspark.virtualenv.enabled is set as true")
+        if isinstance(packages, basestring):
+            packages = [packages]
+        # seems statusTracker.getExecutorInfos() will return driver + exeuctors, so -1 here.
+        num_executors = len(self._jsc.sc().statusTracker().getExecutorInfos()) - 1
+        dummyRDD = self.parallelize(range(num_executors), num_executors)
+
+        def _run_pip(packages, iterator):
+            import pip
+            return pip.main(["install"] + packages)
+
+        # install package on driver first. if installation succeeded, continue the installation
+        # on executors, otherwise return directly.
+        if _run_pip(packages, None) != 0:
+            return
+
+        virtualenvPackages = self._conf.get("spark.pyspark.virtualenv.packages")
+        if virtualenvPackages:
+            self._conf.set("spark.pyspark.virtualenv.packages", virtualenvPackages + "," +
+                           ",".join(packages))
+        else:
+            self._conf.set("spark.pyspark.virtualenv.packages", ",".join(packages))
+
+        import functools
+        dummyRDD.foreachPartition(functools.partial(_run_pip, packages))
--- End diff --

I guess this does not guarantee that `_run_pip` is executed on all executors.
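
    As a hedged Scala sketch of why (assuming a running `SparkContext` named `sc`; not code from this PR): Spark guarantees one task per partition, not one task per executor, so several partitions of a small dummy RDD can land on the same executor.

```scala
import org.apache.spark.{SparkContext, SparkEnv}

// Count how many distinct executors actually ran one of the `parts` tasks.
// This can be smaller than the number of live executors, which is why a
// dummy foreachPartition is not a reliable broadcast to every executor.
def executorsTouched(sc: SparkContext, parts: Int): Long =
  sc.parallelize(0 until parts, parts)
    .mapPartitions(_ => Iterator(SparkEnv.get.executorId))
    .distinct()
    .count()
```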


---




[GitHub] spark issue #19993: [SPARK-22799][ML] Bucketizer should throw exception if s...

2018-01-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19993
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/137/
Test PASSed.


---




[GitHub] spark issue #19993: [SPARK-22799][ML] Bucketizer should throw exception if s...

2018-01-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19993
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19993: [SPARK-22799][ML] Bucketizer should throw exception if s...

2018-01-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19993
  
**[Test build #86525 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86525/testReport)** for PR 19993 at commit [`ebc6d16`](https://github.com/apache/spark/commit/ebc6d16586318155180e37d7a1a199aa1a8b9cf2).


---




[GitHub] spark pull request #20360: [SPARK-23177][SQL][PySpark] Extract zero-paramete...

2018-01-23 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20360#discussion_r163189081
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala ---
@@ -45,7 +45,8 @@ object ExtractPythonUDFFromAggregate extends Rule[LogicalPlan] {
 
   private def hasPythonUdfOverAggregate(expr: Expression, agg: Aggregate): Boolean = {
     expr.find {
-      e => PythonUDF.isScalarPythonUDF(e) && e.find(belongAggregate(_, agg)).isDefined
+      e => PythonUDF.isScalarPythonUDF(e) &&
+        (e.references.isEmpty || e.find(belongAggregate(_, agg)).isDefined)
--- End diff --

Can we use just `e.children` instead of `e.references`?
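
    As a hedged illustration of the difference (plain Catalyst expressions, assuming `spark-catalyst` on the classpath; not code from this PR): `references` collects only attributes, so an expression whose inputs are all literals has empty `references` but non-empty `children`.

```scala
import org.apache.spark.sql.catalyst.expressions.{Add, AttributeReference, Literal}
import org.apache.spark.sql.types.IntegerType

val litOnly = Add(Literal(1), Literal(2))
println(litOnly.children.nonEmpty)   // true: the Literal inputs are children
println(litOnly.references.isEmpty)  // true: no attributes are referenced

val withAttr = Add(AttributeReference("a", IntegerType)(), Literal(1))
println(withAttr.references.isEmpty) // false: it references attribute `a`
```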


---




[GitHub] spark issue #20350: [SPARK-23179][SQL] Support option to throw exception if ...

2018-01-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20350
  
**[Test build #86524 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86524/testReport)** for PR 20350 at commit [`610a595`](https://github.com/apache/spark/commit/610a595bf61721c38edfaf29dcc161e363319423).


---




[GitHub] spark issue #20350: [SPARK-23179][SQL] Support option to throw exception if ...

2018-01-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20350
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/136/
Test PASSed.


---




[GitHub] spark issue #20350: [SPARK-23179][SQL] Support option to throw exception if ...

2018-01-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20350
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #20350: [SPARK-23179][SQL] Support option to throw except...

2018-01-23 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20350#discussion_r163184979
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala ---
@@ -237,14 +238,26 @@ final class Decimal extends Ordered[Decimal] with Serializable {
   /**
    * Create new `Decimal` with given precision and scale.
    *
-   * @return a non-null `Decimal` value if successful or `null` if overflow would occur.
+   * @return a non-null `Decimal` value if successful. Otherwise, if `nullOnOverflow` is true, null
+   *         is returned; if `nullOnOverflow` is false, an `ArithmeticException` is thrown.
    */
   private[sql] def toPrecision(
       precision: Int,
       scale: Int,
-      roundMode: BigDecimal.RoundingMode.Value = ROUND_HALF_UP): Decimal = {
+      roundMode: BigDecimal.RoundingMode.Value = ROUND_HALF_UP,
+      nullOnOverflow: Boolean = true): Decimal = {
     val copy = clone()
-    if (copy.changePrecision(precision, scale, roundMode)) copy else null
+    if (copy.changePrecision(precision, scale, roundMode)) {
+      copy
+    } else {
+      def message = s"$toDebugString cannot be represented as Decimal($precision, $scale)."
+      if (nullOnOverflow) {
+        if (log.isDebugEnabled) logDebug(s"$message NULL is returned.")
+        null
--- End diff --

Since @hvanhovell was also suggesting that this is not necessary, I am removing it, even though I think it would be good to have.
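
    For readers following along, a toy Scala sketch of the overflow condition being discussed (plain `scala.math.BigDecimal`, not Spark's `Decimal`): a value fits `Decimal(precision, scale)` only if, after rounding to `scale` digits, it still needs at most `precision` significant digits.

```scala
import scala.math.BigDecimal.RoundingMode

// Toy model of the changePrecision check: does `d` fit Decimal(precision, scale)?
def fits(d: BigDecimal, precision: Int, scale: Int): Boolean =
  d.setScale(scale, RoundingMode.HALF_UP).precision <= precision

println(fits(BigDecimal("12.345"), 4, 2)) // true: rounds to 12.35, 4 digits
println(fits(BigDecimal("123.45"), 4, 2)) // false: needs 5 digits, i.e. overflow
```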


---




[GitHub] spark pull request #20350: [SPARK-23179][SQL] Support option to throw except...

2018-01-23 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20350#discussion_r163184915
  
--- Diff: sql/core/src/test/resources/sql-tests/inputs/decimalArithmeticOperations.sql ---
@@ -49,7 +49,6 @@ select 1e35 / 0.1;
 
 -- arithmetic operations causing a precision loss are truncated
 select 123456789123456789.1234567890 * 1.123456789123456789;
-select 0.001 / 9876543210987654321098765432109876543.2
--- End diff --

yes, unfortunately I missed it somehow previously...


---




[GitHub] spark issue #19892: [SPARK-22797][PySpark] Bucketizer support multi-column

2018-01-23 Thread MLnick
Github user MLnick commented on the issue:

https://github.com/apache/spark/pull/19892
  
RC2 has been cut - @jkbradley do you see #19993 as a blocker? I think it should be merged for `2.3`. And also there are QA JIRAs (sub-tasks of [SPARK-23105](https://issues.apache.org/jira/browse/SPARK-23105)) that are blockers that are not reflected in the list of blockers for `2.3` as they are not targeted.


---




[GitHub] spark issue #20361: [SPARK-23188][SQL] Make vectorized columnar reader batch ...

2018-01-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20361
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20361: [SPARK-23188][SQL] Make vectorized columnar reader batch ...

2018-01-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20361
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86522/
Test FAILed.


---




[GitHub] spark issue #20361: [SPARK-23188][SQL] Make vectorized columnar reader batch ...

2018-01-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20361
  
**[Test build #86522 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86522/testReport)** for PR 20361 at commit [`927c6b4`](https://github.com/apache/spark/commit/927c6b4d16b5a4c6457a190f3c1b2b8a5e439f2a).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20356: [SPARK-23185][SQL] Make the configuration "spark.default...

2018-01-23 Thread lvdongr
Github user lvdongr commented on the issue:

https://github.com/apache/spark/pull/20356
  
Thank you very much for your review. I have read the discussion and your PR and learned a lot. But I just want to solve the problem when executing "insert into ... values ...", which does not involve a file source. Maybe we can solve this first, since it has troubled my team for a long time? @maropu


---




[GitHub] spark issue #20360: [SPARK-23177][SQL][PySpark] Extract zero-parameter UDFs ...

2018-01-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20360
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/135/
Test PASSed.


---




[GitHub] spark issue #20360: [SPARK-23177][SQL][PySpark] Extract zero-parameter UDFs ...

2018-01-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20360
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20177: [SPARK-22954][SQL] Fix the exception thrown by Analyze c...

2018-01-23 Thread jiangxb1987
Github user jiangxb1987 commented on the issue:

https://github.com/apache/spark/pull/20177
  
Could you also add test cases that cover ANALYZE PARTITION and ANALYZE COLUMN queries?
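
    For context, a hedged sketch of the kinds of statements such tests would exercise (assuming a `SparkSession` named `spark`; the table and partition names are made up for illustration):

```scala
spark.sql("ANALYZE TABLE t PARTITION (ds = '2018-01-01') COMPUTE STATISTICS")
spark.sql("ANALYZE TABLE t COMPUTE STATISTICS FOR COLUMNS key, value")
```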


---




[GitHub] spark issue #20360: [SPARK-23177][SQL][PySpark] Extract zero-parameter UDFs ...

2018-01-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20360
  
**[Test build #86523 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86523/testReport)** for PR 20360 at commit [`5c3afbb`](https://github.com/apache/spark/commit/5c3afbbdf762411023b06348b2bfe3dbc2ff4287).


---




[GitHub] spark issue #20360: [SPARK-23177][SQL][PySpark] Extract zero-parameter UDFs ...

2018-01-23 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/20360
  
retest this please.


---




[GitHub] spark issue #20361: [SPARK-23188][SQL] Make vectorized columnar reader batch ...

2018-01-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20361
  
**[Test build #86522 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86522/testReport)** for PR 20361 at commit [`927c6b4`](https://github.com/apache/spark/commit/927c6b4d16b5a4c6457a190f3c1b2b8a5e439f2a).


---




[GitHub] spark issue #20361: [SPARK-23188][SQL] Make vectorized columnar reader batch ...

2018-01-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20361
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/134/
Test PASSed.


---




[GitHub] spark issue #20361: [SPARK-23188][SQL] Make vectorized columnar reader batch ...

2018-01-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20361
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #20361: [SPARK-23188] [SQL] Make vectorized columnar reade...

2018-01-23 Thread jiangxb1987
GitHub user jiangxb1987 opened a pull request:

https://github.com/apache/spark/pull/20361

[SPARK-23188] [SQL] Make vectorized columnar reader batch size configurable

## What changes were proposed in this pull request?

This PR includes the following changes:
- Make the capacity of `VectorizedParquetRecordReader` configurable;
- Make the capacity of `OrcColumnarBatchReader` configurable;
- Update the error message when required capacity in writable columnar vector cannot be fulfilled.

## How was this patch tested?

N/A
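
As a hedged usage sketch (the configuration keys below are an assumption inferred from the PR description, not quoted from the patch):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("demo").master("local[*]").getOrCreate()
// Assumed keys for the new per-reader batch capacity; verify against the merged patch.
spark.conf.set("spark.sql.parquet.columnarReaderBatchSize", 2048)
spark.conf.set("spark.sql.orc.columnarReaderBatchSize", 2048)
```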

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jiangxb1987/spark vectorCapacity

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20361.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20361


commit 927c6b4d16b5a4c6457a190f3c1b2b8a5e439f2a
Author: Xingbo Jiang 
Date:   2018-01-23T08:14:33Z

make vector batch size configurable.




---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20224
  
**[Test build #86521 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86521/testReport)** for PR 20224 at commit [`a7ceda2`](https://github.com/apache/spark/commit/a7ceda298f776bc195b0d2fbf447d886ca5af63e).


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20224
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20224
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/133/
Test PASSed.


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-23 Thread rednaxelafx
Github user rednaxelafx commented on the issue:

https://github.com/apache/spark/pull/20224
  
jenkins retest this please


---




[GitHub] spark issue #20359: [SPARK-23186][SQL] Initialize DriverManager first before...

2018-01-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20359
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20359: [SPARK-23186][SQL] Initialize DriverManager first before...

2018-01-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20359
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86514/
Test FAILed.


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20224
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20224
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86515/
Test FAILed.


---




[GitHub] spark issue #17702: [SPARK-20408][SQL] Get the glob path in parallel to redu...

2018-01-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17702
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86517/
Test FAILed.


---




[GitHub] spark issue #20360: [SPARK-23177][SQL][PySpark] Extract zero-parameter UDFs ...

2018-01-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20360
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86520/
Test FAILed.


---




[GitHub] spark issue #20360: [SPARK-23177][SQL][PySpark] Extract zero-parameter UDFs ...

2018-01-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20360
  
**[Test build #86520 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86520/testReport)** for PR 20360 at commit [`5c3afbb`](https://github.com/apache/spark/commit/5c3afbbdf762411023b06348b2bfe3dbc2ff4287).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20224
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20224
  
**[Test build #86515 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86515/testReport)** for PR 20224 at commit [`a0162aa`](https://github.com/apache/spark/commit/a0162aacb6e6e88057819e878fc2ddd7ed9ceb91).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `.doc(\"When true, embed the codegen stage ID into the class name of the generated class\")`
  * `  final class $className extends $`


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20224
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86516/
Test FAILed.


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20224
  
**[Test build #86516 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86516/testReport)** for PR 20224 at commit [`a7ceda2`](https://github.com/apache/spark/commit/a7ceda298f776bc195b0d2fbf447d886ca5af63e).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `.doc(\"When true, embed the codegen stage ID into the class name of the generated class\")`
  * `  final class $className extends $`


---




[GitHub] spark issue #17702: [SPARK-20408][SQL] Get the glob path in parallel to redu...

2018-01-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17702
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20360: [SPARK-23177][SQL][PySpark] Extract zero-parameter UDFs ...

2018-01-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20360
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20359: [SPARK-23186][SQL] Initialize DriverManager first before...

2018-01-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20359
  
**[Test build #86514 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86514/testReport)** for PR 20359 at commit [`234a637`](https://github.com/apache/spark/commit/234a637085451b27c7f7c5ca18d94d46c0815d9d).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #17702: [SPARK-20408][SQL] Get the glob path in parallel to redu...

2018-01-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17702
  
**[Test build #86517 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86517/testReport)** for PR 17702 at commit [`dc373ae`](https://github.com/apache/spark/commit/dc373ae68040b92e7b67c99ce0c793c80b5ab507).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---



