[GitHub] spark issue #20141: [SPARK-22945][SQL] add java UDF APIs in the functions ob...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20141 **[Test build #85658 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85658/testReport)** for PR 20141 at commit [`711c027`](https://github.com/apache/spark/commit/711c0278a6d77472761414692c987d256293467e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20148: [SPARK-22953][K8S] Avoids adding duplicated secret volum...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20148 Merged build finished. Test PASSed.
[GitHub] spark issue #20148: [SPARK-22953][K8S] Avoids adding duplicated secret volum...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20148 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85669/ Test PASSed.
[GitHub] spark issue #20148: [SPARK-22953][K8S] Avoids adding duplicated secret volum...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20148 **[Test build #85669 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85669/testReport)** for PR 20148 at commit [`93e1d64`](https://github.com/apache/spark/commit/93e1d648d56079dcc8bd28bafa7dad4cd9d2dfeb). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20146: [SPARK-11215][ML] Add multiple columns support to String...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20146 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85656/ Test FAILed.
[GitHub] spark issue #20146: [SPARK-11215][ML] Add multiple columns support to String...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20146 Merged build finished. Test FAILed.
[GitHub] spark issue #20146: [SPARK-11215][ML] Add multiple columns support to String...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20146 **[Test build #85656 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85656/testReport)** for PR 20146 at commit [`bb990f1`](https://github.com/apache/spark/commit/bb990f1a6511d8ce20f4fff254dfe0ff43262a10). * This patch **fails SparkR unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20148: [SPARK-22953][K8S] Avoids adding duplicated secret volum...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20148 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85668/ Test PASSed.
[GitHub] spark issue #20148: [SPARK-22953][K8S] Avoids adding duplicated secret volum...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20148 Merged build finished. Test PASSed.
[GitHub] spark issue #20148: [SPARK-22953][K8S] Avoids adding duplicated secret volum...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20148 **[Test build #85668 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85668/testReport)** for PR 20148 at commit [`92251aa`](https://github.com/apache/spark/commit/92251aa5d518083ae4016f307b48a31d67b87067). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20148: [SPARK-22953][K8S] Avoids adding duplicated secret volum...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20148 **[Test build #85669 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85669/testReport)** for PR 20148 at commit [`93e1d64`](https://github.com/apache/spark/commit/93e1d648d56079dcc8bd28bafa7dad4cd9d2dfeb).
[GitHub] spark issue #20148: [SPARK-22953][K8S] Avoids adding duplicated secret volum...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20148 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85667/ Test PASSed.
[GitHub] spark pull request #20148: [SPARK-22953][K8S] Avoids adding duplicated secre...
Github user liyinan926 commented on a diff in the pull request: https://github.com/apache/spark/pull/20148#discussion_r159592335 --- Diff: resource-managers/kubernetes/core/src/test/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodFactorySuite.scala --- @@ -172,10 +172,8 @@ class ExecutorPodFactorySuite extends SparkFunSuite with BeforeAndAfter with Bef "1", "dummy", "dummy", Seq[(String, String)](), driverPod, Map[String, Int]()) assert(executor.getSpec.getInitContainers.size() === 1) - assert(executor.getSpec.getInitContainers.get(0).getVolumeMounts.get(0).getName - === "secret1-volume") -assert(executor.getSpec.getInitContainers.get(0).getVolumeMounts.get(0) - .getMountPath === "/var/secret1") +assert(SecretVolumeUtils.containerHasVolume( --- End diff -- Done. https://github.com/apache/spark/pull/20148/commits/93e1d648d56079dcc8bd28bafa7dad4cd9d2dfeb. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20148: [SPARK-22953][K8S] Avoids adding duplicated secret volum...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20148 Merged build finished. Test PASSed.
[GitHub] spark issue #20148: [SPARK-22953][K8S] Avoids adding duplicated secret volum...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20148 **[Test build #85667 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85667/testReport)** for PR 20148 at commit [`2e6f7db`](https://github.com/apache/spark/commit/2e6f7db4efd9b819eee4c5d6d0152638ba0931d1). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #20148: [SPARK-22953][K8S] Avoids adding duplicated secre...
Github user hex108 commented on a diff in the pull request: https://github.com/apache/spark/pull/20148#discussion_r159591760 --- Diff: resource-managers/kubernetes/core/src/test/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodFactorySuite.scala --- @@ -172,10 +172,8 @@ class ExecutorPodFactorySuite extends SparkFunSuite with BeforeAndAfter with Bef "1", "dummy", "dummy", Seq[(String, String)](), driverPod, Map[String, Int]()) assert(executor.getSpec.getInitContainers.size() === 1) - assert(executor.getSpec.getInitContainers.get(0).getVolumeMounts.get(0).getName - === "secret1-volume") -assert(executor.getSpec.getInitContainers.get(0).getVolumeMounts.get(0) - .getMountPath === "/var/secret1") +assert(SecretVolumeUtils.containerHasVolume( --- End diff -- Thanks! We also need to check the number of volumes in the pod spec. + // check volume mounted. + assert(executor.getSpec.getVolumes.size() === 1) + assert(executor.getSpec.getVolumes.get(0).getSecret.getSecretName === "secret1")
[GitHub] spark issue #20148: [SPARK-22953][K8S] Avoids adding duplicated secret volum...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20148 **[Test build #85668 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85668/testReport)** for PR 20148 at commit [`92251aa`](https://github.com/apache/spark/commit/92251aa5d518083ae4016f307b48a31d67b87067).
[GitHub] spark pull request #20148: [SPARK-22953][K8S] Avoids adding duplicated secre...
Github user liyinan926 commented on a diff in the pull request: https://github.com/apache/spark/pull/20148#discussion_r159590877 --- Diff: resource-managers/kubernetes/core/src/test/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodFactorySuite.scala --- @@ -172,10 +172,8 @@ class ExecutorPodFactorySuite extends SparkFunSuite with BeforeAndAfter with Bef "1", "dummy", "dummy", Seq[(String, String)](), driverPod, Map[String, Int]()) assert(executor.getSpec.getInitContainers.size() === 1) - assert(executor.getSpec.getInitContainers.get(0).getVolumeMounts.get(0).getName - === "secret1-volume") -assert(executor.getSpec.getInitContainers.get(0).getVolumeMounts.get(0) - .getMountPath === "/var/secret1") +assert(SecretVolumeUtils.containerHasVolume( --- End diff -- Done. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20024: [SPARK-22825][SQL] Fix incorrect results of Casti...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/20024#discussion_r159590636 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala --- @@ -203,9 +203,26 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String // UDFToString private[this] def castToString(from: DataType): Any => Any = from match { case BinaryType => buildCast[Array[Byte]](_, UTF8String.fromBytes) +case StringType => buildCast[UTF8String](_, identity) case DateType => buildCast[Int](_, d => UTF8String.fromString(DateTimeUtils.dateToString(d))) --- End diff -- yea, good idea. So, I'll add the builder class. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20148: [SPARK-22953][K8S] Avoids adding duplicated secret volum...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20148 **[Test build #85667 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85667/testReport)** for PR 20148 at commit [`2e6f7db`](https://github.com/apache/spark/commit/2e6f7db4efd9b819eee4c5d6d0152638ba0931d1).
[GitHub] spark pull request #20148: [SPARK-22953][K8S] Avoids adding duplicated secre...
Github user hex108 commented on a diff in the pull request: https://github.com/apache/spark/pull/20148#discussion_r159590343 --- Diff: resource-managers/kubernetes/core/src/test/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodFactorySuite.scala --- @@ -172,10 +172,8 @@ class ExecutorPodFactorySuite extends SparkFunSuite with BeforeAndAfter with Bef "1", "dummy", "dummy", Seq[(String, String)](), driverPod, Map[String, Int]()) assert(executor.getSpec.getInitContainers.size() === 1) - assert(executor.getSpec.getInitContainers.get(0).getVolumeMounts.get(0).getName - === "secret1-volume") -assert(executor.getSpec.getInitContainers.get(0).getVolumeMounts.get(0) - .getMountPath === "/var/secret1") +assert(SecretVolumeUtils.containerHasVolume( --- End diff -- It might be better to change `None` at line 168 to `Some(secretBootstrap)` and to check the number of volumes, to avoid a regression.
[GitHub] spark pull request #20024: [SPARK-22825][SQL] Fix incorrect results of Casti...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/20024#discussion_r159590247 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala --- @@ -608,6 +667,20 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String val tz = ctx.addReferenceObj("timeZone", timeZone) (c, evPrim, evNull) => s"""$evPrim = UTF8String.fromString( org.apache.spark.sql.catalyst.util.DateTimeUtils.timestampToString($c, $tz));""" + case ar: ArrayType => +(c, evPrim, evNull) => { + val bufferTerm = ctx.freshName("bufferTerm") + val bufferClass = classOf[StringBuffer].getName + val writeArrayToBuffer = codegenWriteArrayToBuffer(ar, ctx) + s""" + |$bufferClass $bufferTerm = new $bufferClass(); + |if (!$evNull) { --- End diff -- ok. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20148: [SPARK-22953][K8S] Avoids adding duplicated secre...
Github user liyinan926 commented on a diff in the pull request: https://github.com/apache/spark/pull/20148#discussion_r159590150 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/MountSecretsBootstrap.scala --- @@ -28,20 +28,26 @@ private[spark] class MountSecretsBootstrap(secretNamesToMountPaths: Map[String, * * @param pod the pod into which the secret volumes are being added. * @param container the container into which the secret volumes are being mounted. + * @param addNewVolumes whether to add new secret volumes for the secrets. * @return the updated pod and container with the secrets mounted. */ - def mountSecrets(pod: Pod, container: Container): (Pod, Container) = { + def mountSecrets( --- End diff -- Good point. Done. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20147: [SPARK-22940][SQL] HiveExternalCatalogVersionsSuite shou...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20147 Merged build finished. Test PASSed.
[GitHub] spark issue #20147: [SPARK-22940][SQL] HiveExternalCatalogVersionsSuite shou...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20147 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85659/ Test PASSed.
[GitHub] spark issue #20145: [SPARK-22950][SQL]Handle ChildFirstURLClassLoader's pare...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20145 **[Test build #85666 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85666/testReport)** for PR 20145 at commit [`fea2490`](https://github.com/apache/spark/commit/fea2490027c611fb9be61ea33ad4de962ccf2ffa).
[GitHub] spark issue #20147: [SPARK-22940][SQL] HiveExternalCatalogVersionsSuite shou...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20147 **[Test build #85659 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85659/testReport)** for PR 20147 at commit [`c5e835e`](https://github.com/apache/spark/commit/c5e835eb317d0b6970daef0ffc1214576874e8c6). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #20024: [SPARK-22825][SQL] Fix incorrect results of Casti...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/20024#discussion_r159589454 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala --- @@ -608,6 +667,20 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String val tz = ctx.addReferenceObj("timeZone", timeZone) (c, evPrim, evNull) => s"""$evPrim = UTF8String.fromString( org.apache.spark.sql.catalyst.util.DateTimeUtils.timestampToString($c, $tz));""" + case ar: ArrayType => +(c, evPrim, evNull) => { + val bufferTerm = ctx.freshName("bufferTerm") + val bufferClass = classOf[StringBuffer].getName --- End diff -- Since `StringBuffer` is used in `stringExpressions` and `regexExpressions`, I just used this along with them. But, I'll update. (BTW, any reason to use `StringBuffer` in `stringExpressions` and `regexExpressions`?) --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20145: [SPARK-22950][SQL]Handle ChildFirstURLClassLoader...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20145#discussion_r159589347 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveUtilsSuite.scala --- @@ -42,4 +47,41 @@ class HiveUtilsSuite extends QueryTest with SQLTestUtils with TestHiveSingleton assert(hiveConf("foo") === "bar") } } + + test("ChildFirstURLClassLoader's parent is null") { --- End diff -- I think we can remove it. The test case below would fail without your PR, which already proves your patch works.
[GitHub] spark pull request #20024: [SPARK-22825][SQL] Fix incorrect results of Casti...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20024#discussion_r159589202 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala --- @@ -203,9 +203,26 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String // UDFToString private[this] def castToString(from: DataType): Any => Any = from match { case BinaryType => buildCast[Array[Byte]](_, UTF8String.fromBytes) +case StringType => buildCast[UTF8String](_, identity) case DateType => buildCast[Int](_, d => UTF8String.fromString(DateTimeUtils.dateToString(d))) --- End diff -- I think this can also simplify the codegen part, as we have some duplicated code to avoid this conversion. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20024: [SPARK-22825][SQL] Fix incorrect results of Casti...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20024#discussion_r159589126 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala --- @@ -203,9 +203,26 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String // UDFToString private[this] def castToString(from: DataType): Any => Any = from match { case BinaryType => buildCast[Array[Byte]](_, UTF8String.fromBytes) +case StringType => buildCast[UTF8String](_, identity) case DateType => buildCast[Int](_, d => UTF8String.fromString(DateTimeUtils.dateToString(d))) --- End diff -- we may convert a string to `UTF8String` and then convert it back, which is inefficient. I think we should create a special `StringBuilder` for `UTF8String`, e.g. ``` class UTF8StringBuilder { public void append(UTF8String str) } ```
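The round trip cloud-fan describes is `UTF8String -> String -> StringBuilder -> String -> UTF8String`. A minimal sketch of the proposed builder, with `byte[]` standing in for Spark's `UTF8String` (an assumption for self-containment; the real class wraps raw UTF-8 bytes), so appends copy bytes directly instead of decoding and re-encoding:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: byte[] stands in for Spark's UTF8String, which wraps
// raw UTF-8 bytes. Appends copy bytes directly, with no String decode/encode.
class UTF8StringBuilder {
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

    // Append an already-encoded UTF-8 value without decoding it.
    public void append(byte[] utf8Bytes) {
        buffer.write(utf8Bytes, 0, utf8Bytes.length);
    }

    // Convenience overload for separator literals such as "[" and ", ".
    public void append(String s) {
        append(s.getBytes(StandardCharsets.UTF_8));
    }

    // In Spark this would return a UTF8String wrapping the buffer directly.
    public byte[] build() {
        return buffer.toByteArray();
    }
}
```

The point of the specialization is that element values that are already UTF-8 encoded never pass through `java.lang.String` on their way into the result.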
[GitHub] spark pull request #20148: [SPARK-22953][K8S] Avoids adding duplicated secre...
Github user foxish commented on a diff in the pull request: https://github.com/apache/spark/pull/20148#discussion_r159588871 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/MountSecretsBootstrap.scala --- @@ -28,20 +28,26 @@ private[spark] class MountSecretsBootstrap(secretNamesToMountPaths: Map[String, * * @param pod the pod into which the secret volumes are being added. * @param container the container into which the secret volumes are being mounted. + * @param addNewVolumes whether to add new secret volumes for the secrets. * @return the updated pod and container with the secrets mounted. */ - def mountSecrets(pod: Pod, container: Container): (Pod, Container) = { + def mountSecrets( --- End diff -- Can we separate this method into `addSecretVolumes` and `mountSecrets`? The former would just need the pod argument, and the latter would take the container as argument. That way, we could probably separate this out better and it would make it more readable than the branching. WDYT? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
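A rough shape of the split foxish suggests, separating the pod-scoped step from the container-scoped one. The `Pod` and `Container` types below are simplified stand-ins (plain name lists) for the fabric8 Kubernetes client models the real code uses, so this is a sketch of the structure, not the actual implementation:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Simplified stand-ins for the fabric8 Pod/Container models: just name lists.
class Pod { final List<String> volumes = new ArrayList<>(); }
class Container { final List<String> volumeMounts = new ArrayList<>(); }

class MountSecretsBootstrap {
    private final Map<String, String> secretNamesToMountPaths;

    MountSecretsBootstrap(Map<String, String> secretNamesToMountPaths) {
        this.secretNamesToMountPaths = secretNamesToMountPaths;
    }

    // Pod-scoped step: add one secret volume per secret. Called once per pod,
    // so mounting into a second container cannot duplicate the volumes.
    Pod addSecretVolumes(Pod pod) {
        for (String name : secretNamesToMountPaths.keySet()) {
            pod.volumes.add(name + "-volume");
        }
        return pod;
    }

    // Container-scoped step: mount the already-added volumes into a container.
    // Safe to call for both the main container and the init-container.
    Container mountSecrets(Container container) {
        for (String name : secretNamesToMountPaths.keySet()) {
            container.volumeMounts.add(name + "-volume");
        }
        return container;
    }
}
```

With this split, the duplicated-volume bug cannot recur by construction: callers invoke `addSecretVolumes` once per pod and `mountSecrets` once per container, instead of passing a boolean flag through one combined method.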
[GitHub] spark issue #20148: [SPARK-22953][K8S] Avoids adding duplicated secret volum...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20148 Merged build finished. Test PASSed.
[GitHub] spark issue #20148: [SPARK-22953][K8S] Avoids adding duplicated secret volum...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20148 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85664/ Test PASSed.
[GitHub] spark issue #20148: [SPARK-22953][K8S] Avoids adding duplicated secret volum...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20148 **[Test build #85664 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85664/testReport)** for PR 20148 at commit [`9be26a8`](https://github.com/apache/spark/commit/9be26a857164e815594941adff6a8d2001d69001). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20149: [SPARK-22771][SQL] Add a missing return statement in Con...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20149 **[Test build #85665 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85665/testReport)** for PR 20149 at commit [`2b385ac`](https://github.com/apache/spark/commit/2b385ac66e33bb18b651ddd6489bd691074960a6).
[GitHub] spark pull request #20148: [SPARK-22953][K8S] Avoids adding duplicated secre...
Github user liyinan926 commented on a diff in the pull request: https://github.com/apache/spark/pull/20148#discussion_r159588468 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/MountSecretsBootstrap.scala --- @@ -28,20 +28,26 @@ private[spark] class MountSecretsBootstrap(secretNamesToMountPaths: Map[String, * * @param pod the pod into which the secret volumes are being added. * @param container the container into which the secret volumes are being mounted. + * @param addNewVolumes whether to add new secret volumes for the secrets. --- End diff -- Agreed. I didn't separate it out because we will touch this code as part of refactoring the steps code anyway. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20024: [SPARK-22825][SQL] Fix incorrect results of Casti...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20024#discussion_r159588521 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala --- @@ -608,6 +667,20 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String val tz = ctx.addReferenceObj("timeZone", timeZone) (c, evPrim, evNull) => s"""$evPrim = UTF8String.fromString( org.apache.spark.sql.catalyst.util.DateTimeUtils.timestampToString($c, $tz));""" + case ar: ArrayType => +(c, evPrim, evNull) => { + val bufferTerm = ctx.freshName("bufferTerm") + val bufferClass = classOf[StringBuffer].getName --- End diff -- why not use `StringBuilder`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20024: [SPARK-22825][SQL] Fix incorrect results of Casti...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20024#discussion_r159588472 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala --- @@ -608,6 +667,20 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String val tz = ctx.addReferenceObj("timeZone", timeZone) (c, evPrim, evNull) => s"""$evPrim = UTF8String.fromString( org.apache.spark.sql.catalyst.util.DateTimeUtils.timestampToString($c, $tz));""" + case ar: ArrayType => +(c, evPrim, evNull) => { + val bufferTerm = ctx.freshName("bufferTerm") + val bufferClass = classOf[StringBuffer].getName + val writeArrayToBuffer = codegenWriteArrayToBuffer(ar, ctx) + s""" + |$bufferClass $bufferTerm = new $bufferClass(); + |if (!$evNull) { --- End diff -- looking at the document of `CastFunction`, the `evNull` is actually `resultIsNull`, it's not about input but about result, so we should not check it here. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20149: [SPARK-22771][SQL] Add a missing return statement in Con...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/20149 cc: @gatorsmile
[GitHub] spark pull request #20149: [SPARK-22771][SQL] Add a missing return statement...
GitHub user maropu opened a pull request: https://github.com/apache/spark/pull/20149 [SPARK-22771][SQL] Add a missing return statement in Concat.checkInputDataTypes ## What changes were proposed in this pull request? This PR is a follow-up that fixes a bug left in #19977. ## How was this patch tested? Added tests in `StringExpressionsSuite`. You can merge this pull request into a Git repository by running: $ git pull https://github.com/maropu/spark SPARK-22771-FOLLOWUP Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20149.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20149 commit 2b385ac66e33bb18b651ddd6489bd691074960a6 Author: Takeshi Yamamuro Date: 2018-01-04T06:43:50Z Fix a bug
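The PR title names the fix: a return statement was missing from `Concat.checkInputDataTypes`. As a hypothetical illustration of this class of bug (the actual Concat code is not shown in the PR description, so the names and result strings below are made up): a check that builds a failure result inside a loop but never returns it falls through to the success path, silently accepting bad input.

```java
// Hypothetical illustration, not the actual Concat code.
class ConcatCheckSketch {
    static String checkInputDataTypesBuggy(String[] inputTypes) {
        for (String t : inputTypes) {
            if (!t.equals("string")) {
                String failure = "TypeCheckFailure: expected string, got " + t;
                // Bug: `failure` is computed but there is no return statement,
                // so the loop finishes and the method reports success anyway.
            }
        }
        return "TypeCheckSuccess";
    }

    // Fixed: return the failure as soon as a mismatched input is found.
    static String checkInputDataTypesFixed(String[] inputTypes) {
        for (String t : inputTypes) {
            if (!t.equals("string")) {
                return "TypeCheckFailure: expected string, got " + t;
            }
        }
        return "TypeCheckSuccess";
    }
}
```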
[GitHub] spark pull request #20148: [SPARK-22953][K8S] Avoids adding duplicated secre...
Github user foxish commented on a diff in the pull request: https://github.com/apache/spark/pull/20148#discussion_r159588348 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/MountSecretsBootstrap.scala --- @@ -28,20 +28,26 @@ private[spark] class MountSecretsBootstrap(secretNamesToMountPaths: Map[String, * * @param pod the pod into which the secret volumes are being added. * @param container the container into which the secret volumes are being mounted. + * @param addNewVolumes whether to add new secret volumes for the secrets. --- End diff -- I think this problem arose because we're conflating two things here - adding secret volumes (which are pod-scoped) and adding volume-mounts (which are container-scoped). I think we should separate these out. The branching may work for now, but we should have a future work item to separate these out. cc/ @mccheah --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20078: [SPARK-22900] [Spark-Streaming] Remove unnecessary restr...
Github user sharkdtu commented on the issue: https://github.com/apache/spark/pull/20078 @felixcheung Have you thought about the initial number of executors? By default, Spark on YARN starts with 2 executors. How can you make sure these 2 executors have enough cores for the receivers at the beginning?
[GitHub] spark pull request #20024: [SPARK-22825][SQL] Fix incorrect results of Casti...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/20024#discussion_r159588015 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala --- @@ -203,9 +203,26 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String // UDFToString private[this] def castToString(from: DataType): Any => Any = from match { case BinaryType => buildCast[Array[Byte]](_, UTF8String.fromBytes) +case StringType => buildCast[UTF8String](_, identity) case DateType => buildCast[Int](_, d => UTF8String.fromString(DateTimeUtils.dateToString(d))) case TimestampType => buildCast[Long](_, t => UTF8String.fromString(DateTimeUtils.timestampToString(t, timeZone))) +case ar: ArrayType => + buildCast[ArrayData](_, array => { +val res = new StringBuilder +res.append("[") +if (array.numElements > 0) { --- End diff -- yea, ok --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20024: [SPARK-22825][SQL] Fix incorrect results of Casti...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20024#discussion_r159587926 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala --- @@ -203,9 +203,26 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String // UDFToString private[this] def castToString(from: DataType): Any => Any = from match { case BinaryType => buildCast[Array[Byte]](_, UTF8String.fromBytes) +case StringType => buildCast[UTF8String](_, identity) case DateType => buildCast[Int](_, d => UTF8String.fromString(DateTimeUtils.dateToString(d))) case TimestampType => buildCast[Long](_, t => UTF8String.fromString(DateTimeUtils.timestampToString(t, timeZone))) +case ar: ArrayType => + buildCast[ArrayData](_, array => { +val res = new StringBuilder +res.append("[") +if (array.numElements > 0) { --- End diff -- Actually I prefer your previous code style ``` val toStringFunc = castToString(ar.elementType) if (array.numElements > 0) { res.append(toStringFunc(array.get(i, et))) } var i = 1 while (i < array.numElements) { res.append(", ") res.append(element...) } ```
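The code style cloud-fan prefers — append element 0 outside the loop, then loop from index 1 prepending the separator — can be sketched on a plain Scala array. This is a standalone illustration, not the actual `Cast.castToString` code, which operates on `ArrayData` rather than a JVM array:

```scala
// Builds "[e1, e2, ...]" the way the review suggests: handle element 0
// outside the loop so the separator logic needs no per-iteration branch.
def arrayToString(arr: Array[Int], elemToString: Int => String): String = {
  val res = new StringBuilder
  res.append("[")
  if (arr.length > 0) {
    res.append(elemToString(arr(0)))
    var i = 1
    while (i < arr.length) {
      res.append(", ")
      res.append(elemToString(arr(i)))
      i += 1
    }
  }
  res.append("]")
  res.toString
}
```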
[GitHub] spark issue #20148: [SPARK-22953][K8S] Avoids adding duplicated secret volum...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20148 **[Test build #85664 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85664/testReport)** for PR 20148 at commit [`9be26a8`](https://github.com/apache/spark/commit/9be26a857164e815594941adff6a8d2001d69001).
[GitHub] spark pull request #20145: [SPARK-22950][SQL]Handle ChildFirstURLClassLoader...
Github user yaooqinn commented on a diff in the pull request: https://github.com/apache/spark/pull/20145#discussion_r159587396 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveUtilsSuite.scala --- @@ -42,4 +47,41 @@ class HiveUtilsSuite extends QueryTest with SQLTestUtils with TestHiveSingleton assert(hiveConf("foo") === "bar") } } + + test("ChildFirstURLClassLoader's parent is null") { --- End diff -- It just shows that if we don't handle the child-first classloader's parent, we get caught in an exception. If it bothers you, I can delete it.
[GitHub] spark pull request #20148: [SPARK-22953][K8S] Avoids adding duplicated secre...
GitHub user liyinan926 opened a pull request: https://github.com/apache/spark/pull/20148 [SPARK-22953][K8S] Avoids adding duplicated secret volumes when init-container is used ## What changes were proposed in this pull request? User-specified secrets are mounted into both the main container and init-container (when it is used) in a Spark driver/executor pod, using the `MountSecretsBootstrap`. Because `MountSecretsBootstrap` always adds new secret volumes for the secrets to the pod, the same secret volumes get added twice: once when mounting the secrets to the main container, and again when mounting the secrets to the init-container. This PR fixes the issue by introducing a boolean flag to `MountSecretsBootstrap.mountSecrets` that tells whether to add new secret volumes. The flag is set to `true` when mounting secrets to the main container and `false` when mounting to the init-container. Ref: https://github.com/apache-spark-on-k8s/spark/issues/594. ## How was this patch tested? Unit tested. @vanzin @jiangxb1987 @ueshin This is to fix a critical bug in 2.3. PTAL. Thanks! @hex108 @foxish @kimoonkim You can merge this pull request into a Git repository by running: $ git pull https://github.com/liyinan926/spark-k8s branch-2.3 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20148.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20148 commit 9be26a857164e815594941adff6a8d2001d69001 Author: Yinan Li Date: 2018-01-04T06:20:01Z [SPARK-22953][K8S] Avoids adding duplicated secret volumes when init-container is used
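The flag-based fix this PR describes can be sketched with a simplified model. The case classes below are stand-ins for the real Kubernetes builder classes; only the `addNewVolumes` flag mirrors the actual change to `MountSecretsBootstrap.mountSecrets`:

```scala
case class Volume(name: String)
case class Container(name: String, mounts: List[String])
case class Pod(volumes: List[Volume])

// Sketch of the fix: volumes are pod-scoped, so they are only added when
// addNewVolumes is true (the main-container call); mounts are
// container-scoped and are always added.
def mountSecrets(
    pod: Pod,
    container: Container,
    secretNamesToMountPaths: Map[String, String],
    addNewVolumes: Boolean): (Pod, Container) = {
  val newPod = if (addNewVolumes) {
    pod.copy(volumes =
      pod.volumes ++ secretNamesToMountPaths.keys.map(n => Volume(s"$n-volume")))
  } else {
    pod
  }
  val newContainer =
    container.copy(mounts = container.mounts ++ secretNamesToMountPaths.values)
  (newPod, newContainer)
}
```

Calling this twice — `true` for the main container, `false` for the init-container — mounts the secret in both containers while adding each pod-level volume exactly once.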
[GitHub] spark issue #20145: [SPARK-22950][SQL]Handle ChildFirstURLClassLoader's pare...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20145 **[Test build #85663 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85663/testReport)** for PR 20145 at commit [`e1a9f2e`](https://github.com/apache/spark/commit/e1a9f2e12d17aeb3f7d20ed6695424c2c2765d16).
[GitHub] spark pull request #20145: [SPARK-22950][SQL]Handle ChildFirstURLClassLoader...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20145#discussion_r159586966 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveUtilsSuite.scala --- @@ -42,4 +47,41 @@ class HiveUtilsSuite extends QueryTest with SQLTestUtils with TestHiveSingleton assert(hiveConf("foo") === "bar") } } + + test("ChildFirstURLClassLoader's parent is null") { --- End diff -- what is this test trying to test?
[GitHub] spark issue #20078: [SPARK-22900] [Spark-Streaming] Remove unnecessary restr...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/20078 hmm, I didn't know that was changed, actually (SPARK-13723). But it seems to me `spark.streaming.dynamicAllocation.minExecutors` is still a valid approach. Matching the non-streaming behavior would be a bigger change, which I agree we probably shouldn't take at this point.
[GitHub] spark pull request #20145: [SPARK-22950][SQL]Handle ChildFirstURLClassLoader...
Github user yaooqinn commented on a diff in the pull request: https://github.com/apache/spark/pull/20145#discussion_r159585279 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveUtilsSuite.scala --- @@ -42,4 +47,29 @@ class HiveUtilsSuite extends QueryTest with SQLTestUtils with TestHiveSingleton assert(hiveConf("foo") === "bar") } } + + test("ChildFirstURLClassLoader's parent is null") { +val conf = new SparkConf +val contextClassLoader = Thread.currentThread().getContextClassLoader +val loader = new FakeChildFirstURLClassLoader(Array(), contextClassLoader) +Thread.currentThread().setContextClassLoader(loader) +intercept[IllegalArgumentException]( + HiveUtils.newClientForMetadata(conf, SparkHadoopUtil.newConfiguration(conf))) +Thread.currentThread().setContextClassLoader(contextClassLoader) --- End diff -- ok
[GitHub] spark pull request #20145: [SPARK-22950][SQL]Handle ChildFirstURLClassLoader...
Github user yaooqinn commented on a diff in the pull request: https://github.com/apache/spark/pull/20145#discussion_r159585110 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveUtilsSuite.scala --- @@ -42,4 +47,29 @@ class HiveUtilsSuite extends QueryTest with SQLTestUtils with TestHiveSingleton assert(hiveConf("foo") === "bar") } } + + test("ChildFirstURLClassLoader's parent is null") { +val conf = new SparkConf +val contextClassLoader = Thread.currentThread().getContextClassLoader +val loader = new FakeChildFirstURLClassLoader(Array(), contextClassLoader) +Thread.currentThread().setContextClassLoader(loader) +intercept[IllegalArgumentException]( + HiveUtils.newClientForMetadata(conf, SparkHadoopUtil.newConfiguration(conf))) +Thread.currentThread().setContextClassLoader(contextClassLoader) + } + + test("ChildFirstURLClassLoader's parent is null, get spark classloader instead") { +val conf = new SparkConf +val contextClassLoader = Thread.currentThread().getContextClassLoader +val loader = new ChildFirstURLClassLoader(Array(), contextClassLoader) +Thread.currentThread().setContextClassLoader(loader) +HiveUtils.newClientForMetadata(conf, SparkHadoopUtil.newConfiguration(conf)) +Thread.currentThread().setContextClassLoader(contextClassLoader) + } } + +/** + * A Fake [[ChildFirstURLClassLoader]] used for test + */ +private[spark] class FakeChildFirstURLClassLoader(urls: Array[URL], parent: ClassLoader) --- End diff -- inheritance would make the case match
[GitHub] spark issue #20135: [SPARK-22937][SQL] SQL elt output binary for binary inpu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20135 **[Test build #85661 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85661/testReport)** for PR 20135 at commit [`1ac512b`](https://github.com/apache/spark/commit/1ac512b26c2726904dbffdb447f538edfd686535).
[GitHub] spark issue #20078: [SPARK-22900] [Spark-Streaming] Remove unnecessary restr...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/20078 Originally in Spark dynamic allocation, "spark.executor.instances" and the dynamic allocation confs could not co-exist: if "spark.executor.instances" was set, dynamic allocation would not be enabled. But this behavior changed after 2.0. I think for streaming dynamic allocation we'd better keep it consistent with Spark dynamic allocation.
[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16578 **[Test build #85662 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85662/testReport)** for PR 16578 at commit [`42067c7`](https://github.com/apache/spark/commit/42067c7c91c3fc72d57050d501bf39f1fd777bae).
[GitHub] spark issue #20078: [SPARK-22900] [Spark-Streaming] Remove unnecessary restr...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/20078 Not commenting on this change specifically, but I've used streaming dynamic allocation quite a bit back in the day. In this case, though, I think the simpler fix is to set `spark.streaming.dynamicAllocation.minExecutors` instead of `num-executors`, as pointed out above.
[GitHub] spark issue #20024: [SPARK-22825][SQL] Fix incorrect results of Casting Arra...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20024 **[Test build #85660 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85660/testReport)** for PR 20024 at commit [`1d623f8`](https://github.com/apache/spark/commit/1d623f8b0a53d2152b942ce3adf631a8c3dd1a12).
[GitHub] spark issue #20024: [SPARK-22825][SQL] Fix incorrect results of Casting Arra...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/20024 ok, all the comments addressed.
[GitHub] spark pull request #20024: [SPARK-22825][SQL] Fix incorrect results of Casti...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/20024#discussion_r159584638 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala --- @@ -608,6 +723,22 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String val tz = ctx.addReferenceObj("timeZone", timeZone) (c, evPrim, evNull) => s"""$evPrim = UTF8String.fromString( org.apache.spark.sql.catalyst.util.DateTimeUtils.timestampToString($c, $tz));""" + case ar: ArrayType => +val bufferClass = classOf[StringBuffer].getName +val buffer = ctx.addMutableState(bufferClass, "buffer", v => s"$v = new $bufferClass();") +val writeArrayToBuffer = codegenWriteArrayToBuffer(ar, buffer, ctx) +val arrayToStringCode = + s""" + |if (!${ev.isNull}) { --- End diff -- ya, I missed. Dropped `ev` now.
[GitHub] spark pull request #20024: [SPARK-22825][SQL] Fix incorrect results of Casti...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/20024#discussion_r159584656 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala --- @@ -597,7 +605,114 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String """ } - private[this] def castToStringCode(from: DataType, ctx: CodegenContext): CastFunction = { + private[this] def writeElemToBufferCode( + dataType: DataType, + buffer: String, + elemTerm: String, + ctx: CodegenContext): String = dataType match { +case BinaryType => s"$buffer.append(new String($elemTerm))" +case StringType => s"$buffer.append(new String($elemTerm.getBytes()))" +case DateType => s"""$buffer.append( + org.apache.spark.sql.catalyst.util.DateTimeUtils.dateToString($elemTerm))""" +case TimestampType => s"""$buffer.append( + org.apache.spark.sql.catalyst.util.DateTimeUtils.timestampToString($elemTerm))""" +case map: MapType => s"${codegenWriteMapToBuffer(map, buffer, ctx)}($elemTerm)" +case ar: ArrayType => s"${codegenWriteArrayToBuffer(ar, buffer, ctx)}($elemTerm)" +case st: StructType => s"${codegenWriteStructToBuffer(st, buffer, ctx)}($elemTerm)" --- End diff -- ok, dropped.
[GitHub] spark issue #20144: [SPARK-21475][CORE][2nd attempt] Change to use NIO's Fil...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20144 Merged build finished. Test PASSed.
[GitHub] spark issue #20144: [SPARK-21475][CORE][2nd attempt] Change to use NIO's Fil...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20144 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85651/ Test PASSed.
[GitHub] spark pull request #20146: [SPARK-11215][ML] Add multiple columns support to...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/20146#discussion_r159584417 --- Diff: R/pkg/tests/fulltests/test_mllib_classification.R --- @@ -348,12 +348,12 @@ test_that("spark.mlp", { # Test random seed # default seed - model <- spark.mlp(df, label ~ features, layers = c(4, 5, 4, 3), maxIter = 10) + model <- spark.mlp(df, label ~ features, layers = c(4, 5, 4, 3), maxIter = 100) --- End diff -- can you check whether the run time increases significantly? This was an issue before - see SPARK-21693
[GitHub] spark issue #20144: [SPARK-21475][CORE][2nd attempt] Change to use NIO's Fil...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20144 **[Test build #85651 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85651/testReport)** for PR 20144 at commit [`277b801`](https://github.com/apache/spark/commit/277b801b3ccee95d8f30c02b1ba8693ab39f302a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #20129: [SPARK-22933][SPARKR] R Structured Streaming API ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20129
[GitHub] spark issue #20129: [SPARK-22933][SPARKR] R Structured Streaming API for wit...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/20129 merged to master/2.3
[GitHub] spark issue #20024: [SPARK-22825][SQL] Fix incorrect results of Casting Arra...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20024 Merged build finished. Test PASSed.
[GitHub] spark issue #20024: [SPARK-22825][SQL] Fix incorrect results of Casting Arra...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20024 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85652/ Test PASSed.
[GitHub] spark issue #20024: [SPARK-22825][SQL] Fix incorrect results of Casting Arra...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20024 **[Test build #85652 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85652/testReport)** for PR 20024 at commit [`a46a9a7`](https://github.com/apache/spark/commit/a46a9a76487151677d011ded7c06bee9d46c35d1). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #20137: [SPARK-22939] [PySpark] Support Spark UDF in regi...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20137#discussion_r159582683 --- Diff: python/pyspark/sql/catalog.py --- @@ -255,9 +255,26 @@ def registerFunction(self, name, f, returnType=StringType()): >>> _ = spark.udf.register("stringLengthInt", len, IntegerType()) >>> spark.sql("SELECT stringLengthInt('test')").collect() [Row(stringLengthInt(test)=4)] + +>>> import random +>>> from pyspark.sql.functions import udf +>>> from pyspark.sql.types import IntegerType, StringType +>>> random_udf = udf(lambda: random.randint(0, 100), IntegerType()).asNondeterministic() +>>> newRandom_udf = spark.catalog.registerFunction("random_udf", random_udf, StringType()) +>>> spark.sql("SELECT random_udf()").collect() # doctest: +SKIP +[Row(random_udf()=u'82')] +>>> spark.range(1).select(newRandom_udf()).collect() # doctest: +SKIP +[Row(random_udf()=u'62')] """ -udf = UserDefinedFunction(f, returnType=returnType, name=name, - evalType=PythonEvalType.SQL_BATCHED_UDF) + +# This is to check whether the input function is a wrapped/native UserDefinedFunction +if hasattr(f, 'asNondeterministic'): +udf = UserDefinedFunction(f.func, returnType=returnType, name=name, + evalType=PythonEvalType.SQL_BATCHED_UDF, --- End diff -- seems we can support it by just changing `evalType=PythonEvalType.SQL_BATCHED_UDF` to `evalType=f.evalType`
[GitHub] spark pull request #20137: [SPARK-22939] [PySpark] Support Spark UDF in regi...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20137#discussion_r159582570 --- Diff: python/pyspark/sql/catalog.py --- @@ -255,9 +255,26 @@ def registerFunction(self, name, f, returnType=StringType()): >>> _ = spark.udf.register("stringLengthInt", len, IntegerType()) >>> spark.sql("SELECT stringLengthInt('test')").collect() [Row(stringLengthInt(test)=4)] + +>>> import random +>>> from pyspark.sql.functions import udf +>>> from pyspark.sql.types import IntegerType, StringType +>>> random_udf = udf(lambda: random.randint(0, 100), IntegerType()).asNondeterministic() +>>> newRandom_udf = spark.catalog.registerFunction("random_udf", random_udf, StringType()) +>>> spark.sql("SELECT random_udf()").collect() # doctest: +SKIP +[Row(random_udf()=u'82')] +>>> spark.range(1).select(newRandom_udf()).collect() # doctest: +SKIP +[Row(random_udf()=u'62')] """ -udf = UserDefinedFunction(f, returnType=returnType, name=name, - evalType=PythonEvalType.SQL_BATCHED_UDF) + +# This is to check whether the input function is a wrapped/native UserDefinedFunction +if hasattr(f, 'asNondeterministic'): +udf = UserDefinedFunction(f.func, returnType=returnType, name=name, + evalType=PythonEvalType.SQL_BATCHED_UDF, --- End diff -- cc @ueshin @icexelloss, shall we support registering pandas UDFs here too?
[GitHub] spark issue #20137: [SPARK-22939] [PySpark] Support Spark UDF in registerFun...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20137 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85657/ Test PASSed.
[GitHub] spark issue #20137: [SPARK-22939] [PySpark] Support Spark UDF in registerFun...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20137 **[Test build #85657 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85657/testReport)** for PR 20137 at commit [`2482e6b`](https://github.com/apache/spark/commit/2482e6bcdaf92a78ae6b043a859e10140a273a18). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20137: [SPARK-22939] [PySpark] Support Spark UDF in registerFun...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20137 Merged build finished. Test PASSed.
[GitHub] spark pull request #20139: [SPARK-22944][SQL] improve FoldablePropagation
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20139
[GitHub] spark pull request #20147: [SPARK-22940][SQL] HiveExternalCatalogVersionsSui...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20147#discussion_r159582192 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala --- @@ -85,6 +90,43 @@ class HiveExternalCatalogVersionsSuite extends SparkSubmitTestUtils { new File(tmpDataDir, name).getCanonicalPath } + private def getFileFromUrl(urlString: String, targetDir: String, filename: String): Boolean = { +val conf = new SparkConf +val securityManager = new SecurityManager(conf) +val hadoopConf = new Configuration + +val outDir = new File(targetDir) +if (!outDir.exists()) { + outDir.mkdirs() +} + +try { + val result = Utils.doFetchFile(urlString, outDir, filename, conf, securityManager, hadoopConf) --- End diff -- `Utils.doFetchFile` is a little overkill for this case; maybe we can just implement a simple downloading function in Java.
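A minimal downloading function along the lines cloud-fan suggests could look like the sketch below, using only `java.net` and `java.nio`. This is an illustration, not the suite's actual code; the real `Utils.doFetchFile` additionally handles security managers, Hadoop filesystems, and caching:

```scala
import java.io.File
import java.net.URL
import java.nio.file.{Files, StandardCopyOption}

// Fetches urlString into targetDir/filename and returns the resulting file.
// Works for any URL scheme the JDK supports (http, https, file, ...).
def downloadFile(urlString: String, targetDir: File, filename: String): File = {
  if (!targetDir.exists()) targetDir.mkdirs()
  val target = new File(targetDir, filename)
  val in = new URL(urlString).openStream()
  try {
    Files.copy(in, target.toPath, StandardCopyOption.REPLACE_EXISTING)
  } finally {
    in.close()
  }
  target
}
```

A robust version for CI would also want a read timeout and redirect handling, which the review thread mentions testing.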
[GitHub] spark issue #20139: [SPARK-22944][SQL] improve FoldablePropagation
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20139 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85648/ Test PASSed.
[GitHub] spark issue #20139: [SPARK-22944][SQL] improve FoldablePropagation
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20139 Merged build finished. Test PASSed.
[GitHub] spark issue #20139: [SPARK-22944][SQL] improve FoldablePropagation
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20139 Thanks! Merged to master/2.3
[GitHub] spark issue #20139: [SPARK-22944][SQL] improve FoldablePropagation
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20139 **[Test build #85648 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85648/testReport)** for PR 20139 at commit [`df75bff`](https://github.com/apache/spark/commit/df75bffa1c27c2b70495a1b2c8988f3a1c979918). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20147: [SPARK-22940][SQL] HiveExternalCatalogVersionsSuite shou...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20147 **[Test build #85659 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85659/testReport)** for PR 20147 at commit [`c5e835e`](https://github.com/apache/spark/commit/c5e835eb317d0b6970daef0ffc1214576874e8c6).
[GitHub] spark issue #20147: [SPARK-22940][SQL] HiveExternalCatalogVersionsSuite shou...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20147 ok to test
[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning
Github user VigneshMohan1 commented on the issue: https://github.com/apache/spark/pull/16578 @JoshRosen Can we get this PR into 2.3.0? A lot of people are interested in it, and it will boost performance when reading nested Parquet fields.
[GitHub] spark issue #20139: [SPARK-22944][SQL] improve FoldablePropagation
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20139 LGTM
[GitHub] spark issue #20141: [SPARK-22945][SQL] add java UDF APIs in the functions ob...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20141 **[Test build #85658 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85658/testReport)** for PR 20141 at commit [`711c027`](https://github.com/apache/spark/commit/711c0278a6d77472761414692c987d256293467e).
[GitHub] spark issue #20147: [SPARK-22940][SQL] HiveExternalCatalogVersionsSuite shou...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20147 OK to test
[GitHub] spark issue #20147: [SPARK-22940][SQL] HiveExternalCatalogVersionsSuite shou...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20147 ok to test
[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning
Github user Gauravshah commented on the issue: https://github.com/apache/spark/pull/16578 @marmbrus can we start the review process, so that it can make it into the next release?
[GitHub] spark issue #20147: [SPARK-22940][SQL] HiveExternalCatalogVersionsSuite shou...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20147 Can one of the admins verify this patch?
[GitHub] spark pull request #20147: [SPARK-22940][SQL] HiveExternalCatalogVersionsSui...
GitHub user bersprockets opened a pull request: https://github.com/apache/spark/pull/20147 [SPARK-22940][SQL] HiveExternalCatalogVersionsSuite should succeed on platforms that don't have wget ## What changes were proposed in this pull request? Modified HiveExternalCatalogVersionsSuite.scala to use Utils.doFetchFile to download different versions of Spark binaries rather than launching wget as an external process. On platforms that don't have wget installed, this suite fails with an error. @cloud-fan : would you like to check this change? ## How was this patch tested? 1) test-only of HiveExternalCatalogVersionsSuite on several platforms. Tested bad mirror, read timeout, and redirects. 2) ./dev/run-tests You can merge this pull request into a Git repository by running: $ git pull https://github.com/bersprockets/spark SPARK-22940-alt Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20147.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20147 commit 5fe679f3271692729e8b9a2a9351725a80fcf8f6 Author: Bruce Robbins Date: 2018-01-03T22:32:30Z Initial commit commit 92e82d484a30a171141d2cdd1f800254fe33ceaf Author: Bruce Robbins Date: 2018-01-03T23:12:09Z Remove test log messages commit c5e835eb317d0b6970daef0ffc1214576874e8c6 Author: Bruce Robbins Date: 2018-01-04T02:39:03Z The 2 new methods should be private
[GitHub] spark issue #20137: [SPARK-22939] [PySpark] Support Spark UDF in registerFun...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20137 **[Test build #85657 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85657/testReport)** for PR 20137 at commit [`2482e6b`](https://github.com/apache/spark/commit/2482e6bcdaf92a78ae6b043a859e10140a273a18).
[GitHub] spark pull request #20146: [SPARK-11215][ML] Add multiple columns support to...
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20146#discussion_r159580805

--- Diff: R/pkg/tests/fulltests/test_mllib_classification.R ---
@@ -313,7 +313,7 @@ test_that("spark.mlp", {
   # Test predict method
--- End diff --

Actually I think we may remove this `Test predict method` test. It seems to me that, with `tol = 0.5`, the prediction may not be very meaningful.
[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning
Github user mallman commented on the issue:

    https://github.com/apache/spark/pull/16578

> However, I am -1 on merging a change this large after branch cut.

It's disappointing, but I agree we can't merge a change this large after a branch cut. It will have to wait for 2.3.1 at the earliest or the next major release.
[GitHub] spark issue #20141: [SPARK-22945][SQL] add java UDF APIs in the functions ob...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20141 LGTM
[GitHub] spark issue #20137: [SPARK-22939] [PySpark] Support Spark UDF in registerFun...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20137 LGTM
[GitHub] spark pull request #20137: [SPARK-22939] [PySpark] Support Spark UDF in regi...
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20137#discussion_r159580038

--- Diff: python/pyspark/sql/catalog.py ---
@@ -227,15 +227,15 @@ def dropGlobalTempView(self, viewName):
     @ignore_unicode_prefix
     @since(2.0)
     def registerFunction(self, name, f, returnType=StringType()):
-        """Registers a python function (including lambda function) as a UDF
+        """Registers a Python function (including lambda function) or a wrapped/native UDF
--- End diff --

SGTM
[GitHub] spark pull request #20137: [SPARK-22939] [PySpark] Support Spark UDF in regi...
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20137#discussion_r159579886

--- Diff: python/pyspark/sql/catalog.py ---
@@ -227,15 +227,15 @@ def dropGlobalTempView(self, viewName):
     @ignore_unicode_prefix
     @since(2.0)
     def registerFunction(self, name, f, returnType=StringType()):
-        """Registers a python function (including lambda function) as a UDF
+        """Registers a Python function (including lambda function) or a wrapped/native UDF
--- End diff --

BTW, to be honest, I remember giving several quick tries at the time to get rid of the wrapper while keeping the docstring, but I failed to find a good alternative. Might be good to try if there is a clever way to get rid of the wrapper but keep the doc.
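The wrapper-vs-docstring tension discussed above has a standard Python pattern: `functools.wraps` copies `__doc__`, `__name__`, and other metadata from the wrapped function onto the wrapper. A minimal sketch follows; `make_udf_wrapper` and its `is_udf` attribute are illustrative stand-ins, not PySpark's actual `UserDefinedFunction` internals:

```python
import functools


def make_udf_wrapper(func):
    """Wrap func (e.g. to attach UDF metadata) without losing its docstring."""
    @functools.wraps(func)  # copies __doc__, __name__, __module__, etc. onto wrapper
    def wrapper(*args, **kwargs):
        # A real PySpark wrapper would build a Column expression here;
        # this sketch just forwards the call.
        return func(*args, **kwargs)
    wrapper.is_udf = True  # illustrative metadata, not a real PySpark attribute
    return wrapper


def to_upper(s):
    """Return s upper-cased."""
    return s.upper()


upper_udf = make_udf_wrapper(to_upper)
```

With `functools.wraps`, `help(upper_udf)` shows the original docstring, so a wrapper does not have to be removed to keep the documentation intact.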
[GitHub] spark pull request #20141: [SPARK-22945][SQL] add java UDF APIs in the funct...
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20141#discussion_r159579701

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -3461,13 +3485,172 @@ object functions {
     if (nullable) udf else udf.asNonNullable()
   }

+  //
+  // Java UDF functions
+  //
+
+  /**
+   * Defines a Java UDF0 instance as user-defined function (UDF).
+   * The caller must specify the output data type, and there is no automatic input type coercion.
+   * By default the returned UDF is deterministic, to change it to nondeterministic, call the
+   * API `UserDefinedFunction.asNondeterministic()`.
+   *
+   * @group udf_funcs
+   * @since 2.3.0
+   */
+  def udf(f: UDF0[_], returnType: DataType): UserDefinedFunction = {
+    val func = f.asInstanceOf[UDF0[Any]].call()
+    UserDefinedFunction(() => func, returnType, inputTypes = None)
+  }
+
+  /**
+   * Defines a Java UDF1 instance as user-defined function (UDF).
+   * The caller must specify the output data type, and there is no automatic input type coercion.
+   * By default the returned UDF is deterministic, to change it to nondeterministic, call the
--- End diff --

Nit: `, to` -> `. To`
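One subtlety worth noting in the quoted `UDF0` overload: `f.asInstanceOf[UDF0[Any]].call()` is evaluated once, eagerly, and the captured result is wrapped in a thunk, so every later invocation of the thunk returns the same cached value rather than re-running the UDF body. A Python analogue of that capture, with illustrative names only:

```python
def adapt_udf0(udf0_call):
    """Mimic the quoted Scala snippet: evaluate the zero-arg function once,
    then wrap the captured result in a thunk."""
    func = udf0_call()   # evaluated eagerly, exactly once
    return lambda: func  # every later invocation returns the cached value


# A zero-arg function with an observable side effect, to show the behavior.
calls = {"n": 0}


def side_effecting():
    calls["n"] += 1
    return calls["n"]


thunk = adapt_udf0(side_effecting)
```

Calling `thunk()` repeatedly always yields the value produced at definition time, which is one reason the surrounding docs stress the deterministic/nondeterministic distinction.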
[GitHub] spark pull request #20137: [SPARK-22939] [PySpark] Support Spark UDF in regi...
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20137#discussion_r159579617

--- Diff: python/pyspark/sql/catalog.py ---
@@ -227,15 +227,15 @@ def dropGlobalTempView(self, viewName):
     @ignore_unicode_prefix
     @since(2.0)
     def registerFunction(self, name, f, returnType=StringType()):
-        """Registers a python function (including lambda function) as a UDF
+        """Registers a Python function (including lambda function) or a wrapped/native UDF
--- End diff --

Another idea just in case it helps:

```
Registers a Python function as a UDF or a user defined function.
```