[GitHub] AmplabJenkins removed a comment on issue #23281: [MINOR][DOC]update the condition description of BypassMergeSortShuffl…

2018-12-10 Thread GitBox
AmplabJenkins removed a comment on issue #23281: [MINOR][DOC]update the 
condition description of BypassMergeSortShuffl…
URL: https://github.com/apache/spark/pull/23281#issuecomment-446053958
 
 
   Can one of the admins verify this patch?





[GitHub] AmplabJenkins commented on issue #23281: [MINOR][DOC]update the condition description of BypassMergeSortShuffl…

2018-12-10 Thread GitBox
AmplabJenkins commented on issue #23281: [MINOR][DOC]update the condition 
description of BypassMergeSortShuffl…
URL: https://github.com/apache/spark/pull/23281#issuecomment-446053958
 
 
   Can one of the admins verify this patch?





[GitHub] AmplabJenkins commented on issue #23281: [MINOR][DOC]update the condition description of BypassMergeSortShuffl…

2018-12-10 Thread GitBox
AmplabJenkins commented on issue #23281: [MINOR][DOC]update the condition 
description of BypassMergeSortShuffl…
URL: https://github.com/apache/spark/pull/23281#issuecomment-446054026
 
 
   Can one of the admins verify this patch?





[GitHub] lcqzte10192193 opened a new pull request #23281: [MINOR][DOC]update the condition description of BypassMergeSortShuffl…

2018-12-10 Thread GitBox
lcqzte10192193 opened a new pull request #23281: [MINOR][DOC]update the 
condition description of BypassMergeSortShuffl…
URL: https://github.com/apache/spark/pull/23281
 
 
   …eWriter
   
   ## What changes were proposed in this pull request?
   These three condition descriptions should be updated, following #23228:
   no Ordering is specified,
   no Aggregator is specified, and
   the number of partitions is less than
   spark.shuffle.sort.bypassMergeThreshold.
   
   1. If the shuffle dependency specifies aggregation, but it only aggregates at the reduce side, serialized shuffle can still be used.
   2. If the number of output partitions is 16777216, we can use serialized shuffle.
   
   
   ## How was this patch tested?
   N/A
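
   For reference, a minimal sketch of the check these conditions describe, paraphrasing `SortShuffleWriter.shouldBypassMergeSort` (illustrative only; treat the details as an assumption rather than the exact Spark source):
   
   ```
   import org.apache.spark.{ShuffleDependency, SparkConf}
   
   // Bypass is only possible when there is no map-side aggregation and the
   // partition count stays at or below spark.shuffle.sort.bypassMergeThreshold
   // (default 200). Reduce-side-only aggregation does not block the bypass.
   def shouldBypassMergeSort(conf: SparkConf, dep: ShuffleDependency[_, _, _]): Boolean = {
     if (dep.mapSideCombine) {
       false
     } else {
       val bypassMergeThreshold = conf.getInt("spark.shuffle.sort.bypassMergeThreshold", 200)
       dep.partitioner.numPartitions <= bypassMergeThreshold
     }
   }
   ```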
   





[GitHub] mccheah commented on a change in pull request #23220: [SPARK-25877][k8s] Move all feature logic to feature classes.

2018-12-10 Thread GitBox
mccheah commented on a change in pull request #23220: [SPARK-25877][k8s] Move 
all feature logic to feature classes.
URL: https://github.com/apache/spark/pull/23220#discussion_r240456173
 
 

 ##
 File path: resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/PodBuilderSuite.scala
 ##
 @@ -0,0 +1,177 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s
+
+import java.io.File
+
+import io.fabric8.kubernetes.api.model.{Config => _, _}
+import io.fabric8.kubernetes.client.KubernetesClient
+import io.fabric8.kubernetes.client.dsl.{MixedOperation, PodResource}
+import org.mockito.Matchers.any
+import org.mockito.Mockito.{mock, never, verify, when}
+import scala.collection.JavaConverters._
+
+import org.apache.spark.{SparkConf, SparkException, SparkFunSuite}
+import org.apache.spark.deploy.k8s._
+import org.apache.spark.internal.config.ConfigEntry
+
+abstract class PodBuilderSuite extends SparkFunSuite {
+
+  protected def templateFileConf: ConfigEntry[_]
+
+  protected def buildPod(sparkConf: SparkConf, client: KubernetesClient): SparkPod
+
+  private val baseConf = new SparkConf(false)
+    .set(Config.CONTAINER_IMAGE, "spark-executor:latest")
+
+  test("use empty initial pod if template is not specified") {
+    val client = mock(classOf[KubernetesClient])
+    buildPod(baseConf.clone(), client)
+    verify(client, never()).pods()
+  }
+
+  test("load pod template if specified") {
+    val client = mockKubernetesClient()
+    val sparkConf = baseConf.clone().set(templateFileConf.key, "template-file.yaml")
+    val pod = buildPod(sparkConf, client)
+    verifyPod(pod)
+  }
+
+  test("complain about misconfigured pod template") {
+    val client = mockKubernetesClient(
+      new PodBuilder()
+        .withNewMetadata()
+        .addToLabels("test-label-key", "test-label-value")
+        .endMetadata()
+        .build())
+    val sparkConf = baseConf.clone().set(templateFileConf.key, "template-file.yaml")
+    val exception = intercept[SparkException] {
+      buildPod(sparkConf, client)
+    }
+    assert(exception.getMessage.contains("Could not load pod from template file."))
+  }
+
+  private def mockKubernetesClient(pod: Pod = podWithSupportedFeatures()): KubernetesClient = {
+    val kubernetesClient = mock(classOf[KubernetesClient])
+    val pods =
+      mock(classOf[MixedOperation[Pod, PodList, DoneablePod, PodResource[Pod, DoneablePod]]])
+    val podResource = mock(classOf[PodResource[Pod, DoneablePod]])
+    when(kubernetesClient.pods()).thenReturn(pods)
+    when(pods.load(any(classOf[File]))).thenReturn(podResource)
+    when(podResource.get()).thenReturn(pod)
+    kubernetesClient
+  }
+
+  private def verifyPod(pod: SparkPod): Unit = {
 
 Review comment:
   Another factor of my concern is that for each individual assertion, it is unclear which step the assertion is tied to. This reads a lot more like an end-to-end (ETE) test than a unit test.





[GitHub] mccheah commented on a change in pull request #23220: [SPARK-25877][k8s] Move all feature logic to feature classes.

2018-12-10 Thread GitBox
mccheah commented on a change in pull request #23220: [SPARK-25877][k8s] Move 
all feature logic to feature classes.
URL: https://github.com/apache/spark/pull/23220#discussion_r240454265
 
 

 ##
 File path: resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/PodBuilderSuite.scala
 ##
 @@ -0,0 +1,177 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s
+
+import java.io.File
+
+import io.fabric8.kubernetes.api.model.{Config => _, _}
+import io.fabric8.kubernetes.client.KubernetesClient
+import io.fabric8.kubernetes.client.dsl.{MixedOperation, PodResource}
+import org.mockito.Matchers.any
+import org.mockito.Mockito.{mock, never, verify, when}
+import scala.collection.JavaConverters._
+
+import org.apache.spark.{SparkConf, SparkException, SparkFunSuite}
+import org.apache.spark.deploy.k8s._
+import org.apache.spark.internal.config.ConfigEntry
+
+abstract class PodBuilderSuite extends SparkFunSuite {
+
+  protected def templateFileConf: ConfigEntry[_]
+
+  protected def buildPod(sparkConf: SparkConf, client: KubernetesClient): SparkPod
+
+  private val baseConf = new SparkConf(false)
+    .set(Config.CONTAINER_IMAGE, "spark-executor:latest")
+
+  test("use empty initial pod if template is not specified") {
+    val client = mock(classOf[KubernetesClient])
+    buildPod(baseConf.clone(), client)
+    verify(client, never()).pods()
+  }
+
+  test("load pod template if specified") {
+    val client = mockKubernetesClient()
+    val sparkConf = baseConf.clone().set(templateFileConf.key, "template-file.yaml")
+    val pod = buildPod(sparkConf, client)
+    verifyPod(pod)
+  }
+
+  test("complain about misconfigured pod template") {
+    val client = mockKubernetesClient(
+      new PodBuilder()
+        .withNewMetadata()
+        .addToLabels("test-label-key", "test-label-value")
+        .endMetadata()
+        .build())
+    val sparkConf = baseConf.clone().set(templateFileConf.key, "template-file.yaml")
+    val exception = intercept[SparkException] {
+      buildPod(sparkConf, client)
+    }
+    assert(exception.getMessage.contains("Could not load pod from template file."))
+  }
+
+  private def mockKubernetesClient(pod: Pod = podWithSupportedFeatures()): KubernetesClient = {
+    val kubernetesClient = mock(classOf[KubernetesClient])
+    val pods =
+      mock(classOf[MixedOperation[Pod, PodList, DoneablePod, PodResource[Pod, DoneablePod]]])
+    val podResource = mock(classOf[PodResource[Pod, DoneablePod]])
+    when(kubernetesClient.pods()).thenReturn(pods)
+    when(pods.load(any(classOf[File]))).thenReturn(podResource)
+    when(podResource.get()).thenReturn(pod)
+    kubernetesClient
+  }
+
+  private def verifyPod(pod: SparkPod): Unit = {
+    val metadata = pod.pod.getMetadata
+    assert(metadata.getLabels.containsKey("test-label-key"))
+    assert(metadata.getAnnotations.containsKey("test-annotation-key"))
+    assert(metadata.getNamespace === "namespace")
+    assert(metadata.getOwnerReferences.asScala.exists(_.getName == "owner-reference"))
+    val spec = pod.pod.getSpec
+    assert(!spec.getContainers.asScala.exists(_.getName == "executor-container"))
+    assert(spec.getDnsPolicy === "dns-policy")
+    assert(spec.getHostAliases.asScala.exists(_.getHostnames.asScala.exists(_ == "hostname")))
 
 Review comment:
   Nit: The second call to `exists` can be `contains` instead, so that we don't 
pass a function object that ignores the argument. Alternatively, both `exists` 
calls can be removed:
   
   ```
   spec.getHostAliases.asScala.flatMap(_.getHostnames).contains("hostname")
   ```





[GitHub] mccheah commented on a change in pull request #23220: [SPARK-25877][k8s] Move all feature logic to feature classes.

2018-12-10 Thread GitBox
mccheah commented on a change in pull request #23220: [SPARK-25877][k8s] Move 
all feature logic to feature classes.
URL: https://github.com/apache/spark/pull/23220#discussion_r240454417
 
 

 ##
 File path: resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/PodBuilderSuite.scala
 ##
 @@ -0,0 +1,177 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s
+
+import java.io.File
+
+import io.fabric8.kubernetes.api.model.{Config => _, _}
+import io.fabric8.kubernetes.client.KubernetesClient
+import io.fabric8.kubernetes.client.dsl.{MixedOperation, PodResource}
+import org.mockito.Matchers.any
+import org.mockito.Mockito.{mock, never, verify, when}
+import scala.collection.JavaConverters._
+
+import org.apache.spark.{SparkConf, SparkException, SparkFunSuite}
+import org.apache.spark.deploy.k8s._
+import org.apache.spark.internal.config.ConfigEntry
+
+abstract class PodBuilderSuite extends SparkFunSuite {
+
+  protected def templateFileConf: ConfigEntry[_]
+
+  protected def buildPod(sparkConf: SparkConf, client: KubernetesClient): SparkPod
+
+  private val baseConf = new SparkConf(false)
+    .set(Config.CONTAINER_IMAGE, "spark-executor:latest")
+
+  test("use empty initial pod if template is not specified") {
+    val client = mock(classOf[KubernetesClient])
+    buildPod(baseConf.clone(), client)
+    verify(client, never()).pods()
+  }
+
+  test("load pod template if specified") {
+    val client = mockKubernetesClient()
+    val sparkConf = baseConf.clone().set(templateFileConf.key, "template-file.yaml")
+    val pod = buildPod(sparkConf, client)
+    verifyPod(pod)
+  }
+
+  test("complain about misconfigured pod template") {
+    val client = mockKubernetesClient(
+      new PodBuilder()
+        .withNewMetadata()
+        .addToLabels("test-label-key", "test-label-value")
+        .endMetadata()
+        .build())
+    val sparkConf = baseConf.clone().set(templateFileConf.key, "template-file.yaml")
+    val exception = intercept[SparkException] {
+      buildPod(sparkConf, client)
+    }
+    assert(exception.getMessage.contains("Could not load pod from template file."))
+  }
+
+  private def mockKubernetesClient(pod: Pod = podWithSupportedFeatures()): KubernetesClient = {
+    val kubernetesClient = mock(classOf[KubernetesClient])
+    val pods =
+      mock(classOf[MixedOperation[Pod, PodList, DoneablePod, PodResource[Pod, DoneablePod]]])
+    val podResource = mock(classOf[PodResource[Pod, DoneablePod]])
+    when(kubernetesClient.pods()).thenReturn(pods)
+    when(pods.load(any(classOf[File]))).thenReturn(podResource)
+    when(podResource.get()).thenReturn(pod)
+    kubernetesClient
+  }
+
+  private def verifyPod(pod: SparkPod): Unit = {
+    val metadata = pod.pod.getMetadata
+    assert(metadata.getLabels.containsKey("test-label-key"))
+    assert(metadata.getAnnotations.containsKey("test-annotation-key"))
+    assert(metadata.getNamespace === "namespace")
+    assert(metadata.getOwnerReferences.asScala.exists(_.getName == "owner-reference"))
+    val spec = pod.pod.getSpec
+    assert(!spec.getContainers.asScala.exists(_.getName == "executor-container"))
 
 Review comment:
   Nit: Use `.asScala.map(_.getName).contains("executor-container")`. Similar 
changes come up throughout this test.
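
   As a self-contained illustration of the suggested pattern (the `Container` class below is hypothetical, standing in for the fabric8 model class):
   
   ```
   import scala.collection.JavaConverters._
   
   case class Container(name: String) { def getName: String = name }
   
   val containers = java.util.Arrays.asList(Container("driver"), Container("sidecar"))
   
   // Predicate form: builds a closure just to compare names.
   assert(!containers.asScala.exists(_.getName == "executor-container"))
   // Suggested form: extract the names once, then use contains.
   assert(!containers.asScala.map(_.getName).contains("executor-container"))
   ```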





[GitHub] mccheah commented on a change in pull request #23220: [SPARK-25877][k8s] Move all feature logic to feature classes.

2018-12-10 Thread GitBox
mccheah commented on a change in pull request #23220: [SPARK-25877][k8s] Move 
all feature logic to feature classes.
URL: https://github.com/apache/spark/pull/23220#discussion_r240454863
 
 

 ##
 File path: resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/PodBuilderSuite.scala
 ##
 @@ -0,0 +1,177 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s
+
+import java.io.File
+
+import io.fabric8.kubernetes.api.model.{Config => _, _}
+import io.fabric8.kubernetes.client.KubernetesClient
+import io.fabric8.kubernetes.client.dsl.{MixedOperation, PodResource}
+import org.mockito.Matchers.any
+import org.mockito.Mockito.{mock, never, verify, when}
+import scala.collection.JavaConverters._
+
+import org.apache.spark.{SparkConf, SparkException, SparkFunSuite}
+import org.apache.spark.deploy.k8s._
+import org.apache.spark.internal.config.ConfigEntry
+
+abstract class PodBuilderSuite extends SparkFunSuite {
+
+  protected def templateFileConf: ConfigEntry[_]
+
+  protected def buildPod(sparkConf: SparkConf, client: KubernetesClient): SparkPod
+
+  private val baseConf = new SparkConf(false)
+    .set(Config.CONTAINER_IMAGE, "spark-executor:latest")
+
+  test("use empty initial pod if template is not specified") {
+    val client = mock(classOf[KubernetesClient])
+    buildPod(baseConf.clone(), client)
+    verify(client, never()).pods()
+  }
+
+  test("load pod template if specified") {
+    val client = mockKubernetesClient()
+    val sparkConf = baseConf.clone().set(templateFileConf.key, "template-file.yaml")
+    val pod = buildPod(sparkConf, client)
+    verifyPod(pod)
+  }
+
+  test("complain about misconfigured pod template") {
+    val client = mockKubernetesClient(
+      new PodBuilder()
+        .withNewMetadata()
+        .addToLabels("test-label-key", "test-label-value")
+        .endMetadata()
+        .build())
+    val sparkConf = baseConf.clone().set(templateFileConf.key, "template-file.yaml")
+    val exception = intercept[SparkException] {
+      buildPod(sparkConf, client)
+    }
+    assert(exception.getMessage.contains("Could not load pod from template file."))
+  }
+
+  private def mockKubernetesClient(pod: Pod = podWithSupportedFeatures()): KubernetesClient = {
+    val kubernetesClient = mock(classOf[KubernetesClient])
+    val pods =
+      mock(classOf[MixedOperation[Pod, PodList, DoneablePod, PodResource[Pod, DoneablePod]]])
+    val podResource = mock(classOf[PodResource[Pod, DoneablePod]])
+    when(kubernetesClient.pods()).thenReturn(pods)
+    when(pods.load(any(classOf[File]))).thenReturn(podResource)
+    when(podResource.get()).thenReturn(pod)
+    kubernetesClient
+  }
+
+  private def verifyPod(pod: SparkPod): Unit = {
 
 Review comment:
   The number of things this test checks is remarkable, and it is very much 
possible to accidentally omit checking the application of a specific feature 
when a new one is added for either the driver or executor. This is why we had 
the overridable feature steps in the original incarnation of these tests. Not 
mocking the substeps leads us to need to check that some specific aspect of 
each step has been applied. Can we go back to mocking the different steps so 
that this test can be more easily modified when we add more features? Or else 
can we abstract away the idea that these steps are applied without this test 
itself needing to know what the step itself actually does?







[GitHub] AmplabJenkins removed a comment on issue #23280: [MINOR][SQL] Some errors in the notes.

2018-12-10 Thread GitBox
AmplabJenkins removed a comment on issue #23280: [MINOR][SQL] Some errors in 
the notes.
URL: https://github.com/apache/spark/pull/23280#issuecomment-446051121
 
 
   Can one of the admins verify this patch?





[GitHub] AmplabJenkins commented on issue #23280: [MINOR][SQL] Some errors in the notes.

2018-12-10 Thread GitBox
AmplabJenkins commented on issue #23280: [MINOR][SQL] Some errors in the notes.
URL: https://github.com/apache/spark/pull/23280#issuecomment-446051398
 
 
   Can one of the admins verify this patch?





[GitHub] AmplabJenkins commented on issue #23280: [MINOR][SQL] Some errors in the notes.

2018-12-10 Thread GitBox
AmplabJenkins commented on issue #23280: [MINOR][SQL] Some errors in the notes.
URL: https://github.com/apache/spark/pull/23280#issuecomment-446051121
 
 
   Can one of the admins verify this patch?





[GitHub] AmplabJenkins removed a comment on issue #23280: [MINOR][SQL] Some errors in the notes.

2018-12-10 Thread GitBox
AmplabJenkins removed a comment on issue #23280: [MINOR][SQL] Some errors in 
the notes.
URL: https://github.com/apache/spark/pull/23280#issuecomment-446051038
 
 
   Can one of the admins verify this patch?





[GitHub] AmplabJenkins commented on issue #23280: [MINOR][SQL] Some errors in the notes.

2018-12-10 Thread GitBox
AmplabJenkins commented on issue #23280: [MINOR][SQL] Some errors in the notes.
URL: https://github.com/apache/spark/pull/23280#issuecomment-446051038
 
 
   Can one of the admins verify this patch?





[GitHub] HyukjinKwon commented on a change in pull request #23260: [SPARK-26311][YARN] New feature: custom log URL for stdout/stderr

2018-12-10 Thread GitBox
HyukjinKwon commented on a change in pull request #23260: [SPARK-26311][YARN] 
New feature: custom log URL for stdout/stderr
URL: https://github.com/apache/spark/pull/23260#discussion_r240455039
 
 

 ##
 File path: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnable.scala
 ##
 @@ -246,13 +246,56 @@ private[yarn] class ExecutorRunnable(
       sys.env.get("SPARK_USER").foreach { user =>
         val containerId = ConverterUtils.toString(c.getId)
         val address = c.getNodeHttpAddress
-        val baseUrl = s"$httpScheme$address/node/containerlogs/$containerId/$user"
 
-        env("SPARK_LOG_URL_STDERR") = s"$baseUrl/stderr?start=-4096"
-        env("SPARK_LOG_URL_STDOUT") = s"$baseUrl/stdout?start=-4096"
+        sparkConf.get(config.CUSTOM_LOG_URL) match {
+          case Some(customUrl) =>
+            val pathVariables = ExecutorRunnable.buildPathVariables(httpScheme, address,
+              YarnConfiguration.getClusterId(conf), containerId, user)
+            val envNameToFileNameMap = Map("SPARK_LOG_URL_STDERR" -> "stderr",
+              "SPARK_LOG_URL_STDOUT" -> "stdout")
+            val logUrls = ExecutorRunnable.replaceLogUrls(customUrl, pathVariables,
+              envNameToFileNameMap)
+
+            logUrls.foreach { case (envName, url) =>
+              env(envName) = url
+            }
+          case None =>
+            val baseUrl = s"$httpScheme$address/node/containerlogs/$containerId/$user"
+            env("SPARK_LOG_URL_STDERR") = s"$baseUrl/stderr?start=-4096"
+            env("SPARK_LOG_URL_STDOUT") = s"$baseUrl/stdout?start=-4096"
+        }
       }
     }
 
     env
   }
 }
+
+private[yarn] object ExecutorRunnable {
+  val LOG_URL_PATTERN_HTTP_SCHEME = "{{HttpScheme}}"
+  val LOG_URL_PATTERN_NODE_HTTP_ADDRESS = "{{NodeHttpAddress}}"
+  val LOG_URL_PATTERN_CLUSTER_ID = "{{ClusterId}}"
+  val LOG_URL_PATTERN_CONTAINER_ID = "{{ContainerId}}"
+  val LOG_URL_PATTERN_USER = "{{User}}"
+  val LOG_URL_PATTERN_FILE_NAME = "{{FileName}}"
+
+  def buildPathVariables(httpScheme: String, nodeHttpAddress: String, clusterId: String,
 
 Review comment:
   Strictly speaking, the guideline itself does not explicitly require two lines - it says it's okay if it fits within 2 lines (see https://github.com/databricks/scala-style-guide/pull/64#issuecomment-344609090). I intentionally avoided this because some code in some components does not comply with it.
   
   However, strictly we should stick to two-space indentation whenever possible, per https://github.com/databricks/scala-style-guide#indent:
   
   > Use 2-space indentation in general.




[GitHub] CarolinePeng opened a new pull request #23280: [MINOR][SQL] Some errors in the notes.

2018-12-10 Thread GitBox
CarolinePeng opened a new pull request #23280: [MINOR][SQL] Some errors in the 
notes.
URL: https://github.com/apache/spark/pull/23280
 
 
   ## What changes were proposed in this pull request?
   
   When using ordinals to access a linked list, the time cost is O(n).
   
   ## How was this patch tested?
   
   Existing tests.
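
   For intuition, a small illustration of the ordinal-access point (plain `java.util.LinkedList`, not the code touched by this PR):
   
   ```
   import java.util.LinkedList
   import scala.collection.JavaConverters._
   
   val list = new LinkedList[Int]()
   (1 to 5).foreach(list.add(_))
   
   // get(i) walks the links on every call: O(n) per access, so an indexed
   // loop over the whole list costs O(n^2) in total.
   var i = 0
   while (i < list.size()) {
     println(list.get(i))
     i += 1
   }
   
   // An iterator visits each node once: O(n) for the whole traversal.
   list.iterator().asScala.foreach(println)
   ```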
   






[GitHub] SparkQA commented on issue #23213: [SPARK-26262][SQL] Runs SQLQueryTestSuite on mixed config sets: WHOLESTAGE_CODEGEN_ENABLED and CODEGEN_FACTORY_MODE

2018-12-10 Thread GitBox
SparkQA commented on issue #23213: [SPARK-26262][SQL] Runs SQLQueryTestSuite on 
mixed config sets: WHOLESTAGE_CODEGEN_ENABLED and CODEGEN_FACTORY_MODE
URL: https://github.com/apache/spark/pull/23213#issuecomment-446048530
 
 
   **[Test build #99942 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99942/testReport)** for PR 23213 at commit [`a9c108f`](https://github.com/apache/spark/commit/a9c108fa090b847d48848cf6d679aa6747dcc534).





[GitHub] AmplabJenkins removed a comment on issue #23213: [SPARK-26262][SQL] Runs SQLQueryTestSuite on mixed config sets: WHOLESTAGE_CODEGEN_ENABLED and CODEGEN_FACTORY_MODE

2018-12-10 Thread GitBox
AmplabJenkins removed a comment on issue #23213: [SPARK-26262][SQL] Runs 
SQLQueryTestSuite on mixed config sets: WHOLESTAGE_CODEGEN_ENABLED and 
CODEGEN_FACTORY_MODE
URL: https://github.com/apache/spark/pull/23213#issuecomment-446048394
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5947/
   Test PASSed.





[GitHub] AmplabJenkins commented on issue #23213: [SPARK-26262][SQL] Runs SQLQueryTestSuite on mixed config sets: WHOLESTAGE_CODEGEN_ENABLED and CODEGEN_FACTORY_MODE

2018-12-10 Thread GitBox
AmplabJenkins commented on issue #23213: [SPARK-26262][SQL] Runs 
SQLQueryTestSuite on mixed config sets: WHOLESTAGE_CODEGEN_ENABLED and 
CODEGEN_FACTORY_MODE
URL: https://github.com/apache/spark/pull/23213#issuecomment-446048394
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5947/
   Test PASSed.





[GitHub] cloud-fan commented on a change in pull request #23272: [SPARK-26265][Core] Fix deadlock in BytesToBytesMap.MapIterator when locking both BytesToBytesMap.MapIterator and TaskMemoryManager

2018-12-10 Thread GitBox
cloud-fan commented on a change in pull request #23272: [SPARK-26265][Core] Fix 
deadlock in BytesToBytesMap.MapIterator when locking both 
BytesToBytesMap.MapIterator and TaskMemoryManager
URL: https://github.com/apache/spark/pull/23272#discussion_r240453178
 
 

 ##
 File path: core/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java
 ##
 @@ -283,6 +290,9 @@ private void advanceToNextPage() {
           }
         }
       }
+      if (pageToFree != null) {
+        freePage(pageToFree);
 
 Review comment:
   is it possible that this page is already freed by another consumer?
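
   For context, the hunk reflects a common deadlock-avoidance pattern: record what must be released while holding a lock, and release it only after the lock is dropped. A minimal Scala sketch of the pattern (names hypothetical, not the actual `BytesToBytesMap` code):
   
   ```
   import java.util.concurrent.locks.ReentrantLock
   
   // Hypothetical stand-in for a memory page owned by a consumer.
   final class Page { def free(): Unit = () }
   
   val lock = new ReentrantLock()
   var pageToFree: Page = null
   
   lock.lock()
   try {
     // ... advance to the next page, only *recording* what to release ...
     pageToFree = new Page()
   } finally {
     lock.unlock()
   }
   // Free outside the critical section, so the cleanup cannot block on a
   // lock held by another consumer and deadlock.
   if (pageToFree != null) pageToFree.free()
   ```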





[GitHub] AmplabJenkins removed a comment on issue #23213: [SPARK-26262][SQL] Runs SQLQueryTestSuite on mixed config sets: WHOLESTAGE_CODEGEN_ENABLED and CODEGEN_FACTORY_MODE

2018-12-10 Thread GitBox
AmplabJenkins removed a comment on issue #23213: [SPARK-26262][SQL] Runs 
SQLQueryTestSuite on mixed config sets: WHOLESTAGE_CODEGEN_ENABLED and 
CODEGEN_FACTORY_MODE
URL: https://github.com/apache/spark/pull/23213#issuecomment-446048389
 
 
   Merged build finished. Test PASSed.





[GitHub] AmplabJenkins commented on issue #23213: [SPARK-26262][SQL] Runs SQLQueryTestSuite on mixed config sets: WHOLESTAGE_CODEGEN_ENABLED and CODEGEN_FACTORY_MODE

2018-12-10 Thread GitBox
AmplabJenkins commented on issue #23213: [SPARK-26262][SQL] Runs 
SQLQueryTestSuite on mixed config sets: WHOLESTAGE_CODEGEN_ENABLED and 
CODEGEN_FACTORY_MODE
URL: https://github.com/apache/spark/pull/23213#issuecomment-446048389
 
 
   Merged build finished. Test PASSed.





[GitHub] AmplabJenkins removed a comment on issue #23252: [SPARK-26239] File-based secret key loading for SASL.

2018-12-10 Thread GitBox
AmplabJenkins removed a comment on issue #23252: [SPARK-26239] File-based 
secret key loading for SASL.
URL: https://github.com/apache/spark/pull/23252#issuecomment-446047256
 
 
   Merged build finished. Test PASSed.





[GitHub] cloud-fan commented on issue #23213: [SPARK-26262][SQL] Runs SQLQueryTestSuite on mixed config sets: WHOLESTAGE_CODEGEN_ENABLED and CODEGEN_FACTORY_MODE

2018-12-10 Thread GitBox
cloud-fan commented on issue #23213: [SPARK-26262][SQL] Runs SQLQueryTestSuite 
on mixed config sets: WHOLESTAGE_CODEGEN_ENABLED and CODEGEN_FACTORY_MODE
URL: https://github.com/apache/spark/pull/23213#issuecomment-446047462
 
 
   retest this please





[GitHub] AmplabJenkins commented on issue #23252: [SPARK-26239] File-based secret key loading for SASL.

2018-12-10 Thread GitBox
AmplabJenkins commented on issue #23252: [SPARK-26239] File-based secret key 
loading for SASL.
URL: https://github.com/apache/spark/pull/23252#issuecomment-446047258
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99935/
   Test PASSed.





[GitHub] AmplabJenkins removed a comment on issue #23252: [SPARK-26239] File-based secret key loading for SASL.

2018-12-10 Thread GitBox
AmplabJenkins removed a comment on issue #23252: [SPARK-26239] File-based 
secret key loading for SASL.
URL: https://github.com/apache/spark/pull/23252#issuecomment-446047258
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99935/
   Test PASSed.





[GitHub] AmplabJenkins commented on issue #23252: [SPARK-26239] File-based secret key loading for SASL.

2018-12-10 Thread GitBox
AmplabJenkins commented on issue #23252: [SPARK-26239] File-based secret key 
loading for SASL.
URL: https://github.com/apache/spark/pull/23252#issuecomment-446047256
 
 
   Merged build finished. Test PASSed.





[GitHub] SparkQA removed a comment on issue #23252: [SPARK-26239] File-based secret key loading for SASL.

2018-12-10 Thread GitBox
SparkQA removed a comment on issue #23252: [SPARK-26239] File-based secret key 
loading for SASL.
URL: https://github.com/apache/spark/pull/23252#issuecomment-445985434
 
 
   **[Test build #99935 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99935/testReport)** for PR 23252 at commit [`eade6e2`](https://github.com/apache/spark/commit/eade6e2db84ebab2ede5a375a1a2c1303fefc1fe).





[GitHub] SparkQA commented on issue #23252: [SPARK-26239] File-based secret key loading for SASL.

2018-12-10 Thread GitBox
SparkQA commented on issue #23252: [SPARK-26239] File-based secret key loading 
for SASL.
URL: https://github.com/apache/spark/pull/23252#issuecomment-446046903
 
 
   **[Test build #99935 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99935/testReport)** for PR 23252 at commit [`eade6e2`](https://github.com/apache/spark/commit/eade6e2db84ebab2ede5a375a1a2c1303fefc1fe).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.





[GitHub] cloud-fan commented on a change in pull request #23275: [SPARK-26323][SQL] Scala UDF should still check input types even if some inputs are of type Any

2018-12-10 Thread GitBox
cloud-fan commented on a change in pull request #23275: [SPARK-26323][SQL] 
Scala UDF should still check input types even if some inputs are of type Any
URL: https://github.com/apache/spark/pull/23275#discussion_r240451835
 
 

 ##
 File path: sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala
 ##
 @@ -88,68 +88,49 @@ sealed trait UserDefinedFunction {
 private[sql] case class SparkUserDefinedFunction(
     f: AnyRef,
     dataType: DataType,
-    inputTypes: Option[Seq[DataType]],
-    nullableTypes: Option[Seq[Boolean]],
+    inputSchemas: Seq[Option[ScalaReflection.Schema]],
     name: Option[String] = None,
     nullable: Boolean = true,
     deterministic: Boolean = true) extends UserDefinedFunction {
 
   @scala.annotation.varargs
-  override def apply(exprs: Column*): Column = {
-    // TODO: make sure this class is only instantiated through `SparkUserDefinedFunction.create()`
-    // and `nullableTypes` is always set.
-    if (inputTypes.isDefined) {
-      assert(inputTypes.get.length == nullableTypes.get.length)
-    }
-
-    val inputsNullSafe = nullableTypes.getOrElse {
-      ScalaReflection.getParameterTypeNullability(f)
-    }
+  override def apply(cols: Column*): Column = {
+    Column(createScalaUDF(cols.map(_.expr)))
+  }
 
-    Column(ScalaUDF(
+  private[sql] def createScalaUDF(exprs: Seq[Expression]): ScalaUDF = {
+    // It's possible that some of the inputs don't have a specific type (e.g. `Any`), skip type
+    // check and null check for them.
+    val inputTypes = inputSchemas.map(_.map(_.dataType).getOrElse(AnyDataType))
+    val inputsNullSafe = inputSchemas.map(_.map(_.nullable).getOrElse(true))
 
 Review comment:
   Here `getOrElse` may be better, as it matches the previous line.
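
   For readers skimming: a tiny, hypothetical illustration of the two equivalent spellings being discussed (not the PR's actual types):
   
   ```
   val schema: Option[(String, Boolean)] = Some(("int", false))
   
   // map + getOrElse, matching the style of the surrounding lines:
   val nullable1 = schema.map(_._2).getOrElse(true)
   // fold, the more compact alternative:
   val nullable2 = schema.fold(true)(_._2)
   ```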





[GitHub] yaooqinn closed pull request #19840: [SPARK-22640][PYSPARK][YARN]switch python exec on executor side

2018-12-10 Thread GitBox
yaooqinn closed pull request #19840: [SPARK-22640][PYSPARK][YARN]switch python 
exec on executor side
URL: https://github.com/apache/spark/pull/19840
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala b/core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala
index f524de68fbce0..9d16dbbe9a0c8 100644
--- a/core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala
+++ b/core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala
@@ -57,9 +57,14 @@ private[spark] abstract class BasePythonRunner[IN, OUT](
 
   require(funcs.length == argOffsets.length, "argOffsets should have the same length as funcs")
 
+  private val env = SparkEnv.get
+  private val conf = env.conf
+
   // All the Python functions should have the same exec, version and envvars.
   protected val envVars = funcs.head.funcs.head.envVars
-  protected val pythonExec = funcs.head.funcs.head.pythonExec
+  protected val pythonExec = conf.getOption("spark.executorEnv.PYSPARK_DRIVER_PYTHON")
+    .getOrElse(conf.getOption("spark.executorEnv.PYSPARK_PYTHON")
+      .getOrElse(funcs.head.funcs.head.pythonExec))
   protected val pythonVer = funcs.head.funcs.head.pythonVer
 
   // TODO: support accumulator in multiple UDF
@@ -70,7 +75,6 @@ private[spark] abstract class BasePythonRunner[IN, OUT](
       partitionIndex: Int,
       context: TaskContext): Iterator[OUT] = {
     val startTime = System.currentTimeMillis
-    val env = SparkEnv.get
     val localdir = env.blockManager.diskBlockManager.localDirs.map(f => f.getPath()).mkString(",")
     envVars.put("SPARK_LOCAL_DIRS", localdir) // it's also used in monitor thread
     if (reuseWorker) {
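
A hedged usage sketch of the resolution order the diff above introduces (conf keys exactly as they appear in the hunk):

```
import org.apache.spark.SparkConf

// With the change, the executor-side Python binary is resolved as:
// 1) spark.executorEnv.PYSPARK_DRIVER_PYTHON, if set;
// 2) otherwise spark.executorEnv.PYSPARK_PYTHON, if set;
// 3) otherwise the exec recorded in the serialized Python function.
val conf = new SparkConf()
  .set("spark.executorEnv.PYSPARK_PYTHON", "/usr/bin/python3")
```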


 





[GitHub] cloud-fan commented on a change in pull request #23275: [SPARK-26323][SQL] Scala UDF should still check input types even if some inputs are of type Any

2018-12-10 Thread GitBox
cloud-fan commented on a change in pull request #23275: [SPARK-26323][SQL] 
Scala UDF should still check input types even if some inputs are of type Any
URL: https://github.com/apache/spark/pull/23275#discussion_r240449840
 
 

 ##
 File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala
 ##
 @@ -47,25 +47,13 @@ case class ScalaUDF(
     function: AnyRef,
     dataType: DataType,
     children: Seq[Expression],
-    inputsNullSafe: Seq[Boolean],
-    inputTypes: Seq[DataType] = Nil,
+    @transient inputsNullSafe: Seq[Boolean],
 
 Review comment:
   Expressions are usually serialized to the executor side. Previously this was fine, as all data types are case classes, which are serializable. But `AnyDataType` is a normal Scala object, which is not serializable.
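
   A self-contained illustration of the failure mode described here (hypothetical `AnyLikeType` object, not Spark's `AnyDataType`):
   
   ```
   import java.io.{ByteArrayOutputStream, ObjectOutputStream}
   
   object AnyLikeType // plain object: does not extend Serializable
   
   // Without @transient on `tpe`, writeObject would throw
   // java.io.NotSerializableException, because the plain object would be
   // captured as a field of the otherwise-Serializable holder.
   class Holder(@transient val tpe: AnyLikeType.type) extends Serializable
   
   val out = new ObjectOutputStream(new ByteArrayOutputStream())
   out.writeObject(new Holder(AnyLikeType)) // succeeds: the field is skipped
   ```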





[GitHub] asfgit closed pull request #23262: [SPARK-26312][SQL]Replace RDDConversions.rowToRowRdd with RowEncoder to improve its conversion performance

2018-12-10 Thread GitBox
asfgit closed pull request #23262: [SPARK-26312][SQL]Replace 
RDDConversions.rowToRowRdd with RowEncoder to improve its conversion performance
URL: https://github.com/apache/spark/pull/23262
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala
index e214bfd050410..49fb288fdea6a 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala
@@ -18,54 +18,14 @@
 package org.apache.spark.sql.execution
 
 import org.apache.spark.rdd.RDD
-import org.apache.spark.sql.{Encoder, Row, SparkSession}
-import org.apache.spark.sql.catalyst.{CatalystTypeConverters, InternalRow}
+import org.apache.spark.sql.{Encoder, SparkSession}
+import org.apache.spark.sql.catalyst.InternalRow
 import org.apache.spark.sql.catalyst.analysis.MultiInstanceRelation
 import org.apache.spark.sql.catalyst.expressions._
 import org.apache.spark.sql.catalyst.plans.logical._
 import org.apache.spark.sql.catalyst.plans.physical.{Partitioning, 
UnknownPartitioning}
 import org.apache.spark.sql.catalyst.util.truncatedString
 import org.apache.spark.sql.execution.metric.SQLMetrics
-import org.apache.spark.sql.types.DataType
-
-object RDDConversions {
-  def productToRowRdd[A <: Product](data: RDD[A], outputTypes: Seq[DataType]): 
RDD[InternalRow] = {
-data.mapPartitions { iterator =>
-  val numColumns = outputTypes.length
-  val mutableRow = new GenericInternalRow(numColumns)
-  val converters = 
outputTypes.map(CatalystTypeConverters.createToCatalystConverter)
-  iterator.map { r =>
-var i = 0
-while (i < numColumns) {
-  mutableRow(i) = converters(i)(r.productElement(i))
-  i += 1
-}
-
-mutableRow
-  }
-}
-  }
-
-  /**
-   * Convert the objects inside Row into the types Catalyst expected.
-   */
-  def rowToRowRdd(data: RDD[Row], outputTypes: Seq[DataType]): 
RDD[InternalRow] = {
-data.mapPartitions { iterator =>
-  val numColumns = outputTypes.length
-  val mutableRow = new GenericInternalRow(numColumns)
-  val converters = 
outputTypes.map(CatalystTypeConverters.createToCatalystConverter)
-  iterator.map { r =>
-var i = 0
-while (i < numColumns) {
-  mutableRow(i) = converters(i)(r(i))
-  i += 1
-}
-
-mutableRow
-  }
-}
-  }
-}
 
 object ExternalRDD {
 
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala
index c6000442fae76..b304e2da6e1cf 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala
@@ -29,11 +29,11 @@ import 
org.apache.spark.sql.catalyst.{CatalystTypeConverters, InternalRow, Quali
 import org.apache.spark.sql.catalyst.CatalystTypeConverters.convertToScala
 import org.apache.spark.sql.catalyst.analysis._
 import org.apache.spark.sql.catalyst.catalog._
+import org.apache.spark.sql.catalyst.encoders.RowEncoder
 import org.apache.spark.sql.catalyst.expressions
 import org.apache.spark.sql.catalyst.expressions._
 import org.apache.spark.sql.catalyst.planning.PhysicalOperation
 import org.apache.spark.sql.catalyst.plans.logical.{InsertIntoDir, 
InsertIntoTable, LogicalPlan, Project}
-import org.apache.spark.sql.catalyst.plans.physical.HashPartitioning
 import org.apache.spark.sql.catalyst.rules.Rule
 import org.apache.spark.sql.execution.{RowDataSourceScanExec, SparkPlan}
 import org.apache.spark.sql.execution.command._
@@ -416,7 +416,10 @@ case class DataSourceStrategy(conf: SQLConf) extends 
Strategy with Logging with
   output: Seq[Attribute],
   rdd: RDD[Row]): RDD[InternalRow] = {
 if (relation.relation.needConversion) {
-  execution.RDDConversions.rowToRowRdd(rdd, output.map(_.dataType))
+  val converters = RowEncoder(StructType.fromAttributes(output))
+  rdd.mapPartitions { iterator =>
+iterator.map(converters.toRow)
+  }
 } else {
   rdd.asInstanceOf[RDD[InternalRow]]
 }
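
For context, a rough sketch of the new conversion path (Spark 2.4-era API; the 
schema and data below are made up): RowEncoder builds one serializer for the 
whole row, replacing the removed per-field CatalystTypeConverters loop.

```
import org.apache.spark.sql.Row
import org.apache.spark.sql.catalyst.encoders.RowEncoder
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Build one encoder per schema and reuse it for every row in the partition.
val schema = StructType(Seq(
  StructField("id", IntegerType),
  StructField("name", StringType)))
val encoder = RowEncoder(schema)              // an ExpressionEncoder[Row]
val internalRow = encoder.toRow(Row(1, "a"))  // external Row -> InternalRow
```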


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] HyukjinKwon commented on issue #23262: [SPARK-26312][SQL]Replace RDDConversions.rowToRowRdd with RowEncoder to improve its conversion performance

2018-12-10 Thread GitBox
HyukjinKwon commented on issue #23262: [SPARK-26312][SQL]Replace 
RDDConversions.rowToRowRdd with RowEncoder to improve its conversion performance
URL: https://github.com/apache/spark/pull/23262#issuecomment-446042871
 
 
   Weird. My remote is as below FWIW.
   
   ```
   $ git remote -v
   apache   https://gitbox.apache.org/repos/asf/spark.git (fetch)
   apache   https://gitbox.apache.org/repos/asf/spark.git (push)
   ...
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] cloud-fan commented on issue #23211: [SPARK-19712][SQL] Move PullupCorrelatedPredicates and RewritePredicateSubquery after OptimizeSubqueries

2018-12-10 Thread GitBox
cloud-fan commented on issue #23211: [SPARK-19712][SQL] Move 
PullupCorrelatedPredicates and RewritePredicateSubquery after OptimizeSubqueries
URL: https://github.com/apache/spark/pull/23211#issuecomment-446042787
 
 
   Yes, since the `PushdownLeftSemiOrAntiJoin` rule is useful even without 
subqueries.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] HyukjinKwon commented on issue #23262: [SPARK-26312][SQL]Replace RDDConversions.rowToRowRdd with RowEncoder to improve its conversion performance

2018-12-10 Thread GitBox
HyukjinKwon commented on issue #23262: [SPARK-26312][SQL]Replace 
RDDConversions.rowToRowRdd with RowEncoder to improve its conversion performance
URL: https://github.com/apache/spark/pull/23262#issuecomment-446042637
 
 
   Oops, it looks to be working fine for me after switching to gitbox.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] cloud-fan commented on a change in pull request #23208: [SPARK-25530][SQL] data source v2 API refactor (batch write)

2018-12-10 Thread GitBox
cloud-fan commented on a change in pull request #23208: [SPARK-25530][SQL] data 
source v2 API refactor (batch write)
URL: https://github.com/apache/spark/pull/23208#discussion_r240448233
 
 

 ##
 File path: 
sql/core/src/main/java/org/apache/spark/sql/sources/v2/TableProvider.java
 ##
 @@ -25,7 +25,10 @@
  * The base interface for v2 data sources which don't have a real catalog. 
Implementations must
  * have a public, 0-arg constructor.
  * 
- * The major responsibility of this interface is to return a {@link Table} for 
read/write.
+ * The major responsibility of this interface is to return a {@link Table} for 
read/write. If you
+ * want to allow end-users to write data to non-existing tables via write APIs 
in `DataFrameWriter`
+ * with `SaveMode`, you must return a {@link Table} instance even if the table 
doesn't exist. The
+ * table schema can be empty in this case.
 
 Review comment:
   What about the file source behavior difference between `SaveMode.Append` and 
the new append operator? Are you saying we should accept it and ask users to 
change their code? The file source is widely used with the `df.write.save` API...


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] cloud-fan commented on issue #23262: [SPARK-26312][SQL]Replace RDDConversions.rowToRowRdd with RowEncoder to improve its conversion performance

2018-12-10 Thread GitBox
cloud-fan commented on issue #23262: [SPARK-26312][SQL]Replace 
RDDConversions.rowToRowRdd with RowEncoder to improve its conversion performance
URL: https://github.com/apache/spark/pull/23262#issuecomment-446041476
 
 
   Yea, it didn't work; the PR is not merged. I'll try it later. cc @srowen, do 
you hit the same issue? I already switched to gitbox.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] cloud-fan commented on a change in pull request #23266: [SPARK-26313][SQL] move read related methods from Table to read related mix-in traits

2018-12-10 Thread GitBox
cloud-fan commented on a change in pull request #23266: [SPARK-26313][SQL] move 
read related methods from Table to read related mix-in traits
URL: https://github.com/apache/spark/pull/23266#discussion_r240447269
 
 

 ##
 File path: 
sql/core/src/main/java/org/apache/spark/sql/sources/v2/SupportsBatchRead.java
 ##
 @@ -20,14 +20,27 @@
 import org.apache.spark.annotation.Evolving;
 import org.apache.spark.sql.sources.v2.reader.Scan;
 import org.apache.spark.sql.sources.v2.reader.ScanBuilder;
+import org.apache.spark.sql.types.StructType;
 
 /**
- * An empty mix-in interface for {@link Table}, to indicate this table 
supports batch scan.
- * 
- * If a {@link Table} implements this interface, its {@link 
Table#newScanBuilder(DataSourceOptions)}
- * must return a {@link ScanBuilder} that builds {@link Scan} with {@link 
Scan#toBatch()}
- * implemented.
- * 
+ * A mix-in interface for {@link Table} to provide data reading ability of 
batch processing.
  */
 @Evolving
-public interface SupportsBatchRead extends Table { }
+public interface SupportsBatchRead extends Table {
+
+  /**
+   * Returns the schema of this table.
+   */
+  StructType schema();
 
 Review comment:
   Or should we treat it as an invalid data source and ask users to use 
`df.foreach` instead?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] cloud-fan commented on a change in pull request #23266: [SPARK-26313][SQL] move read related methods from Table to read related mix-in traits

2018-12-10 Thread GitBox
cloud-fan commented on a change in pull request #23266: [SPARK-26313][SQL] move 
read related methods from Table to read related mix-in traits
URL: https://github.com/apache/spark/pull/23266#discussion_r240447132
 
 

 ##
 File path: 
sql/core/src/main/java/org/apache/spark/sql/sources/v2/SupportsBatchRead.java
 ##
 @@ -20,14 +20,27 @@
 import org.apache.spark.annotation.Evolving;
 import org.apache.spark.sql.sources.v2.reader.Scan;
 import org.apache.spark.sql.sources.v2.reader.ScanBuilder;
+import org.apache.spark.sql.types.StructType;
 
 /**
- * An empty mix-in interface for {@link Table}, to indicate this table 
supports batch scan.
- * 
- * If a {@link Table} implements this interface, its {@link 
Table#newScanBuilder(DataSourceOptions)}
- * must return a {@link ScanBuilder} that builds {@link Scan} with {@link 
Scan#toBatch()}
- * implemented.
- * 
+ * A mix-in interface for {@link Table} to provide data reading ability of 
batch processing.
  */
 @Evolving
-public interface SupportsBatchRead extends Table { }
+public interface SupportsBatchRead extends Table {
+
+  /**
+   * Returns the schema of this table.
+   */
+  StructType schema();
 
 Review comment:
   @rdblue what about the use case in 
http://apache-spark-developers-list.1001551.n3.nabble.com/Possible-bug-in-DatasourceV2-td25343.html
 ?
   
   i.e., the data source can write data of any schema, it's not readable, and it 
has no schema?
   
   Do you mean we should ask the data source to return an empty schema in this 
case?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] HeartSaVioR commented on a change in pull request #23260: [SPARK-26311][YARN] New feature: custom log URL for stdout/stderr

2018-12-10 Thread GitBox
HeartSaVioR commented on a change in pull request #23260: [SPARK-26311][YARN] 
New feature: custom log URL for stdout/stderr
URL: https://github.com/apache/spark/pull/23260#discussion_r240446659
 
 

 ##
 File path: 
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnable.scala
 ##
 @@ -246,13 +246,56 @@ private[yarn] class ExecutorRunnable(
   sys.env.get("SPARK_USER").foreach { user =>
 val containerId = ConverterUtils.toString(c.getId)
 val address = c.getNodeHttpAddress
-val baseUrl = 
s"$httpScheme$address/node/containerlogs/$containerId/$user"
 
-env("SPARK_LOG_URL_STDERR") = s"$baseUrl/stderr?start=-4096"
-env("SPARK_LOG_URL_STDOUT") = s"$baseUrl/stdout?start=-4096"
+sparkConf.get(config.CUSTOM_LOG_URL) match {
+  case Some(customUrl) =>
+val pathVariables = 
ExecutorRunnable.buildPathVariables(httpScheme, address,
+  YarnConfiguration.getClusterId(conf), containerId, user)
+val envNameToFileNameMap = Map("SPARK_LOG_URL_STDERR" -> "stderr",
+  "SPARK_LOG_URL_STDOUT" -> "stdout")
+val logUrls = ExecutorRunnable.replaceLogUrls(customUrl, 
pathVariables,
+  envNameToFileNameMap)
+
+logUrls.foreach { case (envName, url) =>
+  env(envName) = url
+}
+  case None =>
+val baseUrl = 
s"$httpScheme$address/node/containerlogs/$containerId/$user"
+env("SPARK_LOG_URL_STDERR") = s"$baseUrl/stderr?start=-4096"
+env("SPARK_LOG_URL_STDOUT") = s"$baseUrl/stdout?start=-4096"
+  }
   }
 }
 
 env
   }
 }
+
+private[yarn] object ExecutorRunnable {
+  val LOG_URL_PATTERN_HTTP_SCHEME = "{{HttpScheme}}"
 
 Review comment:
   Ah, OK. I'm in favor of avoiding the use of string constants directly, but I 
don't have a strong opinion on it. Will address.
   
   And yes, I can put them in a single method, but placing a new method in the 
class will put an unnecessary burden on the test code, since ExecutorRunnable 
takes lots of parameters to instantiate.
   
   If we want to add an end-to-end test (instantiating a YARN cluster and running 
executors) we still need to instantiate ExecutorRunnable (I think we already 
cover this here [1]), but if we just want to make sure the logic works 
properly, we might want to keep this as a new object and add a test against the 
object, to avoid instantiating ExecutorRunnable. WDYT?
   
   1. 
https://github.com/apache/spark/blob/05cf81e6de3d61ddb0af81cd179665693f23351f/resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala#L442-L461
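   
   For readers following along, a simplified, self-contained sketch of the 
`{{Pattern}}` substitution under discussion (method and variable names are 
illustrative, not the PR's exact code):
   
   ```
   // Build one log URL per (env var, file name) pair by substituting the
   // {{...}} patterns into the configured URL template.
   def replaceLogUrls(
       urlPattern: String,
       pathVariables: Map[String, String],
       envNameToFileName: Map[String, String]): Map[String, String] =
     envNameToFileName.map { case (envName, fileName) =>
       val allVars = pathVariables + ("{{FileName}}" -> fileName)
       envName -> allVars.foldLeft(urlPattern) {
         case (url, (pattern, value)) => url.replace(pattern, value)
       }
     }
   
   val urls = replaceLogUrls(
     "{{HttpScheme}}{{NodeHttpAddress}}/logs/{{ContainerId}}/{{FileName}}",
     Map(
       "{{HttpScheme}}" -> "http://",
       "{{NodeHttpAddress}}" -> "host:8042",
       "{{ContainerId}}" -> "container_01_000002"),
     Map("SPARK_LOG_URL_STDERR" -> "stderr", "SPARK_LOG_URL_STDOUT" -> "stdout"))
   // urls("SPARK_LOG_URL_STDOUT") ==
   //   "http://host:8042/logs/container_01_000002/stdout"
   ```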


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] HeartSaVioR commented on a change in pull request #23260: [SPARK-26311][YARN] New feature: custom log URL for stdout/stderr

2018-12-10 Thread GitBox
HeartSaVioR commented on a change in pull request #23260: [SPARK-26311][YARN] 
New feature: custom log URL for stdout/stderr
URL: https://github.com/apache/spark/pull/23260#discussion_r240446659
 
 

 ##
 File path: 
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnable.scala
 ##
 @@ -246,13 +246,56 @@ private[yarn] class ExecutorRunnable(
   sys.env.get("SPARK_USER").foreach { user =>
 val containerId = ConverterUtils.toString(c.getId)
 val address = c.getNodeHttpAddress
-val baseUrl = 
s"$httpScheme$address/node/containerlogs/$containerId/$user"
 
-env("SPARK_LOG_URL_STDERR") = s"$baseUrl/stderr?start=-4096"
-env("SPARK_LOG_URL_STDOUT") = s"$baseUrl/stdout?start=-4096"
+sparkConf.get(config.CUSTOM_LOG_URL) match {
+  case Some(customUrl) =>
+val pathVariables = 
ExecutorRunnable.buildPathVariables(httpScheme, address,
+  YarnConfiguration.getClusterId(conf), containerId, user)
+val envNameToFileNameMap = Map("SPARK_LOG_URL_STDERR" -> "stderr",
+  "SPARK_LOG_URL_STDOUT" -> "stdout")
+val logUrls = ExecutorRunnable.replaceLogUrls(customUrl, 
pathVariables,
+  envNameToFileNameMap)
+
+logUrls.foreach { case (envName, url) =>
+  env(envName) = url
+}
+  case None =>
+val baseUrl = 
s"$httpScheme$address/node/containerlogs/$containerId/$user"
+env("SPARK_LOG_URL_STDERR") = s"$baseUrl/stderr?start=-4096"
+env("SPARK_LOG_URL_STDOUT") = s"$baseUrl/stdout?start=-4096"
+  }
   }
 }
 
 env
   }
 }
+
+private[yarn] object ExecutorRunnable {
+  val LOG_URL_PATTERN_HTTP_SCHEME = "{{HttpScheme}}"
 
 Review comment:
   Ah, OK. I'm in favor of avoiding the use of string constants directly, but I 
don't have a strong opinion on it. Will address.
   
   And yes, I can put them in a single method, but placing a new method in the 
class will put an unnecessary burden on the test code, since ExecutorRunnable 
takes lots of parameters to instantiate.
   
   If we want to add an end-to-end test (instantiating a YARN cluster and running 
executors) we still need to instantiate ExecutorRunnable (I think we already 
cover this here [1]), but if we just want to make sure the logic works 
properly, we might want to keep this as a new object and add a test against the 
object, to avoid instantiating ExecutorRunnable. WDYT?
   
   1. 
https://github.com/apache/spark/blob/05cf81e6de3d61ddb0af81cd179665693f23351f/resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala#L460


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] HyukjinKwon commented on issue #23262: [SPARK-26312][SQL]Replace RDDConversions.rowToRowRdd with RowEncoder to improve its conversion performance

2018-12-10 Thread GitBox
HyukjinKwon commented on issue #23262: [SPARK-26312][SQL]Replace 
RDDConversions.rowToRowRdd with RowEncoder to improve its conversion performance
URL: https://github.com/apache/spark/pull/23262#issuecomment-446040157
 
 
   It doesn't work right now .. right? haha. I tried to merge but failed. 
Somehow the Apache Spark repo looks to be down (?).


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] cloud-fan commented on issue #23271: [SPARK-26318][SQL] Enhance function merge performance in Row

2018-12-10 Thread GitBox
cloud-fan commented on issue #23271: [SPARK-26318][SQL] Enhance function merge 
performance in Row
URL: https://github.com/apache/spark/pull/23271#issuecomment-446040156
 
 
   I have the same question. If it's not used by anyone, maybe we can deprecate 
it and remove it in the next release.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] cloud-fan commented on issue #23262: [SPARK-26312][SQL]Replace RDDConversions.rowToRowRdd with RowEncoder to improve its conversion performance

2018-12-10 Thread GitBox
cloud-fan commented on issue #23262: [SPARK-26312][SQL]Replace 
RDDConversions.rowToRowRdd with RowEncoder to improve its conversion performance
URL: https://github.com/apache/spark/pull/23262#issuecomment-446038519
 
 
   thanks, merging to master!


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] AmplabJenkins removed a comment on issue #23252: [SPARK-26239] File-based secret key loading for SASL.

2018-12-10 Thread GitBox
AmplabJenkins removed a comment on issue #23252: [SPARK-26239] File-based 
secret key loading for SASL.
URL: https://github.com/apache/spark/pull/23252#issuecomment-446038018
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99934/
   Test FAILed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] HeartSaVioR commented on a change in pull request #23260: [SPARK-26311][YARN] New feature: custom log URL for stdout/stderr

2018-12-10 Thread GitBox
HeartSaVioR commented on a change in pull request #23260: [SPARK-26311][YARN] 
New feature: custom log URL for stdout/stderr
URL: https://github.com/apache/spark/pull/23260#discussion_r240444758
 
 

 ##
 File path: docs/running-on-yarn.md
 ##
 @@ -430,6 +430,21 @@ To use a custom metrics.properties for the application 
master and executors, upd
   See spark.yarn.config.gatewayPath.
   
 
+
+  spark.yarn.custom.log.url
+  (none)
+  
+  Specifies a custom Spark log URL for supporting an external log service 
rather than the NodeManager webapp address.
+  Spark supports some path variables via patterns. Supported patterns and their 
assigned values are listed below: 
+  
+  * `{{HttpScheme}}`: `http`/`https` according to YARN HTTP policy. 
(Configured via `yarn.http.policy`)
+  * `{{NodeHttpAddress}}`: HTTP URI of the node on Container. 
 
 Review comment:
   My bad. It is `host:port` rather than a URI, and it can be retrieved from 
`container.getNodeHttpAddress`. The phrase `node on container` is borrowed from 
the javadoc of that method, but I'm OK with anything clearer.
   
   Will address.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] AmplabJenkins removed a comment on issue #23252: [SPARK-26239] File-based secret key loading for SASL.

2018-12-10 Thread GitBox
AmplabJenkins removed a comment on issue #23252: [SPARK-26239] File-based 
secret key loading for SASL.
URL: https://github.com/apache/spark/pull/23252#issuecomment-446038016
 
 
   Merged build finished. Test FAILed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] AmplabJenkins commented on issue #23252: [SPARK-26239] File-based secret key loading for SASL.

2018-12-10 Thread GitBox
AmplabJenkins commented on issue #23252: [SPARK-26239] File-based secret key 
loading for SASL.
URL: https://github.com/apache/spark/pull/23252#issuecomment-446038018
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99934/
   Test FAILed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] AmplabJenkins commented on issue #23252: [SPARK-26239] File-based secret key loading for SASL.

2018-12-10 Thread GitBox
AmplabJenkins commented on issue #23252: [SPARK-26239] File-based secret key 
loading for SASL.
URL: https://github.com/apache/spark/pull/23252#issuecomment-446038016
 
 
   Merged build finished. Test FAILed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] SparkQA removed a comment on issue #23252: [SPARK-26239] File-based secret key loading for SASL.

2018-12-10 Thread GitBox
SparkQA removed a comment on issue #23252: [SPARK-26239] File-based secret key 
loading for SASL.
URL: https://github.com/apache/spark/pull/23252#issuecomment-445983502
 
 
   **[Test build #99934 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99934/testReport)**
 for PR 23252 at commit 
[`9ccff66`](https://github.com/apache/spark/commit/9ccff66b14946f119a870416176b7614f28b37c1).


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] SparkQA commented on issue #23252: [SPARK-26239] File-based secret key loading for SASL.

2018-12-10 Thread GitBox
SparkQA commented on issue #23252: [SPARK-26239] File-based secret key loading 
for SASL.
URL: https://github.com/apache/spark/pull/23252#issuecomment-446037823
 
 
   **[Test build #99934 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99934/testReport)**
 for PR 23252 at commit 
[`9ccff66`](https://github.com/apache/spark/commit/9ccff66b14946f119a870416176b7614f28b37c1).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] HeartSaVioR commented on a change in pull request #23260: [SPARK-26311][YARN] New feature: custom log URL for stdout/stderr

2018-12-10 Thread GitBox
HeartSaVioR commented on a change in pull request #23260: [SPARK-26311][YARN] 
New feature: custom log URL for stdout/stderr
URL: https://github.com/apache/spark/pull/23260#discussion_r240444139
 
 

 ##
 File path: 
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnable.scala
 ##
 @@ -246,13 +246,56 @@ private[yarn] class ExecutorRunnable(
   sys.env.get("SPARK_USER").foreach { user =>
 val containerId = ConverterUtils.toString(c.getId)
 val address = c.getNodeHttpAddress
-val baseUrl = 
s"$httpScheme$address/node/containerlogs/$containerId/$user"
 
-env("SPARK_LOG_URL_STDERR") = s"$baseUrl/stderr?start=-4096"
-env("SPARK_LOG_URL_STDOUT") = s"$baseUrl/stdout?start=-4096"
+sparkConf.get(config.CUSTOM_LOG_URL) match {
+  case Some(customUrl) =>
+val pathVariables = 
ExecutorRunnable.buildPathVariables(httpScheme, address,
+  YarnConfiguration.getClusterId(conf), containerId, user)
+val envNameToFileNameMap = Map("SPARK_LOG_URL_STDERR" -> "stderr",
+  "SPARK_LOG_URL_STDOUT" -> "stdout")
+val logUrls = ExecutorRunnable.replaceLogUrls(customUrl, 
pathVariables,
+  envNameToFileNameMap)
+
+logUrls.foreach { case (envName, url) =>
+  env(envName) = url
+}
+  case None =>
+val baseUrl = 
s"$httpScheme$address/node/containerlogs/$containerId/$user"
 
 Review comment:
   Yes it will remove the branch. Will address.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] AmplabJenkins removed a comment on issue #23249: [SPARK-26297][SQL] improve the doc of Distribution/Partitioning

2018-12-10 Thread GitBox
AmplabJenkins removed a comment on issue #23249: [SPARK-26297][SQL] improve the 
doc of Distribution/Partitioning
URL: https://github.com/apache/spark/pull/23249#issuecomment-446037380
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5946/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] SparkQA commented on issue #23249: [SPARK-26297][SQL] improve the doc of Distribution/Partitioning

2018-12-10 Thread GitBox
SparkQA commented on issue #23249: [SPARK-26297][SQL] improve the doc of 
Distribution/Partitioning
URL: https://github.com/apache/spark/pull/23249#issuecomment-446037548
 
 
   **[Test build #99941 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99941/testReport)**
 for PR 23249 at commit 
[`ddb82c3`](https://github.com/apache/spark/commit/ddb82c3c8822e647790b1a303f647f0bf6dc1c9d).


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] AmplabJenkins removed a comment on issue #23249: [SPARK-26297][SQL] improve the doc of Distribution/Partitioning

2018-12-10 Thread GitBox
AmplabJenkins removed a comment on issue #23249: [SPARK-26297][SQL] improve the 
doc of Distribution/Partitioning
URL: https://github.com/apache/spark/pull/23249#issuecomment-446037378
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] AmplabJenkins commented on issue #23249: [SPARK-26297][SQL] improve the doc of Distribution/Partitioning

2018-12-10 Thread GitBox
AmplabJenkins commented on issue #23249: [SPARK-26297][SQL] improve the doc of 
Distribution/Partitioning
URL: https://github.com/apache/spark/pull/23249#issuecomment-446037380
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5946/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] AmplabJenkins commented on issue #23249: [SPARK-26297][SQL] improve the doc of Distribution/Partitioning

2018-12-10 Thread GitBox
AmplabJenkins commented on issue #23249: [SPARK-26297][SQL] improve the doc of 
Distribution/Partitioning
URL: https://github.com/apache/spark/pull/23249#issuecomment-446037378
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] cloud-fan commented on a change in pull request #23249: [SPARK-26297][SQL] improve the doc of Distribution/Partitioning

2018-12-10 Thread GitBox
cloud-fan commented on a change in pull request #23249: [SPARK-26297][SQL] 
improve the doc of Distribution/Partitioning
URL: https://github.com/apache/spark/pull/23249#discussion_r240431621
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala
 ##
 @@ -118,10 +115,12 @@ case class HashClusteredDistribution(
 
 /**
  * Represents data where tuples have been ordered according to the `ordering`
- * [[Expression Expressions]].  This is a strictly stronger guarantee than
- * [[ClusteredDistribution]] as an ordering will ensure that tuples that share 
the
- * same value for the ordering expressions are contiguous and will never be 
split across
- * partitions.
+ * [[Expression Expressions]]. Its requirement is defined as follows:
+ *   - Given any 2 adjacent partitions, all the rows of the second partition 
must be larger than or
+ * equal to any row in the first partition, according to the `ordering` 
expressions.
 
 Review comment:
   @hvanhovell We need this relaxed requirement, otherwise we have to remove 
the optimization 
[here](https://github.com/apache/spark/pull/23249/files#diff-14e5db1a50efdaf9c760aeb5ced4R272)
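   
   To make the relaxed requirement concrete (an illustrative check, not from the 
PR):
   
   ```
   // Ordering by value ascending. Valid: every row in a later partition is >=
   // every row in the partition before it, even when equal keys straddle the
   // cut. Invalid: 2 in partition 1 is smaller than 3 in partition 0.
   val valid   = Seq(Seq(1, 2, 2), Seq(2, 3, 4))
   val invalid = Seq(Seq(1, 3), Seq(2, 4))
   
   def ok(parts: Seq[Seq[Int]]): Boolean =
     parts.zip(parts.tail).forall { case (p0, p1) => p1.forall(_ >= p0.max) }
   // ok(valid) == true; ok(invalid) == false
   ```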


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] cloud-fan commented on issue #23232: [SPARK-26233][SQL][BACKPORT-2.4] CheckOverflow when encoding a decimal value

2018-12-10 Thread GitBox
cloud-fan commented on issue #23232: [SPARK-26233][SQL][BACKPORT-2.4] 
CheckOverflow when encoding a decimal value
URL: https://github.com/apache/spark/pull/23232#issuecomment-446036494
 
 
   Hi @mgaido91, just one more question. Without this patch, does Spark always 
return a wrong result if the actual decimal doesn't fit the precision? Even for 
simple operations like `df.show`?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] HyukjinKwon removed a comment on issue #23262: [SPARK-26312][SQL]Replace RDDConversions.rowToRowRdd with RowEncoder to improve its conversion performance

2018-12-10 Thread GitBox
HyukjinKwon removed a comment on issue #23262: [SPARK-26312][SQL]Replace 
RDDConversions.rowToRowRdd with RowEncoder to improve its conversion performance
URL: https://github.com/apache/spark/pull/23262#issuecomment-446030243
 
 
   Merged to master.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] HyukjinKwon commented on issue #23262: [SPARK-26312][SQL]Replace RDDConversions.rowToRowRdd with RowEncoder to improve its conversion performance

2018-12-10 Thread GitBox
HyukjinKwon commented on issue #23262: [SPARK-26312][SQL]Replace 
RDDConversions.rowToRowRdd with RowEncoder to improve its conversion performance
URL: https://github.com/apache/spark/pull/23262#issuecomment-446030243
 
 
   Merged to master.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] HeartSaVioR commented on a change in pull request #23260: [SPARK-26311][YARN] New feature: custom log URL for stdout/stderr

2018-12-10 Thread GitBox
HeartSaVioR commented on a change in pull request #23260: [SPARK-26311][YARN] 
New feature: custom log URL for stdout/stderr
URL: https://github.com/apache/spark/pull/23260#discussion_r240436930
 
 

 ##
 File path: 
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnable.scala
 ##
 @@ -246,13 +246,56 @@ private[yarn] class ExecutorRunnable(
   sys.env.get("SPARK_USER").foreach { user =>
 val containerId = ConverterUtils.toString(c.getId)
 val address = c.getNodeHttpAddress
-val baseUrl = 
s"$httpScheme$address/node/containerlogs/$containerId/$user"
 
-env("SPARK_LOG_URL_STDERR") = s"$baseUrl/stderr?start=-4096"
-env("SPARK_LOG_URL_STDOUT") = s"$baseUrl/stdout?start=-4096"
+sparkConf.get(config.CUSTOM_LOG_URL) match {
+  case Some(customUrl) =>
+val pathVariables = 
ExecutorRunnable.buildPathVariables(httpScheme, address,
+  YarnConfiguration.getClusterId(conf), containerId, user)
+val envNameToFileNameMap = Map("SPARK_LOG_URL_STDERR" -> "stderr",
+  "SPARK_LOG_URL_STDOUT" -> "stdout")
+val logUrls = ExecutorRunnable.replaceLogUrls(customUrl, 
pathVariables,
+  envNameToFileNameMap)
+
+logUrls.foreach { case (envName, url) =>
+  env(envName) = url
+}
+  case None =>
+val baseUrl = 
s"$httpScheme$address/node/containerlogs/$containerId/$user"
+env("SPARK_LOG_URL_STDERR") = s"$baseUrl/stderr?start=-4096"
+env("SPARK_LOG_URL_STDOUT") = s"$baseUrl/stdout?start=-4096"
+  }
   }
 }
 
 env
   }
 }
+
+private[yarn] object ExecutorRunnable {
+  val LOG_URL_PATTERN_HTTP_SCHEME = "{{HttpScheme}}"
+  val LOG_URL_PATTERN_NODE_HTTP_ADDRESS = "{{NodeHttpAddress}}"
+  val LOG_URL_PATTERN_CLUSTER_ID = "{{ClusterId}}"
+  val LOG_URL_PATTERN_CONTAINER_ID = "{{ContainerId}}"
+  val LOG_URL_PATTERN_USER = "{{User}}"
+  val LOG_URL_PATTERN_FILE_NAME = "{{FileName}}"
+
+  def buildPathVariables(httpScheme: String, nodeHttpAddress: String, 
clusterId: String,
 
 Review comment:
   I guess it's allowed when it fits within two lines, but it's no problem to 
change it. Will address.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] devaraj-kavali commented on issue #19616: [SPARK-22404][YARN] Provide an option to use unmanaged AM in yarn-client mode

2018-12-10 Thread GitBox
devaraj-kavali commented on issue #19616: [SPARK-22404][YARN] Provide an option 
to use unmanaged AM in yarn-client mode
URL: https://github.com/apache/spark/pull/19616#issuecomment-446029794
 
 
   Thanks @vanzin for taking the time to look into this; I will update it with 
the changes.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] AmplabJenkins commented on issue #23277: [SPARK-26327][SQL] Metrics in FileSourceScanExec not update correctly

2018-12-10 Thread GitBox
AmplabJenkins commented on issue #23277: [SPARK-26327][SQL] Metrics in 
FileSourceScanExec not update correctly
URL: https://github.com/apache/spark/pull/23277#issuecomment-446028069
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5945/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] AmplabJenkins removed a comment on issue #23277: [SPARK-26327][SQL] Metrics in FileSourceScanExec not update correctly

2018-12-10 Thread GitBox
AmplabJenkins removed a comment on issue #23277: [SPARK-26327][SQL] Metrics in 
FileSourceScanExec not update correctly
URL: https://github.com/apache/spark/pull/23277#issuecomment-446028068
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] SparkQA commented on issue #23277: [SPARK-26327][SQL] Metrics in FileSourceScanExec not update correctly

2018-12-10 Thread GitBox
SparkQA commented on issue #23277: [SPARK-26327][SQL] Metrics in 
FileSourceScanExec not update correctly
URL: https://github.com/apache/spark/pull/23277#issuecomment-446028080
 
 
   **[Test build #99940 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99940/testReport)**
 for PR 23277 at commit 
[`0e00aa7`](https://github.com/apache/spark/commit/0e00aa7a219805f3d14ca4d222df4a922a34d825).


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] AmplabJenkins removed a comment on issue #23277: [SPARK-26327][SQL] Metrics in FileSourceScanExec not update correctly

2018-12-10 Thread GitBox
AmplabJenkins removed a comment on issue #23277: [SPARK-26327][SQL] Metrics in 
FileSourceScanExec not update correctly
URL: https://github.com/apache/spark/pull/23277#issuecomment-446028069
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5945/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] AmplabJenkins commented on issue #23277: [SPARK-26327][SQL] Metrics in FileSourceScanExec not update correctly

2018-12-10 Thread GitBox
AmplabJenkins commented on issue #23277: [SPARK-26327][SQL] Metrics in 
FileSourceScanExec not update correctly
URL: https://github.com/apache/spark/pull/23277#issuecomment-446028068
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] srowen closed pull request #23168: [SPARK-26207][doc]add PowerIterationClustering (PIC) doc in 2.4 branch

2018-12-10 Thread GitBox
srowen closed pull request #23168: [SPARK-26207][doc]add 
PowerIterationClustering (PIC) doc in 2.4 branch
URL: https://github.com/apache/spark/pull/23168
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/docs/ml-clustering.md b/docs/ml-clustering.md
index 1186fb73d0faf..d345512d2b8e8 100644
--- a/docs/ml-clustering.md
+++ b/docs/ml-clustering.md
@@ -265,3 +265,38 @@ Refer to the [R API 
docs](api/R/spark.gaussianMixture.html) for more details.
 
 
 
+
+## Power Iteration Clustering (PIC)
+
+Power Iteration Clustering (PIC) is a scalable graph clustering algorithm
+developed by <a href="http://www.cs.cmu.edu/~frank/papers/icml2010-pic-final.pdf">Lin 
and Cohen</a>.
+From the abstract: PIC finds a very low-dimensional embedding of a dataset
+using truncated power iteration on a normalized pair-wise similarity matrix of 
the data.
+
+`spark.ml`'s PowerIterationClustering implementation takes the following 
parameters:
+
+* `k`: the number of clusters to create
+* `initMode`: param for the initialization algorithm
+* `maxIter`: param for maximum number of iterations
+* `srcCol`: param for the name of the input column for source vertex IDs
+* `dstCol`: name of the input column for destination vertex IDs
+* `weightCol`: Param for weight column name
+
+**Examples**
+
+
+
+
+Refer to the [Scala API 
docs](api/scala/index.html#org.apache.spark.ml.clustering.PowerIterationClustering)
 for more details.
+
+{% include_example 
scala/org/apache/spark/examples/ml/PowerIterationClusteringExample.scala %}
+
+
+
+Refer to the [Java API 
docs](api/java/org/apache/spark/ml/clustering/PowerIterationClustering.html) 
for more details.
+
+{% include_example 
java/org/apache/spark/examples/ml/JavaPowerIterationClusteringExample.java %}
+
+
+
+
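
As a quick illustration of the parameters listed in the doc above, a minimal 
`spark.ml` usage sketch (the input DataFrame `df` with src/dst/weight edge 
columns is assumed; the bundled examples referenced in the doc are the 
complete versions):

```
import org.apache.spark.ml.clustering.PowerIterationClustering

val pic = new PowerIterationClustering()
  .setK(2)
  .setInitMode("degree")
  .setMaxIter(20)
  .setSrcCol("src")
  .setDstCol("dst")
  .setWeightCol("weight")

val assignments = pic.assignClusters(df)  // columns: id, cluster
assignments.show()
```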


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] srowen commented on issue #23168: [SPARK-26207][doc]add PowerIterationClustering (PIC) doc in 2.4 branch

2018-12-10 Thread GitBox
srowen commented on issue #23168: [SPARK-26207][doc]add 
PowerIterationClustering (PIC) doc in 2.4 branch
URL: https://github.com/apache/spark/pull/23168#issuecomment-446027874
 
 
   Merged to 2.4


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] icexelloss commented on a change in pull request #22305: [SPARK-24561][SQL][Python] User-defined window aggregation functions with Pandas UDF (bounded window)

2018-12-10 Thread GitBox
icexelloss commented on a change in pull request #22305: 
[SPARK-24561][SQL][Python] User-defined window aggregation functions with 
Pandas UDF (bounded window)
URL: https://github.com/apache/spark/pull/22305#discussion_r240434370
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/python/WindowInPandasExec.scala
 ##
 @@ -144,24 +282,107 @@ case class WindowInPandasExec(
 queue.close()
   }
 
-  val inputProj = UnsafeProjection.create(allInputs, child.output)
-  val pythonInput = grouped.map { case (_, rows) =>
-rows.map { row =>
-  queue.add(row.asInstanceOf[UnsafeRow])
-  inputProj(row)
+  val stream = iter.map { row =>
+queue.add(row.asInstanceOf[UnsafeRow])
+row
+  }
+
+  val pythonInput = new Iterator[Iterator[UnsafeRow]] {
+
+// Manage the stream and the grouping.
+var nextRow: UnsafeRow = null
+var nextGroup: UnsafeRow = null
+var nextRowAvailable: Boolean = false
+private[this] def fetchNextRow() {
+  nextRowAvailable = stream.hasNext
+  if (nextRowAvailable) {
+nextRow = stream.next().asInstanceOf[UnsafeRow]
+nextGroup = grouping(nextRow)
+  } else {
+nextRow = null
+nextGroup = null
+  }
+}
+fetchNextRow()
+
+// Manage the current partition.
+val buffer: ExternalAppendOnlyUnsafeRowArray =
+  new ExternalAppendOnlyUnsafeRowArray(inMemoryThreshold, 
spillThreshold)
+var bufferIterator: Iterator[UnsafeRow] = _
+
+val indexRow = new 
SpecificInternalRow(Array.fill(numBoundIndices)(IntegerType))
+
+val frames = factories.map(_(indexRow))
+
+private[this] def fetchNextPartition() {
+  // Collect all the rows in the current partition.
+  // Before we start to fetch new input rows, make a copy of nextGroup.
+  val currentGroup = nextGroup.copy()
+
+  // clear last partition
+  buffer.clear()
+
+  while (nextRowAvailable && nextGroup == currentGroup) {
 
 Review comment:
   @hvanhovell Thanks for chiming in. I am not very familiar with unsafe row 
comparison and was just following @ueshin's suggestion. If this is not needed, I 
can close #23279 and leave it as is.
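   
   (As an aside for readers: the `fetchNextPartition` logic in the diff above is 
an instance of a common "split a key-sorted iterator into runs" pattern. A 
simplified, self-contained sketch, not the PR's code:)
   
   ```
   // Cut a key-sorted iterator into runs of consecutive elements that share a
   // grouping key, without materializing the whole input.
   def groupRuns[T, K](it: Iterator[T])(key: T => K): Iterator[List[T]] =
     new Iterator[List[T]] {
       private val buf = it.buffered
       def hasNext: Boolean = buf.hasNext
       def next(): List[T] = {
         val k = key(buf.head)
         val run = scala.collection.mutable.ListBuffer.empty[T]
         while (buf.hasNext && key(buf.head) == k) run += buf.next()
         run.toList
       }
     }
   
   // groupRuns(Iterator(1, 1, 2, 3, 3))(identity).toList
   // => List(List(1, 1), List(2), List(3, 3))
   ```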


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] srowen closed pull request #23072: [SPARK-19827][R]spark.ml R API for PIC

2018-12-10 Thread GitBox
srowen closed pull request #23072: [SPARK-19827][R]spark.ml R API for PIC
URL: https://github.com/apache/spark/pull/23072
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/R/pkg/NAMESPACE b/R/pkg/NAMESPACE
index de56061b4c1c7..afb3b542b6c6d 100644
--- a/R/pkg/NAMESPACE
+++ b/R/pkg/NAMESPACE
@@ -67,7 +67,8 @@ exportMethods("glm",
   "spark.fpGrowth",
   "spark.freqItemsets",
   "spark.associationRules",
-  "spark.findFrequentSequentialPatterns")
+  "spark.findFrequentSequentialPatterns",
+  "spark.assignClusters")
 
 # Job group lifecycle management methods
 export("setJobGroup",
diff --git a/R/pkg/R/generics.R b/R/pkg/R/generics.R
index cbed276274ac1..c2a6acef8ea48 100644
--- a/R/pkg/R/generics.R
+++ b/R/pkg/R/generics.R
@@ -1468,6 +1468,10 @@ setGeneric("spark.associationRules", function(object) { 
standardGeneric("spark.a
 setGeneric("spark.findFrequentSequentialPatterns",
 function(data, ...) { 
standardGeneric("spark.findFrequentSequentialPatterns") })
 
+#' @rdname spark.powerIterationClustering
+setGeneric("spark.assignClusters",
+function(data, ...) { standardGeneric("spark.assignClusters") })
+
 #' @param object a fitted ML model object.
 #' @param path the directory where the model is saved.
 #' @param ... additional argument(s) passed to the method.
diff --git a/R/pkg/R/mllib_clustering.R b/R/pkg/R/mllib_clustering.R
index 900be685824da..7d9dcebfe70d3 100644
--- a/R/pkg/R/mllib_clustering.R
+++ b/R/pkg/R/mllib_clustering.R
@@ -41,6 +41,12 @@ setClass("KMeansModel", representation(jobj = "jobj"))
 #' @note LDAModel since 2.1.0
 setClass("LDAModel", representation(jobj = "jobj"))
 
+#' S4 class that represents a PowerIterationClustering
+#'
+#' @param jobj a Java object reference to the backing Scala 
PowerIterationClustering
+#' @note PowerIterationClustering since 3.0.0
+setClass("PowerIterationClustering", slots = list(jobj = "jobj"))
+
 #' Bisecting K-Means Clustering Model
 #'
 #' Fits a bisecting k-means clustering model against a SparkDataFrame.
@@ -610,3 +616,59 @@ setMethod("write.ml", signature(object = "LDAModel", path 
= "character"),
   function(object, path, overwrite = FALSE) {
 write_internal(object, path, overwrite)
   })
+
+#' PowerIterationClustering
+#'
+#' A scalable graph clustering algorithm. Users can call 
\code{spark.assignClusters} to
+#' return a cluster assignment for each input vertex.
+#'
+#  Run the PIC algorithm and return a cluster assignment for each input 
vertex.
+#' @param data a SparkDataFrame.
+#' @param k the number of clusters to create.
+#' @param initMode the initialization algorithm.
+#' @param maxIter the maximum number of iterations.
+#' @param sourceCol the name of the input column for source vertex IDs.
+#' @param destinationCol the name of the input column for destination vertex 
IDs
+#' @param weightCol weight column name. If this is not set or \code{NULL},
+#'  we treat all instance weights as 1.0.
+#' @param ... additional argument(s) passed to the method.
+#' @return A dataset that contains columns of vertex id and the corresponding cluster for the id.
+#' The schema of it will be:
+#' \code{id: Long}
+#' \code{cluster: Int}
+#' @rdname spark.powerIterationClustering
+#' @aliases assignClusters,PowerIterationClustering-method,SparkDataFrame-method
+#' @examples
+#' \dontrun{
+#' df <- createDataFrame(list(list(0L, 1L, 1.0), list(0L, 2L, 1.0),
+#'list(1L, 2L, 1.0), list(3L, 4L, 1.0),
+#'list(4L, 0L, 0.1)),
+#'   schema = c("src", "dst", "weight"))
+#' clusters <- spark.assignClusters(df, initMode="degree", weightCol="weight")
+#' showDF(clusters)
+#' }
+#' @note spark.assignClusters(SparkDataFrame) since 3.0.0
+setMethod("spark.assignClusters",
+  signature(data = "SparkDataFrame"),
+  function(data, k = 2L, initMode = c("random", "degree"), maxIter = 20L,
+sourceCol = "src", destinationCol = "dst", weightCol = NULL) {
+if (!is.numeric(k) || k < 1) {
+  stop("k should be a number with value >= 1.")
+}
+if (!is.integer(maxIter) || maxIter <= 0) {
+  stop("maxIter should be a number with value > 0.")
+}
+initMode <- match.arg(initMode)
+if (!is.null(weightCol) && weightCol == "") {
+  weightCol <- NULL
+} else if (!is.null(weightCol)) {
+  weightCol <- as.character(weightCol)
+}
+jobj <- callJStatic("org.apache.spark.ml.r.PowerIterationClusteringWrapper",
+"get

[GitHub] srowen closed pull request #22683: [SPARK-25696] The storage memory displayed on spark Application UI is…

2018-12-10 Thread GitBox
srowen closed pull request #22683: [SPARK-25696] The storage memory displayed 
on spark Application UI is…
URL: https://github.com/apache/spark/pull/22683
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/R/pkg/R/context.R b/R/pkg/R/context.R
index e99136723f65b..0207f249f9aa0 100644
--- a/R/pkg/R/context.R
+++ b/R/pkg/R/context.R
@@ -87,7 +87,7 @@ objectFile <- function(sc, path, minPartitions = NULL) {
 #' in the list are split into \code{numSlices} slices and distributed to nodes
 #' in the cluster.
 #'
-#' If size of serialized slices is larger than spark.r.maxAllocationLimit or (200MB), the function
+#' If size of serialized slices is larger than spark.r.maxAllocationLimit or (200MiB), the function
 #' will write it to disk and send the file name to JVM. Also to make sure each slice is not
 #' larger than that limit, number of slices may be increased.
 #'
diff --git a/R/pkg/R/mllib_tree.R b/R/pkg/R/mllib_tree.R
index 0e60842dd44c8..9844061cfd074 100644
--- a/R/pkg/R/mllib_tree.R
+++ b/R/pkg/R/mllib_tree.R
@@ -157,7 +157,7 @@ print.summary.decisionTree <- function(x) {
 #' @param checkpointInterval Param for set checkpoint interval (>= 1) or disable checkpoint (-1).
 #'   Note: this setting will be ignored if the checkpoint directory is not
 #'   set.
-#' @param maxMemoryInMB Maximum memory in MB allocated to histogram aggregation.
+#' @param maxMemoryInMB Maximum memory in MiB allocated to histogram aggregation.
 #' @param cacheNodeIds If FALSE, the algorithm will pass trees to executors to match instances with
 #' nodes. If TRUE, the algorithm will cache node IDs for each instance. Caching
 #' can speed up training of deeper trees. Users can set how often should the
@@ -382,7 +382,7 @@ setMethod("write.ml", signature(object = "GBTClassificationModel", path = "chara
 #' @param checkpointInterval Param for set checkpoint interval (>= 1) or disable checkpoint (-1).
 #'   Note: this setting will be ignored if the checkpoint directory is not
 #'   set.
-#' @param maxMemoryInMB Maximum memory in MB allocated to histogram aggregation.
+#' @param maxMemoryInMB Maximum memory in MiB allocated to histogram aggregation.
 #' @param cacheNodeIds If FALSE, the algorithm will pass trees to executors to match instances with
 #' nodes. If TRUE, the algorithm will cache node IDs for each instance. Caching
 #' can speed up training of deeper trees. Users can set how often should the
@@ -588,7 +588,7 @@ setMethod("write.ml", signature(object = "RandomForestClassificationModel", path
 #' @param checkpointInterval Param for set checkpoint interval (>= 1) or disable checkpoint (-1).
 #'   Note: this setting will be ignored if the checkpoint directory is not
 #'   set.
-#' @param maxMemoryInMB Maximum memory in MB allocated to histogram aggregation.
+#' @param maxMemoryInMB Maximum memory in MiB allocated to histogram aggregation.
 #' @param cacheNodeIds If FALSE, the algorithm will pass trees to executors to match instances with
 #' nodes. If TRUE, the algorithm will cache node IDs for each instance. Caching
 #' can speed up training of deeper trees. Users can set how often should the
diff --git a/core/src/main/resources/org/apache/spark/ui/static/utils.js b/core/src/main/resources/org/apache/spark/ui/static/utils.js
index deeafad4eb5f5..22985e31a7808 100644
--- a/core/src/main/resources/org/apache/spark/ui/static/utils.js
+++ b/core/src/main/resources/org/apache/spark/ui/static/utils.js
@@ -40,9 +40,9 @@ function formatDuration(milliseconds) {
 function formatBytes(bytes, type) {
 if (type !== 'display') return bytes;
 if (bytes == 0) return '0.0 B';
-var k = 1000;
+var k = 1024;
 var dm = 1;
-var sizes = ['B', 'KB', 'MB', 'GB', 'TB', 'PB', 'EB', 'ZB', 'YB'];
+var sizes = ['B', 'KiB', 'MiB', 'GiB', 'TiB', 'PiB', 'EiB', 'ZiB', 'YiB'];
 var i = Math.floor(Math.log(bytes) / Math.log(k));
 return parseFloat((bytes / Math.pow(k, i)).toFixed(dm)) + ' ' + sizes[i];
 }
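// [Editor's note, not part of the patch] Worked example: with k = 1024,
// formatBytes(1536, 'display') computes i = 1 and 1536 / 1024 = 1.5 -> "1.5 KiB";
// the previous k = 1000 version rendered the same input as "1.5 KB".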
diff --git a/core/src/main/scala/org/apache/spark/SparkContext.scala b/core/src/main/scala/org/apache/spark/SparkContext.scala
index 845a3d5f6d6f9..696dafda6d1ec 100644
--- a/core/src/main/scala/org/apache/spark/SparkContext.scala
+++ b/core/src/main/scala/org/apache/spark/SparkContext.scala
@@ -1043,7 +1043,7 @@ class SparkContext(config: SparkConf) extends Logging {
 // See SPARK-11227 for details.
 FileSystem.getLocal(hadoopConfiguration)
 
-// A Hadoop configuration can be about 10 KB, which 

[GitHub] cloud-fan commented on issue #23277: [SPARK-26327][SQL] Metrics in FileSourceScanExec not update correctly

2018-12-10 Thread GitBox
cloud-fan commented on issue #23277: [SPARK-26327][SQL] Metrics in 
FileSourceScanExec not update correctly
URL: https://github.com/apache/spark/pull/23277#issuecomment-446027024
 
 
   good catch!


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] cloud-fan commented on issue #23277: [SPARK-26327][SQL] Metrics in FileSourceScanExec not update correctly

2018-12-10 Thread GitBox
cloud-fan commented on issue #23277: [SPARK-26327][SQL] Metrics in 
FileSourceScanExec not update correctly
URL: https://github.com/apache/spark/pull/23277#issuecomment-446027053
 
 
   retest this please


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] srowen commented on issue #23072: [SPARK-19827][R]spark.ml R API for PIC

2018-12-10 Thread GitBox
srowen commented on issue #23072: [SPARK-19827][R]spark.ml R API for PIC
URL: https://github.com/apache/spark/pull/23072#issuecomment-446026964
 
 
   Merged to master


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] srowen commented on issue #22683: [SPARK-25696] The storage memory displayed on spark Application UI is…

2018-12-10 Thread GitBox
srowen commented on issue #22683: [SPARK-25696] The storage memory displayed on 
spark Application UI is…
URL: https://github.com/apache/spark/pull/22683#issuecomment-446026600
 
 
   Merged to master


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] cloud-fan commented on a change in pull request #23249: [SPARK-26297][SQL] improve the doc of Distribution/Partitioning

2018-12-10 Thread GitBox
cloud-fan commented on a change in pull request #23249: [SPARK-26297][SQL] 
improve the doc of Distribution/Partitioning
URL: https://github.com/apache/spark/pull/23249#discussion_r240431799
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala
 ##
 @@ -118,10 +115,12 @@ case class HashClusteredDistribution(
 
 /**
  * Represents data where tuples have been ordered according to the `ordering`
- * [[Expression Expressions]].  This is a strictly stronger guarantee than
- * [[ClusteredDistribution]] as an ordering will ensure that tuples that share the
- * same value for the ordering expressions are contiguous and will never be split across
- * partitions.
+ * [[Expression Expressions]]. Its requirement is defined as the following:
+ *   - Given any 2 adjacent partitions, all the rows of the second partition must be larger than or
+ * equal to any row in the first partition, according to the `ordering` expressions.
 
 Review comment:
   I did not change the semantics; I just corrected the comment to describe what the current semantics are.
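
   [Editor's note] A minimal illustration of the relaxed requirement quoted above, with made-up data (not from the PR): every row of the second partition is >= every row of the first, yet the value 3 straddles the partition boundary, which the old "never split across partitions" wording ruled out.

   ```scala
   object OrderedDistributionSketch extends App {
     // Two adjacent partitions of integer rows under an ascending ordering.
     val first  = Seq(1, 2, 3)
     val second = Seq(3, 3, 4)
     // Relaxed requirement: every row of `second` is >= every row of `first`.
     val satisfiesRelaxedRequirement = second.forall(row => first.forall(_ <= row))
     println(satisfiesRelaxedRequirement) // prints: true
   }
   ```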


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] cloud-fan commented on a change in pull request #23249: [SPARK-26297][SQL] improve the doc of Distribution/Partitioning

2018-12-10 Thread GitBox
cloud-fan commented on a change in pull request #23249: [SPARK-26297][SQL] 
improve the doc of Distribution/Partitioning
URL: https://github.com/apache/spark/pull/23249#discussion_r240431621
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala
 ##
 @@ -118,10 +115,12 @@ case class HashClusteredDistribution(
 
 /**
  * Represents data where tuples have been ordered according to the `ordering`
- * [[Expression Expressions]].  This is a strictly stronger guarantee than
- * [[ClusteredDistribution]] as an ordering will ensure that tuples that share the
- * same value for the ordering expressions are contiguous and will never be split across
- * partitions.
+ * [[Expression Expressions]]. Its requirement is defined as the following:
+ *   - Given any 2 adjacent partitions, all the rows of the second partition must be larger than or
+ * equal to any row in the first partition, according to the `ordering` expressions.
 
 Review comment:
   @hvanhovell We need this relaxed requirement; otherwise we would have to remove the optimization [here](https://github.com/apache/spark/pull/23249/files#diff-14e5db1a50efdaf9c760aeb5ced4R273).


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] AmplabJenkins removed a comment on issue #22911: [SPARK-25815][k8s] Support kerberos in client mode, keytab-based token renewal.

2018-12-10 Thread GitBox
AmplabJenkins removed a comment on issue #22911: [SPARK-25815][k8s] Support 
kerberos in client mode, keytab-based token renewal.
URL: https://github.com/apache/spark/pull/22911#issuecomment-446022247
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] AmplabJenkins removed a comment on issue #22911: [SPARK-25815][k8s] Support kerberos in client mode, keytab-based token renewal.

2018-12-10 Thread GitBox
AmplabJenkins removed a comment on issue #22911: [SPARK-25815][k8s] Support 
kerberos in client mode, keytab-based token renewal.
URL: https://github.com/apache/spark/pull/22911#issuecomment-446022248
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99932/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] AmplabJenkins commented on issue #22911: [SPARK-25815][k8s] Support kerberos in client mode, keytab-based token renewal.

2018-12-10 Thread GitBox
AmplabJenkins commented on issue #22911: [SPARK-25815][k8s] Support kerberos in 
client mode, keytab-based token renewal.
URL: https://github.com/apache/spark/pull/22911#issuecomment-446022247
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] AmplabJenkins commented on issue #22911: [SPARK-25815][k8s] Support kerberos in client mode, keytab-based token renewal.

2018-12-10 Thread GitBox
AmplabJenkins commented on issue #22911: [SPARK-25815][k8s] Support kerberos in 
client mode, keytab-based token renewal.
URL: https://github.com/apache/spark/pull/22911#issuecomment-446022248
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99932/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] vanzin commented on issue #23220: [SPARK-25877][k8s] Move all feature logic to feature classes.

2018-12-10 Thread GitBox
vanzin commented on issue #23220: [SPARK-25877][k8s] Move all feature logic to 
feature classes.
URL: https://github.com/apache/spark/pull/23220#issuecomment-446022089
 
 
   So, anybody interested in reviewing this?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] SparkQA removed a comment on issue #22911: [SPARK-25815][k8s] Support kerberos in client mode, keytab-based token renewal.

2018-12-10 Thread GitBox
SparkQA removed a comment on issue #22911: [SPARK-25815][k8s] Support kerberos 
in client mode, keytab-based token renewal.
URL: https://github.com/apache/spark/pull/22911#issuecomment-445939970
 
 
**[Test build #99932 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99932/testReport)** for PR 22911 at commit [`4143603`](https://github.com/apache/spark/commit/414360315a484d7a32e6817ff2197cd1f3f8f43c).


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] AmplabJenkins removed a comment on issue #22279: [SPARK-25277][YARN] YARN applicationMaster metrics should not register static metrics

2018-12-10 Thread GitBox
AmplabJenkins removed a comment on issue #22279: [SPARK-25277][YARN] YARN 
applicationMaster metrics should not register static metrics
URL: https://github.com/apache/spark/pull/22279#issuecomment-446021735
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5944/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] AmplabJenkins removed a comment on issue #22279: [SPARK-25277][YARN] YARN applicationMaster metrics should not register static metrics

2018-12-10 Thread GitBox
AmplabJenkins removed a comment on issue #22279: [SPARK-25277][YARN] YARN 
applicationMaster metrics should not register static metrics
URL: https://github.com/apache/spark/pull/22279#issuecomment-446021727
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] SparkQA commented on issue #22911: [SPARK-25815][k8s] Support kerberos in client mode, keytab-based token renewal.

2018-12-10 Thread GitBox
SparkQA commented on issue #22911: [SPARK-25815][k8s] Support kerberos in 
client mode, keytab-based token renewal.
URL: https://github.com/apache/spark/pull/22911#issuecomment-446021887
 
 
**[Test build #99932 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99932/testReport)** for PR 22911 at commit [`4143603`](https://github.com/apache/spark/commit/414360315a484d7a32e6817ff2197cd1f3f8f43c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
 * `class ArrowCollectSerializer(Serializer):`
 * `class CSVInferSchema(val options: CSVOptions) extends Serializable `
 * `class InterpretedSafeProjection(expressions: Seq[Expression]) extends Projection `
 * `sealed trait DateTimeFormatter `
 * `class Iso8601DateTimeFormatter(`
 * `class LegacyDateTimeFormatter(`
 * `class LegacyFallbackDateTimeFormatter(`
 * `sealed trait DateFormatter `
 * `class Iso8601DateFormatter(`
 * `class LegacyDateFormatter(`
 * `class LegacyFallbackDateFormatter(`


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] AmplabJenkins commented on issue #22279: [SPARK-25277][YARN] YARN applicationMaster metrics should not register static metrics

2018-12-10 Thread GitBox
AmplabJenkins commented on issue #22279: [SPARK-25277][YARN] YARN 
applicationMaster metrics should not register static metrics
URL: https://github.com/apache/spark/pull/22279#issuecomment-446021727
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] AmplabJenkins commented on issue #22279: [SPARK-25277][YARN] YARN applicationMaster metrics should not register static metrics

2018-12-10 Thread GitBox
AmplabJenkins commented on issue #22279: [SPARK-25277][YARN] YARN 
applicationMaster metrics should not register static metrics
URL: https://github.com/apache/spark/pull/22279#issuecomment-446021735
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5944/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] SparkQA commented on issue #22279: [SPARK-25277][YARN] YARN applicationMaster metrics should not register static metrics

2018-12-10 Thread GitBox
SparkQA commented on issue #22279: [SPARK-25277][YARN] YARN applicationMaster 
metrics should not register static metrics
URL: https://github.com/apache/spark/pull/22279#issuecomment-446021730
 
 
**[Test build #99939 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99939/testReport)** for PR 22279 at commit [`a990758`](https://github.com/apache/spark/commit/a99075873fa1519fe07344eb90d5c8af02b9d7e5).


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] vanzin closed pull request #21279: [SPARK-24219][k8s] Improve the docker building script to avoid copying everything under examples to docker image

2018-12-10 Thread GitBox
vanzin closed pull request #21279: [SPARK-24219][k8s] Improve the docker 
building script to avoid copying everything under examples to docker image
URL: https://github.com/apache/spark/pull/21279
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/bin/docker-image-tool.sh b/bin/docker-image-tool.sh
index f090240065bf1..7ded77426340b 100755
--- a/bin/docker-image-tool.sh
+++ b/bin/docker-image-tool.sh
@@ -44,15 +44,37 @@ function image_ref {
 function build {
   local BUILD_ARGS
   local IMG_PATH
+  local TMPFOLDER
 
   if [ ! -f "$SPARK_HOME/RELEASE" ]; then
 # Set image build arguments accordingly if this is a source repo and not a distribution archive.
+local JARS="${SPARK_HOME}/assembly/target/scala-${SPARK_SCALA_VERSION}/jars"
+TMPFOLDER=`mktemp -q -d examples.XX`
+if [ $? -ne 0 ]; then
+  echo "Cannot create temp folder, exiting..."
+  exit 1
+fi
+
+mkdir -p "${TMPFOLDER}/jars"
+cp "${SPARK_HOME}"/examples/target/scala*/jars/* "${TMPFOLDER}/jars"
+for f in "${TMPFOLDER}"/jars/*; do
+  name=$(basename "$f")
+  if [ -f "${JARS}/${name}" ]; then
+rm "${TMPFOLDER}/jars/${name}"
+  fi
+done
+
+mkdir -p "${TMPFOLDER}/src/main"
+cp -r "${SPARK_HOME}/examples/src/main" "${TMPFOLDER}/src"
+
 IMG_PATH=resource-managers/kubernetes/docker/src/main/dockerfiles
 BUILD_ARGS=(
   --build-arg
   img_path=$IMG_PATH
   --build-arg
   spark_jars=assembly/target/scala-$SPARK_SCALA_VERSION/jars
+  --build-arg
+  spark_examples=$TMPFOLDER
 )
   else
 # Not passed as an argument to docker, but used to validate the Spark directory.
@@ -69,6 +91,10 @@ function build {
   docker build "${BUILD_ARGS[@]}" \
 -t $(image_ref spark) \
 -f "$DOCKERFILE" .
+
+  if [ -d "${TMPFOLDER}" ]; then
+rm -fr "${TMPFOLDER}"
+  fi
 }
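# [Editor's note, not part of the patch] Usage sketch, assuming the script's
# standard invocation from a source checkout:
#   ./bin/docker-image-tool.sh -r <repo> -t <tag> build
# With this change, only the example jars absent from the assembly jars, plus
# examples/src/main, are staged into a temp folder for the Dockerfile COPY,
# and the folder is removed once `docker build` finishes.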
 
 function push {
diff --git a/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile b/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile
index 9badf8556afc3..198f14f2955d1 100644
--- a/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile
+++ b/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile
@@ -19,6 +19,7 @@ FROM openjdk:8-alpine
 
 ARG spark_jars=jars
 ARG img_path=kubernetes/dockerfiles
+ARG spark_examples=examples
 
 # Before building the docker image, first build and make a Spark distribution following
 # the instructions in http://spark.apache.org/docs/latest/building-spark.html.
@@ -41,7 +42,7 @@ COPY ${spark_jars} /opt/spark/jars
 COPY bin /opt/spark/bin
 COPY sbin /opt/spark/sbin
 COPY ${img_path}/spark/entrypoint.sh /opt/
-COPY examples /opt/spark/examples
+COPY ${spark_examples} /opt/spark/examples
 COPY data /opt/spark/data
 
 ENV SPARK_HOME /opt/spark


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] vanzin commented on issue #21279: [SPARK-24219][k8s] Improve the docker building script to avoid copying everything under examples to docker image

2018-12-10 Thread GitBox
vanzin commented on issue #21279: [SPARK-24219][k8s] Improve the docker 
building script to avoid copying everything under examples to docker image
URL: https://github.com/apache/spark/pull/21279#issuecomment-446021270
 
 
   This was fixed in SPARK-26025.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] AmplabJenkins removed a comment on issue #22279: [SPARK-25277][YARN] YARN applicationMaster metrics should not register static metrics

2018-12-10 Thread GitBox
AmplabJenkins removed a comment on issue #22279: [SPARK-25277][YARN] YARN 
applicationMaster metrics should not register static metrics
URL: https://github.com/apache/spark/pull/22279#issuecomment-417219160
 
 
   Can one of the admins verify this patch?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] vanzin commented on issue #22279: [SPARK-25277][YARN] YARN applicationMaster metrics should not register static metrics

2018-12-10 Thread GitBox
vanzin commented on issue #22279: [SPARK-25277][YARN] YARN applicationMaster 
metrics should not register static metrics
URL: https://github.com/apache/spark/pull/22279#issuecomment-446020969
 
 
   ok to test


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] aokolnychyi commented on issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - foundation

2018-12-10 Thread GitBox
aokolnychyi commented on issue #21320: [SPARK-4502][SQL] Parquet nested column 
pruning - foundation
URL: https://github.com/apache/spark/pull/21320#issuecomment-446020655
 
 
   @mallman @dbtsai @gatorsmile 
   
   One question on non-deterministic expressions. For example, let's consider a 
non-deterministic UDF.
   
   ```
   val nonDeterministicUdf = udf((first: String) => first + " " + Math.random()).asNondeterministic()
   val query = data.select(col("id"), nonDeterministicUdf(col("name.first")))
   ```
   
   As it is today, there will be no schema pruning due to the way `collectProjectsAndFilters` is defined in `PhysicalOperation`.
   
   ```
   == Analyzed Logical Plan ==
   id: int, UDF(name.first): string
   Project [id#222, UDF(name#223.first) AS UDF(name.first)#246]
   +- Project [id#222, name#223, address#224, pets#225, friends#226, relatives#227, employer#228, p#229]
      +- SubqueryAlias `contacts`
         +- Relation[id#222,name#223,address#224,pets#225,friends#226,relatives#227,employer#228,p#229] parquet

   == Optimized Logical Plan ==
   Project [id#222, UDF(name#223.first) AS UDF(name.first)#246]
   +- Relation[id#222,name#223,address#224,pets#225,friends#226,relatives#227,employer#228,p#229] parquet

   == Physical Plan ==
   *(1) Project [id#222, UDF(name#223.first) AS UDF(name.first)#246]
   +- *(1) FileScan parquet [id#222,name#223,address#224,pets#225,friends#226,relatives#227,employer#228,p#229] Batched: false, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex[file:/private/var/folders/f3/6jyczfzd15ndvh49zq0d_sg8gn/T/spark-6b69e4e9-c6..., PartitionCount: 2, PartitionFilters: [], PushedFilters: [], ReadSchema: struct,address:string,pets:int,friends...
   ```
   
   To me, it seems valid to apply schema pruning in this case. What do you think?
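
   [Editor's note] For concreteness, a sketch (an assumed outcome, not output from the PR) of the read schema the scan could be pruned down to for `data.select(col("id"), nonDeterministicUdf(col("name.first")))`:

   ```scala
   import org.apache.spark.sql.types._

   // Hypothetical pruned schema: only `id` and `name.first` survive,
   // instead of the full contact struct read in the plan above.
   val prunedSchema = StructType(Seq(
     StructField("id", IntegerType),
     StructField("name", StructType(Seq(
       StructField("first", StringType))))
   ))
   ```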


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] vanzin commented on issue #19840: [SPARK-22640][PYSPARK][YARN]switch python exec on executor side

2018-12-10 Thread GitBox
vanzin commented on issue #19840: [SPARK-22640][PYSPARK][YARN]switch python 
exec on executor side
URL: https://github.com/apache/spark/pull/19840#issuecomment-446020608
 
 
   @yaooqinn we should probably close this if you're not planning to look at 
the root of the problem, which seems to be on the python side.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] vanzin commented on a change in pull request #19616: [SPARK-22404][YARN] Provide an option to use unmanaged AM in yarn-client mode

2018-12-10 Thread GitBox
vanzin commented on a change in pull request #19616: [SPARK-22404][YARN] 
Provide an option to use unmanaged AM in yarn-client mode
URL: https://github.com/apache/spark/pull/19616#discussion_r240421106
 
 

 ##
 File path: 
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
 ##
 @@ -1084,14 +1095,38 @@ private[spark] class Client(
   if (returnOnRunning && state == YarnApplicationState.RUNNING) {
 return createAppReport(report)
   }
-
+  if (state == YarnApplicationState.ACCEPTED && isClientUnmanagedAMEnabled
+&& !amServiceStarted && report.getAMRMToken != null) {
+amServiceStarted = true
+startApplicationMasterService(report)
+  }
   lastState = state
 }
 
 // Never reached, but keeps compiler happy
throw new SparkException("While loop is depleted! This should never happen...")
   }
 
+  private def startApplicationMasterService(report: ApplicationReport) = {
+// Add AMRMToken to establish connection between RM and AM
+val token = report.getAMRMToken
+val amRMToken: org.apache.hadoop.security.token.Token[AMRMTokenIdentifier] =
+  new org.apache.hadoop.security.token.Token[AMRMTokenIdentifier](token
+.getIdentifier().array(), token.getPassword().array, new Text(
+token.getKind()), new Text(token.getService()))
+val currentUGI = UserGroupInformation.getCurrentUser
+currentUGI.addToken(amRMToken)
+
+sparkConf.set("spark.yarn.containerId",
+  ContainerId.newContainerId(report.getCurrentApplicationAttemptId, 1).toString)
+// Start Application Service in a separate thread and continue with application monitoring
+val amService = new Thread() {
 
 Review comment:
   Thread name?
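
   [Editor's note] A sketch of what the reviewer is asking for, with a hypothetical thread name (not from the PR):

   ```scala
   val amService = new Thread("unmanaged-am-service") {
     override def run(): Unit = {
       // start the ApplicationMaster service here
     }
   }
   ```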


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


