date:20181008

[GitHub] spark issue #22674: [SPARK-25680][SQL] SQL execution listener shouldn't happ...

2018-10-08 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22674
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22674: [SPARK-25680][SQL] SQL execution listener shouldn't happ...

2018-10-08 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22674
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97140/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22674: [SPARK-25680][SQL] SQL execution listener shouldn't happ...

2018-10-08 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22674
  
**[Test build #97140 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97140/testReport)**
 for PR 22674 at commit 
[`a456226`](https://github.com/apache/spark/commit/a4562264c45c24a88f4d508c2d34d4e7aed50631).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22656: [SPARK-25669][SQL] Check CSV header only when it ...

2018-10-08 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22656#discussion_r223566522
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
@@ -505,7 +505,8 @@ class DataFrameReader private[sql](sparkSession: 
SparkSession) extends Logging {
 val actualSchema =
   StructType(schema.filterNot(_.name == 
parsedOptions.columnNameOfCorruptRecord))
 
-val linesWithoutHeader: RDD[String] = maybeFirstLine.map { firstLine =>
+val linesWithoutHeader = if (parsedOptions.headerFlag && 
maybeFirstLine.isDefined) {
--- End diff --

LGTM but it really needs some refactoring. Let me give a shot


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22675: [SPARK-25347][ML][DOC] Spark datasource for image...

2018-10-08 Thread mengxr

Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/22675#discussion_r223566032
  
--- Diff: docs/ml-datasource.md ---
@@ -0,0 +1,51 @@
+---
+layout: global
+title: Data sources
+displayTitle: Data sources
+---
+
+In this section, we introduce how to use data source in ML to load data.
+Beside some general data sources "parquat", "csv", "json", "jdbc", we also 
provide some specific data source for ML.
+
+**Table of Contents**
+
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+## Image data source
+
+This image data source is used to load libsvm data files from directory.
+
+
+

+[`ImageDataSource`](api/scala/index.html#org.apache.spark.ml.source.image.ImageDataSource)
+implements Spark SQL data source API for loading image data as DataFrame.
+The loaded DataFrame has one StructType column: "image". containing image 
data stored as image schema.
+
+{% highlight scala %}
+scala> spark.read.format("image").load("data/mllib/images/origin")
+res1: org.apache.spark.sql.DataFrame = [image: struct]
+{% endhighlight %}
+
+
+

+[`ImageDataSource`](api/java/org/apache/spark/ml/source/image/ImageDataSource.html)
--- End diff --

Usually it depends on how important the use case is. For example, CSV was 
created as an external data source and later merged into Spark. See 
https://issues.apache.org/jira/browse/SPARK-21866?focusedCommentId=16148268=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16148268.
 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22623: [SPARK-25636][CORE] spark-submit cuts off the failure re...

2018-10-08 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22623
  
**[Test build #97143 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97143/testReport)**
 for PR 22623 at commit 
[`131b3af`](https://github.com/apache/spark/commit/131b3af7ab286d2f001cbf0638cd5dbe7653a420).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark to use m...

2018-10-08 Thread wangyum

Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/22661
  
cc @dongjoon-hyun 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22663: [SPARK-25490][SQL][TEST] Refactor KryoBenchmark to use m...

2018-10-08 Thread gengliangwang

Github user gengliangwang commented on the issue:

https://github.com/apache/spark/pull/22663
  
@dongjoon-hyun Is the changes OK to you?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...

2018-10-08 Thread liyinan926

Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21669#discussion_r223560424
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala
 ---
@@ -47,6 +50,13 @@ private[spark] case class KubernetesExecutorSpecificConf(
 driverPod: Option[Pod])
   extends KubernetesRoleSpecificConf
 
+/*
+ * Structure containing metadata for HADOOP_CONF_DIR customization
+ */
+private[spark] case class HadoopConfSpecConf(
--- End diff --

Let's just call it `HadoopConfSpec`.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...

2018-10-08 Thread liyinan926

Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21669#discussion_r223560209
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/hadooputils/HadoopBootstrapUtil.scala
 ---
@@ -0,0 +1,260 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.features.hadooputils
+
+import java.io.File
+import java.nio.charset.StandardCharsets
+
+import scala.collection.JavaConverters._
+
+import com.google.common.io.Files
+import io.fabric8.kubernetes.api.model._
+
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.SparkPod
+
+private[spark] object HadoopBootstrapUtil {
+
+   /**
+* Mounting the DT secret for both the Driver and the executors
+*
+* @param dtSecretName Name of the secret that stores the Delegation 
Token
+* @param dtSecretItemKey Name of the Item Key storing the Delegation 
Token
+* @param userName Name of the SparkUser to set SPARK_USER
+* @param maybeFileLocation Optional Location of the krb5 file
+* @param newKrb5ConfName Optional location of the ConfigMap for Krb5
+* @param maybeKrb5ConfName Optional name of ConfigMap for Krb5
+* @param pod Input pod to be appended to
+* @return a modified SparkPod
+*/
+  def bootstrapKerberosPod(
+dtSecretName: String,
+dtSecretItemKey: String,
+userName: String,
+maybeFileLocation: Option[String],
+newKrb5ConfName: Option[String],
+maybeKrb5ConfName: Option[String],
--- End diff --

Rename this to `existingKrb5ConfName`.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...

2018-10-08 Thread liyinan926

Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21669#discussion_r223557398
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala
 ---
@@ -61,7 +71,16 @@ private[spark] case class KubernetesConf[T <: 
KubernetesRoleSpecificConf](
 roleSecretEnvNamesToKeyRefs: Map[String, String],
 roleEnvs: Map[String, String],
 roleVolumes: Iterable[KubernetesVolumeSpec[_ <: 
KubernetesVolumeSpecificConf]],
-sparkFiles: Seq[String]) {
+sparkFiles: Seq[String],
+hadoopConfDir: Option[HadoopConfSpecConf]) {
--- End diff --

Can you rename this parameter to `hadoopConfSpec`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...

2018-10-08 Thread liyinan926

Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21669#discussion_r223561140
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/KerberosConfDriverFeatureStep.scala
 ---
@@ -0,0 +1,182 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.features
+
+import io.fabric8.kubernetes.api.model.HasMetadata
+
+import org.apache.spark.deploy.SparkHadoopUtil
+import org.apache.spark.deploy.k8s.{KubernetesConf, KubernetesUtils, 
SparkPod}
+import org.apache.spark.deploy.k8s.Config._
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.KubernetesDriverSpecificConf
+import org.apache.spark.deploy.k8s.features.hadooputils._
+import org.apache.spark.internal.Logging
+
+ /**
+  * Runs the necessary Hadoop-based logic based on Kerberos configs and 
the presence of the
+  * HADOOP_CONF_DIR. This runs various bootstrap methods defined in 
HadoopBootstrapUtil.
+  */
+private[spark] class KerberosConfDriverFeatureStep(
+   kubernetesConf: KubernetesConf[KubernetesDriverSpecificConf])
+   extends KubernetesFeatureConfigStep with Logging {
+
+   require(kubernetesConf.hadoopConfDir.isDefined,
+ "Ensure that HADOOP_CONF_DIR is defined either via env or a 
pre-defined ConfigMap")
+   private val hadoopConfDirSpec = kubernetesConf.hadoopConfDir.get
+   private val conf = kubernetesConf.sparkConf
+   private val maybePrincipal = 
conf.get(org.apache.spark.internal.config.PRINCIPAL)
+   private val maybeKeytab = 
conf.get(org.apache.spark.internal.config.KEYTAB)
+   private val maybeExistingSecretName = 
conf.get(KUBERNETES_KERBEROS_DT_SECRET_NAME)
+   private val maybeExistingSecretItemKey =
+ conf.get(KUBERNETES_KERBEROS_DT_SECRET_ITEM_KEY)
+   private val maybeKrb5File =
+  conf.get(KUBERNETES_KERBEROS_KRB5_FILE)
+   private val maybeKrb5CMap =
--- End diff --

Can we call this `existingKrb5ConfigMap`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...

2018-10-08 Thread liyinan926

Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21669#discussion_r223560748
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala
 ---
@@ -47,6 +50,13 @@ private[spark] case class KubernetesExecutorSpecificConf(
 driverPod: Option[Pod])
   extends KubernetesRoleSpecificConf
 
+/*
+ * Structure containing metadata for HADOOP_CONF_DIR customization
+ */
+private[spark] case class HadoopConfSpecConf(
--- End diff --

Can you create a similar one for krb5 and call it `Krb5ConfSpec`, which 
contains an optional krb5 file location or the name of a pre-defined krb5 
ConfigMap? Then you can use that `Krb5ConfSpec` in places where you use both 
the file location and ConfigMap as parameters. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...

2018-10-08 Thread liyinan926

Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21669#discussion_r223558019
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/hadooputils/HadoopBootstrapUtil.scala
 ---
@@ -0,0 +1,260 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.features.hadooputils
+
+import java.io.File
+import java.nio.charset.StandardCharsets
+
+import scala.collection.JavaConverters._
+
+import com.google.common.io.Files
+import io.fabric8.kubernetes.api.model._
+
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.SparkPod
+
+private[spark] object HadoopBootstrapUtil {
+
+   /**
+* Mounting the DT secret for both the Driver and the executors
+*
+* @param dtSecretName Name of the secret that stores the Delegation 
Token
+* @param dtSecretItemKey Name of the Item Key storing the Delegation 
Token
+* @param userName Name of the SparkUser to set SPARK_USER
+* @param maybeFileLocation Optional Location of the krb5 file
+* @param newKrb5ConfName Optional location of the ConfigMap for Krb5
+* @param maybeKrb5ConfName Optional name of ConfigMap for Krb5
+* @param pod Input pod to be appended to
+* @return a modified SparkPod
+*/
+  def bootstrapKerberosPod(
+dtSecretName: String,
+dtSecretItemKey: String,
+userName: String,
+maybeFileLocation: Option[String],
+newKrb5ConfName: Option[String],
+maybeKrb5ConfName: Option[String],
+pod: SparkPod) : SparkPod = {
+
+val maybePreConfigMapVolume = maybeKrb5ConfName.map { kconf =>
+  new VolumeBuilder()
+.withName(KRB_FILE_VOLUME)
+.withNewConfigMap()
+  .withName(kconf)
+  .endConfigMap()
+.build() }
+
+val maybeCreateConfigMapVolume = for {
+  fileLocation <- maybeFileLocation
+  krb5ConfName <- newKrb5ConfName
+ } yield {
+  val krb5File = new File(fileLocation)
+  val fileStringPath = krb5File.toPath.getFileName.toString
+  new VolumeBuilder()
+.withName(KRB_FILE_VOLUME)
+.withNewConfigMap()
+.withName(krb5ConfName)
+.withItems(new KeyToPathBuilder()
+  .withKey(fileStringPath)
+  .withPath(fileStringPath)
+  .build())
+.endConfigMap()
+.build()
+}
+
+ // Breaking up Volume Creation for clarity
+ val configMapVolume = maybePreConfigMapVolume.getOrElse(
+   maybeCreateConfigMapVolume.get)
+
+ val kerberizedPod = new PodBuilder(pod.pod)
+   .editOrNewSpec()
+.addNewVolume()
+  .withName(SPARK_APP_HADOOP_SECRET_VOLUME_NAME)
+  .withNewSecret()
+.withSecretName(dtSecretName)
+.endSecret()
+  .endVolume()
+.addNewVolumeLike(configMapVolume)
+  .endVolume()
+.endSpec()
+.build()
+
+ val kerberizedContainer = new ContainerBuilder(pod.container)
+  .addNewVolumeMount()
+.withName(SPARK_APP_HADOOP_SECRET_VOLUME_NAME)
+.withMountPath(SPARK_APP_HADOOP_CREDENTIALS_BASE_DIR)
+.endVolumeMount()
+  .addNewVolumeMount()
+.withName(KRB_FILE_VOLUME)
+.withMountPath(KRB_FILE_DIR_PATH + "/krb5.conf")
+.withSubPath("krb5.conf")
+.endVolumeMount()
+  .addNewEnv()
+.withName(ENV_HADOOP_TOKEN_FILE_LOCATION)
+
.withValue(s"$SPARK_APP_HADOOP_CREDENTIALS_BASE_DIR/$dtSecretItemKey")
+.endEnv()
+  .addNewEnv()
+.withName(ENV_SPARK_USER)
+.withValue(userName)
+.endEnv()
+  .build()
+  SparkPod(kerberizedPod, kerberizedContainer)
+  }
+
+   /**
+* setting

[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...

2018-10-08 Thread liyinan926

Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21669#discussion_r223558932
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Constants.scala
 ---
@@ -78,4 +80,29 @@ private[spark] object Constants {
   val KUBERNETES_MASTER_INTERNAL_URL = "https://kubernetes.default.svc;
   val DRIVER_CONTAINER_NAME = "spark-kubernetes-driver"
   val MEMORY_OVERHEAD_MIN_MIB = 384L
+
+  // Hadoop Configuration
+  val HADOOP_FILE_VOLUME = "hadoop-properties"
+  val KRB_FILE_VOLUME = "krb5-file"
+  val HADOOP_CONF_DIR_PATH = "/opt/hadoop/conf"
+  val KRB_FILE_DIR_PATH = "/etc"
+  val ENV_HADOOP_CONF_DIR = "HADOOP_CONF_DIR"
+  val HADOOP_CONFIG_MAP_NAME =
+"spark.kubernetes.executor.hadoopConfigMapName"
+  val KRB5_CONFIG_MAP_NAME =
+"spark.kubernetes.executor.krb5ConfigMapName"
+
+  // Kerberos Configuration
+  val KERBEROS_DELEGEGATION_TOKEN_SECRET_NAME = "delegation-tokens"
+  val KERBEROS_KEYTAB_SECRET_NAME =
--- End diff --

Looks like this is used as the name of the secret storing the DT, not the 
keytab. So please rename it.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...

2018-10-08 Thread liyinan926

Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21669#discussion_r223561204
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/KerberosConfDriverFeatureStep.scala
 ---
@@ -0,0 +1,182 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.features
+
+import io.fabric8.kubernetes.api.model.HasMetadata
+
+import org.apache.spark.deploy.SparkHadoopUtil
+import org.apache.spark.deploy.k8s.{KubernetesConf, KubernetesUtils, 
SparkPod}
+import org.apache.spark.deploy.k8s.Config._
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.KubernetesDriverSpecificConf
+import org.apache.spark.deploy.k8s.features.hadooputils._
+import org.apache.spark.internal.Logging
+
+ /**
+  * Runs the necessary Hadoop-based logic based on Kerberos configs and 
the presence of the
+  * HADOOP_CONF_DIR. This runs various bootstrap methods defined in 
HadoopBootstrapUtil.
+  */
+private[spark] class KerberosConfDriverFeatureStep(
+   kubernetesConf: KubernetesConf[KubernetesDriverSpecificConf])
+   extends KubernetesFeatureConfigStep with Logging {
+
+   require(kubernetesConf.hadoopConfDir.isDefined,
+ "Ensure that HADOOP_CONF_DIR is defined either via env or a 
pre-defined ConfigMap")
+   private val hadoopConfDirSpec = kubernetesConf.hadoopConfDir.get
+   private val conf = kubernetesConf.sparkConf
+   private val maybePrincipal = 
conf.get(org.apache.spark.internal.config.PRINCIPAL)
--- End diff --

Can we avoid adding the prefix `maybe` to the variable names? They make the 
variables unnecessarily longer without really improving readability much.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...

2018-10-08 Thread liyinan926

Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21669#discussion_r223561016
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/KerberosConfDriverFeatureStep.scala
 ---
@@ -0,0 +1,182 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.features
+
+import io.fabric8.kubernetes.api.model.HasMetadata
+
+import org.apache.spark.deploy.SparkHadoopUtil
+import org.apache.spark.deploy.k8s.{KubernetesConf, KubernetesUtils, 
SparkPod}
+import org.apache.spark.deploy.k8s.Config._
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.KubernetesDriverSpecificConf
+import org.apache.spark.deploy.k8s.features.hadooputils._
+import org.apache.spark.internal.Logging
+
+ /**
+  * Runs the necessary Hadoop-based logic based on Kerberos configs and 
the presence of the
+  * HADOOP_CONF_DIR. This runs various bootstrap methods defined in 
HadoopBootstrapUtil.
+  */
+private[spark] class KerberosConfDriverFeatureStep(
+   kubernetesConf: KubernetesConf[KubernetesDriverSpecificConf])
+   extends KubernetesFeatureConfigStep with Logging {
+
+   require(kubernetesConf.hadoopConfDir.isDefined,
+ "Ensure that HADOOP_CONF_DIR is defined either via env or a 
pre-defined ConfigMap")
+   private val hadoopConfDirSpec = kubernetesConf.hadoopConfDir.get
+   private val conf = kubernetesConf.sparkConf
+   private val maybePrincipal = 
conf.get(org.apache.spark.internal.config.PRINCIPAL)
+   private val maybeKeytab = 
conf.get(org.apache.spark.internal.config.KEYTAB)
+   private val maybeExistingSecretName = 
conf.get(KUBERNETES_KERBEROS_DT_SECRET_NAME)
+   private val maybeExistingSecretItemKey =
+ conf.get(KUBERNETES_KERBEROS_DT_SECRET_ITEM_KEY)
+   private val maybeKrb5File =
+  conf.get(KUBERNETES_KERBEROS_KRB5_FILE)
+   private val maybeKrb5CMap =
+ conf.get(KUBERNETES_KERBEROS_KRB5_CONFIG_MAP)
+   private val kubeTokenManager = kubernetesConf.tokenManager(conf,
+ SparkHadoopUtil.get.newConfiguration(conf))
+   private val isKerberosEnabled =
+ (hadoopConfDirSpec.hadoopConfDir.isDefined && 
kubeTokenManager.isSecurityEnabled) ||
+   (hadoopConfDirSpec.hadoopConfigMapName.isDefined &&
+ (maybeKrb5File.isDefined || maybeKrb5CMap.isDefined))
+
+   require(maybeKeytab.isEmpty || isKerberosEnabled,
+ "You must enable Kerberos support if you are specifying a Kerberos 
Keytab")
+
+   require(maybeExistingSecretName.isEmpty || isKerberosEnabled,
+ "You must enable Kerberos support if you are specifying a Kerberos 
Secret")
+
+   require((maybeKrb5File.isEmpty || maybeKrb5CMap.isEmpty) || 
isKerberosEnabled,
+ "You must specify either a krb5 file location or a ConfigMap with a 
krb5 file")
+
+   KubernetesUtils.requireNandDefined(
+ maybeKrb5File,
+ maybeKrb5CMap,
+ "Do not specify both a Krb5 local file and the ConfigMap as the 
creation " +
+   "of an additional ConfigMap, when one is already specified, is 
extraneous")
+
+   KubernetesUtils.requireBothOrNeitherDefined(
+ maybeKeytab,
+ maybePrincipal,
+ "If a Kerberos principal is specified you must also specify a 
Kerberos keytab",
+ "If a Kerberos keytab is specified you must also specify a Kerberos 
principal")
+
+   KubernetesUtils.requireBothOrNeitherDefined(
+ maybeExistingSecretName,
+ maybeExistingSecretItemKey,
+ "If a secret data item-key where the data of the Kerberos Delegation 
Token is specified" +
+" you must also specify the name of the secret",
+ "If a secret storing a Kerberos Delegation Token is specified you 
must also" +
+" specify the item-key where the data is stored")
+
+   private val hadoopConfigurationFiles = 
hadoopConfDirSpec.hadoopConfDir.map { hConfDir =>
+

[GitHub] spark issue #22615: [SPARK-25016][BUILD][CORE] Remove support for Hadoop 2.6

2018-10-08 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22615
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22615: [SPARK-25016][BUILD][CORE] Remove support for Hadoop 2.6

2018-10-08 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22615
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97137/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22615: [SPARK-25016][BUILD][CORE] Remove support for Hadoop 2.6

2018-10-08 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22615
  
**[Test build #97137 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97137/testReport)**
 for PR 22615 at commit 
[`7392cf0`](https://github.com/apache/spark/commit/7392cf00c31d48790d1235330c2fe18f7f850624).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22466: [SPARK-25464][SQL] Create Database to the location,only ...

2018-10-08 Thread sandeep-katta

Github user sandeep-katta commented on the issue:

https://github.com/apache/spark/pull/22466
  
@cloud-fan @gatorsmile if everything is okay,can you please merge this PR


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22583: [SPARK-10816][SS] SessionWindow support for Structure St...

2018-10-08 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22583
  
**[Test build #97142 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97142/testReport)**
 for PR 22583 at commit 
[`0a27efd`](https://github.com/apache/spark/commit/0a27efdbcd5d7612cdeb9fc2618e1c76e70a586c).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22583: [SPARK-10816][SS] SessionWindow support for Structure St...

2018-10-08 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22583
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3812/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22583: [SPARK-10816][SS] SessionWindow support for Structure St...

2018-10-08 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22583
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22675: [SPARK-25347][ML][DOC] Spark datasource for image...

2018-10-08 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22675#discussion_r223552437
  
--- Diff: docs/ml-datasource.md ---
@@ -0,0 +1,51 @@
+---
+layout: global
+title: Data sources
+displayTitle: Data sources
+---
+
+In this section, we introduce how to use data source in ML to load data.
+Beside some general data sources "parquat", "csv", "json", "jdbc", we also 
provide some specific data source for ML.
+
+**Table of Contents**
+
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+## Image data source
+
+This image data source is used to load libsvm data files from directory.
+
+
+

+[`ImageDataSource`](api/scala/index.html#org.apache.spark.ml.source.image.ImageDataSource)
+implements Spark SQL data source API for loading image data as DataFrame.
+The loaded DataFrame has one StructType column: "image". containing image 
data stored as image schema.
+
+{% highlight scala %}
+scala> spark.read.format("image").load("data/mllib/images/origin")
+res1: org.apache.spark.sql.DataFrame = [image: struct]
+{% endhighlight %}
+
+
+

+[`ImageDataSource`](api/java/org/apache/spark/ml/source/image/ImageDataSource.html)
--- End diff --

cc @mengxr as well


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22675: [SPARK-25347][ML][DOC] Spark datasource for image...

2018-10-08 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22675#discussion_r223552386
  
--- Diff: docs/ml-datasource.md ---
@@ -0,0 +1,51 @@
+---
+layout: global
+title: Data sources
+displayTitle: Data sources
+---
+
+In this section, we introduce how to use data source in ML to load data.
+Beside some general data sources "parquat", "csv", "json", "jdbc", we also 
provide some specific data source for ML.
+
+**Table of Contents**
+
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+## Image data source
+
+This image data source is used to load libsvm data files from directory.
+
+
+

+[`ImageDataSource`](api/scala/index.html#org.apache.spark.ml.source.image.ImageDataSource)
+implements Spark SQL data source API for loading image data as DataFrame.
+The loaded DataFrame has one StructType column: "image". containing image 
data stored as image schema.
+
+{% highlight scala %}
+scala> spark.read.format("image").load("data/mllib/images/origin")
+res1: org.apache.spark.sql.DataFrame = [image: struct]
+{% endhighlight %}
+
+
+

+[`ImageDataSource`](api/java/org/apache/spark/ml/source/image/ImageDataSource.html)
--- End diff --

Out of curiosity, why did we put the image source inside of Spark, rather 
then a separate module? (see also 
https://github.com/apache/spark/pull/21742#discussion_r201552008). Avro was put 
as a separate module.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22653: [SPARK-25659][PYTHON][TEST] Test type inference s...

2018-10-08 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22653#discussion_r223552056
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -1149,6 +1149,75 @@ def test_infer_schema(self):
 result = self.spark.sql("SELECT l[0].a from test2 where d['key'].d 
= '2'")
 self.assertEqual(1, result.head()[0])
 
+def test_infer_schema_specification(self):
+from decimal import Decimal
+
+class A(object):
+def __init__(self):
+self.a = 1
+
+data = [
+True,
+1,
+"a",
+u"a",
--- End diff --

Fair point. Will change when I fix some codes around here.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22615: [SPARK-25016][BUILD][CORE] Remove support for Hadoop 2.6

2018-10-08 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22615
  
I want to see the configurations .. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22237: [SPARK-25243][SQL] Use FailureSafeParser in from_...

2018-10-08 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22237#discussion_r223550790
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1890,6 +1890,10 @@ working with timestamps in `pandas_udf`s to get the 
best performance, see
 
 # Migration Guide
 
+## Upgrading From Spark SQL 2.4 to 3.0
+
+  - Since Spark 3.0, the `from_json` functions supports two modes - 
`PERMISSIVE` and `FAILFAST`. The modes can be set via the `mode` option. The 
default mode became `PERMISSIVE`. In previous versions, behavior of `from_json` 
did not conform to either `PERMISSIVE` nor `FAILFAST`, especially in processing 
of malformed JSON records. For example, the JSON string `{"a" 1}` with the 
schema `a INT` is converted to `null` by previous versions but Spark 3.0 
converts it to `Row(null)`. In version 2.4 and earlier, arrays of JSON objects 
are considered as invalid and converted to `null` if specified schema is 
`StructType`. Since Spark 3.0, the input is considered as a valid JSON array 
and only its first element is parsed if it conforms to the specified 
`StructType`.
--- End diff --

Last option sounds better to me but can we fill the corrupt row when the 
corrupt field name is specified in the schema? 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22675: [SPARK-25347][ML][DOC] Spark datasource for image/libsvm...

2018-10-08 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22675
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22675: [SPARK-25347][ML][DOC] Spark datasource for image/libsvm...

2018-10-08 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22675
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97141/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22675: [SPARK-25347][ML][DOC] Spark datasource for image/libsvm...

2018-10-08 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22675
  
**[Test build #97141 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97141/testReport)**
 for PR 22675 at commit 
[`887f528`](https://github.com/apache/spark/commit/887f5282fba8a8a0bcbb9242eb87b27bf94d0210).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22630: [SPARK-25497][SQL] Limit operation within whole stage co...

2018-10-08 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22630
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22630: [SPARK-25497][SQL] Limit operation within whole stage co...

2018-10-08 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22630
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97136/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22630: [SPARK-25497][SQL] Limit operation within whole stage co...

2018-10-08 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22630
  
**[Test build #97136 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97136/testReport)**
 for PR 22630 at commit 
[`4fc4301`](https://github.com/apache/spark/commit/4fc43010c7c466e7a3db6a08c554adc78719db76).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22675: [SPARK-25347][ML][DOC] Spark datasource for image/libsvm...

2018-10-08 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22675
  
**[Test build #97141 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97141/testReport)**
 for PR 22675 at commit 
[`887f528`](https://github.com/apache/spark/commit/887f5282fba8a8a0bcbb9242eb87b27bf94d0210).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22675: [SPARK-25347][ML][DOC] Spark datasource for image/libsvm...

2018-10-08 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22675
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3811/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22675: [SPARK-25347][ML][DOC] Spark datasource for image/libsvm...

2018-10-08 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22675
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22675: [SPARK-25347][ML][DOC] Spark datasource for image...

2018-10-08 Thread WeichenXu123

GitHub user WeichenXu123 opened a pull request:

https://github.com/apache/spark/pull/22675

[SPARK-25347][ML][DOC] Spark datasource for image/libsvm user guide

## What changes were proposed in this pull request?

Spark datasource for image/libsvm user guide

## How was this patch tested?
Scala:

![8ba0b2c3-7a44-4154-8ecf-d001f94d1ca0](https://user-images.githubusercontent.com/19235986/46644290-42cabc00-cbb2-11e8-978e-996f12ef9405.png)
Java:

![4c6ffe71-1268-4890-886f-263f61f25519](https://user-images.githubusercontent.com/19235986/46644298-49f1ca00-cbb2-11e8-9af9-17409a8d29e2.png)
Python:

![2ec95d0f-478b-4bd6-8815-16640bbea5ab](https://user-images.githubusercontent.com/19235986/46644301-4fe7ab00-cbb2-11e8-989f-1280582f5358.png)


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/WeichenXu123/spark add_image_source_doc

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22675.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22675


commit 887f5282fba8a8a0bcbb9242eb87b27bf94d0210
Author: WeichenXu 
Date:   2018-10-09T02:54:09Z

init




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21156: [SPARK-24087][SQL] Avoid shuffle when join keys are a su...

2018-10-08 Thread wangshisan

Github user wangshisan commented on the issue:

https://github.com/apache/spark/pull/21156
  
What is the status now? I think this is of great value, since this gives 
users more possibility to leverage bucket join, all joins which take the bucket 
key as the prefix of join keys will benefit from this. 
And we have a further optimization here:
1. Table A(a1, a2, a3) is bucketed by a1, a2
2. Table B(b1, b2, b3) is bucketed by b1.
3. A join B on (a1=b1, a2=b2, a3=b3)

In this case, only table B needs extra shuffle, and shuffle keys are (b1, 
b2).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22674: [SPARK-25680][SQL] SQL execution listener shouldn't happ...

2018-10-08 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22674
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21669: [SPARK-23257][K8S] Kerberos Support for Spark on K8S

2018-10-08 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21669
  
Kubernetes integration test status success
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/3809/



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21669: [SPARK-23257][K8S] Kerberos Support for Spark on K8S

2018-10-08 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21669
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3809/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22674: [SPARK-25680][SQL] SQL execution listener shouldn't happ...

2018-10-08 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22674
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3810/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21669: [SPARK-23257][K8S] Kerberos Support for Spark on K8S

2018-10-08 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21669
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21669: [SPARK-23257][K8S] Kerberos Support for Spark on K8S

2018-10-08 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21669
  
Kubernetes integration test starting
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/3809/



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22237: [SPARK-25243][SQL] Use FailureSafeParser in from_...

2018-10-08 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22237#discussion_r223545341
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1890,6 +1890,10 @@ working with timestamps in `pandas_udf`s to get the 
best performance, see
 
 # Migration Guide
 
+## Upgrading From Spark SQL 2.4 to 3.0
+
+  - Since Spark 3.0, the `from_json` functions supports two modes - 
`PERMISSIVE` and `FAILFAST`. The modes can be set via the `mode` option. The 
default mode became `PERMISSIVE`. In previous versions, behavior of `from_json` 
did not conform to either `PERMISSIVE` nor `FAILFAST`, especially in processing 
of malformed JSON records. For example, the JSON string `{"a" 1}` with the 
schema `a INT` is converted to `null` by previous versions but Spark 3.0 
converts it to `Row(null)`. In version 2.4 and earlier, arrays of JSON objects 
are considered as invalid and converted to `null` if specified schema is 
`StructType`. Since Spark 3.0, the input is considered as a valid JSON array 
and only its first element is parsed if it conforms to the specified 
`StructType`.
--- End diff --

the last option LGTM


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21669: [SPARK-23257][K8S] Kerberos Support for Spark on K8S

2018-10-08 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21669
  
Kubernetes integration test status success
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/3808/



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21669: [SPARK-23257][K8S] Kerberos Support for Spark on K8S

2018-10-08 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21669
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21669: [SPARK-23257][K8S] Kerberos Support for Spark on K8S

2018-10-08 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21669
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3808/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22674: [SPARK-25680][SQL] SQL execution listener shouldn't happ...

2018-10-08 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22674
  
**[Test build #97140 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97140/testReport)**
 for PR 22674 at commit 
[`a456226`](https://github.com/apache/spark/commit/a4562264c45c24a88f4d508c2d34d4e7aed50631).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21669: [SPARK-23257][K8S] Kerberos Support for Spark on K8S

2018-10-08 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21669
  
Kubernetes integration test starting
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/3808/



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22659: [SPARK-25623][SPARK-25624][SPARK-25625][TEST] Reduce tes...

2018-10-08 Thread shahidki31

Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/22659
  
Thank you @srowen for merging.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21669: [SPARK-23257][K8S] Kerberos Support for Spark on K8S

2018-10-08 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21669
  
**[Test build #97139 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97139/testReport)**
 for PR 21669 at commit 
[`2108154`](https://github.com/apache/spark/commit/210815496cc2e5ecf8a0e7f86572733994e2f671).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21669: [SPARK-23257][K8S] Kerberos Support for Spark on K8S

2018-10-08 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21669
  
**[Test build #97138 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97138/testReport)**
 for PR 21669 at commit 
[`69840a8`](https://github.com/apache/spark/commit/69840a8026d1f8a72eed9a3e6c7073e8bdbf04b0).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22482: WIP - [SPARK-10816][SS] Support session window natively

2018-10-08 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22482
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97135/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22482: WIP - [SPARK-10816][SS] Support session window natively

2018-10-08 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22482
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22482: WIP - [SPARK-10816][SS] Support session window natively

2018-10-08 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22482
  
**[Test build #97135 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97135/testReport)**
 for PR 22482 at commit 
[`94e9859`](https://github.com/apache/spark/commit/94e9859b4bef01a9c08da0ce444167dd0d25e3ba).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22623: [SPARK-25636][CORE] spark-submit cuts off the failure re...

2018-10-08 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22623
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97134/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22623: [SPARK-25636][CORE] spark-submit cuts off the failure re...

2018-10-08 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22623
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22623: [SPARK-25636][CORE] spark-submit cuts off the failure re...

2018-10-08 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22623
  
**[Test build #97134 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97134/testReport)**
 for PR 22623 at commit 
[`71e3e30`](https://github.com/apache/spark/commit/71e3e30aaf94dfb638eca7e2edfc5feddccc0d3d).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22623: [SPARK-25636][CORE] spark-submit cuts off the failure re...

2018-10-08 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22623
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22623: [SPARK-25636][CORE] spark-submit cuts off the failure re...

2018-10-08 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22623
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97133/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22623: [SPARK-25636][CORE] spark-submit cuts off the failure re...

2018-10-08 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22623
  
**[Test build #97133 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97133/testReport)**
 for PR 22623 at commit 
[`9e60602`](https://github.com/apache/spark/commit/9e6060289c798117d2e2b18d2f73af1e8eb74d3f).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22615: [SPARK-25016][BUILD][CORE] Remove support for Hadoop 2.6

2018-10-08 Thread shaneknapp

Github user shaneknapp commented on the issue:

https://github.com/apache/spark/pull/22615
  
@srowen fair 'nuf...  i'll create a jira for this tomorrow and we can hash 
out final design shite there (rather than overloading this PR).  :)



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...

2018-10-08 Thread ifilonenko

Github user ifilonenko commented on a diff in the pull request:

https://github.com/apache/spark/pull/21669#discussion_r223535529
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -820,4 +820,37 @@ specific to Spark on Kubernetes.
This sets the major Python version of the docker image used to run the 
driver and executor containers. Can either be 2 or 3. 
   
 
+
+  spark.kubernetes.kerberos.krb5.location
+  (none)
+  
+   Specify the local location of the krb5 file to be mounted on the driver 
and executors for Kerberos interaction.
+   It is important to note that for local files, the KDC defined needs to 
be visible from inside the containers.
--- End diff --

True


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22653: [SPARK-25659][PYTHON][TEST] Test type inference s...

2018-10-08 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/22653#discussion_r223534303
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -1149,6 +1149,75 @@ def test_infer_schema(self):
 result = self.spark.sql("SELECT l[0].a from test2 where d['key'].d 
= '2'")
 self.assertEqual(1, result.head()[0])
 
+def test_infer_schema_specification(self):
+from decimal import Decimal
+
+class A(object):
+def __init__(self):
+self.a = 1
+
+data = [
+True,
+1,
+"a",
+u"a",
--- End diff --

nit: since this is for unicode string, how about using a non-ascii string?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21809: [SPARK-24851][UI] Map a Stage ID to it's Associated Job ...

2018-10-08 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21809
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97130/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21809: [SPARK-24851][UI] Map a Stage ID to it's Associated Job ...

2018-10-08 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21809
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21809: [SPARK-24851][UI] Map a Stage ID to it's Associated Job ...

2018-10-08 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21809
  
**[Test build #97130 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97130/testReport)**
 for PR 21809 at commit 
[`0be099a`](https://github.com/apache/spark/commit/0be099a81bb016034e73562cc32599bd06810956).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22659: [SPARK-25623][SPARK-25624][SPARK-25625][TEST] Red...

2018-10-08 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22659


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22659: [SPARK-25623][SPARK-25624][SPARK-25625][TEST] Reduce tes...

2018-10-08 Thread srowen

Github user srowen commented on the issue:

https://github.com/apache/spark/pull/22659
  
Merged to master


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22653: [SPARK-25659][PYTHON][TEST] Test type inference specific...

2018-10-08 Thread viirya

Github user viirya commented on the issue:

https://github.com/apache/spark/pull/22653
  
Sorry for late. @HyukjinKwon Yes, this LGTM. It is great we can follow up 
other issues like None and datetime.time in other JIRA.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22635: [SPARK-25591][PySpark][SQL] Avoid overwriting deserializ...

2018-10-08 Thread viirya

Github user viirya commented on the issue:

https://github.com/apache/spark/pull/22635
  
@cloud-fan @gatorsmile @HyukjinKwon Thanks. Yes. As Pandas UDF has the same 
issue and it is fixed by this PR.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22653: [SPARK-25659][PYTHON][TEST] Test type inference s...

2018-10-08 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22653


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22615: [SPARK-25016][BUILD][CORE] Remove support for Hadoop 2.6

2018-10-08 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22615
  
**[Test build #97137 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97137/testReport)**
 for PR 22615 at commit 
[`7392cf0`](https://github.com/apache/spark/commit/7392cf00c31d48790d1235330c2fe18f7f850624).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22615: [SPARK-25016][BUILD][CORE] Remove support for Hadoop 2.6

2018-10-08 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22615
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22615: [SPARK-25016][BUILD][CORE] Remove support for Hadoop 2.6

2018-10-08 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22615
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3807/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22653: [SPARK-25659][PYTHON][TEST] Test type inference specific...

2018-10-08 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22653
  
Merged to master.

Thank you @BryanCutler. Let me try to make sure I take further actions for 
related items.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21669: [SPARK-23257][K8S] Kerberos Support for Spark on K8S

2018-10-08 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21669
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22630: [SPARK-25497][SQL] Limit operation within whole stage co...

2018-10-08 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22630
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3806/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21669: [SPARK-23257][K8S] Kerberos Support for Spark on K8S

2018-10-08 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21669
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97128/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22615: [SPARK-25016][BUILD][CORE] Remove support for Hadoop 2.6

2018-10-08 Thread srowen

Github user srowen commented on the issue:

https://github.com/apache/spark/pull/22615
  
Yeah this does need to be in a public repo. 
apache/spark-jenkins-configurations or something. We can ask INFRA to create 
them. But, I'm not against just putting them in dev/ or something in the main 
repo. It's not much right? and we already host all the release scripts there 
which maybe 5 people are interested in right now.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22630: [SPARK-25497][SQL] Limit operation within whole stage co...

2018-10-08 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22630
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22630: [SPARK-25497][SQL] Limit operation within whole stage co...

2018-10-08 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22630
  
**[Test build #97136 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97136/testReport)**
 for PR 22630 at commit 
[`4fc4301`](https://github.com/apache/spark/commit/4fc43010c7c466e7a3db6a08c554adc78719db76).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21669: [SPARK-23257][K8S] Kerberos Support for Spark on K8S

2018-10-08 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21669
  
**[Test build #97128 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97128/testReport)**
 for PR 21669 at commit 
[`e303048`](https://github.com/apache/spark/commit/e30304810c1090c1eaf69cd0302f1621b1f44644).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...

2018-10-08 Thread vanzin

Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21669#discussion_r223525785
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/hadooputils/HadoopKerberosLogin.scala
 ---
@@ -0,0 +1,75 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.features.hadooputils
+
+import scala.collection.JavaConverters._
+
+import io.fabric8.kubernetes.api.model.SecretBuilder
+import org.apache.commons.codec.binary.Base64
+
+import org.apache.spark.{SparkConf, SparkException}
+import org.apache.spark.deploy.SparkHadoopUtil
+import org.apache.spark.deploy.k8s.Constants._
+import 
org.apache.spark.deploy.k8s.security.KubernetesHadoopDelegationTokenManager
+
+ /**
+  * This logic does all the heavy lifting for Delegation Token creation. 
This step
+  * assumes that the job user has either specified a principal and keytab 
or ran
+  * $kinit before running spark-submit. By running UGI.getCurrentUser we 
are able
+  * to obtain the current user, either signed in via $kinit or keytab. 
With the
+  * Job User principal you then retrieve the delegation token from the 
NameNode
+  * and store values in DelegationToken. Lastly, the class puts the data 
into
+  * a secret. All this is defined in a KerberosConfigSpec.
+  */
+private[spark] object HadoopKerberosLogin {
+   def buildSpec(
+ submissionSparkConf: SparkConf,
+ kubernetesResourceNamePrefix : String,
+ tokenManager: KubernetesHadoopDelegationTokenManager): 
KerberosConfigSpec = {
+ val hadoopConf = 
SparkHadoopUtil.get.newConfiguration(submissionSparkConf)
+ if (!tokenManager.isSecurityEnabled) {
+   throw new SparkException("Hadoop not configured with Kerberos")
+ }
+ // The JobUserUGI will be taken fom the Local Ticket Cache or via 
keytab+principal
+ // The login happens in the SparkSubmit so login logic is not 
necessary to include
+ val jobUserUGI = tokenManager.getCurrentUser
+ val originalCredentials = jobUserUGI.getCredentials
+ val (tokenData, renewalInterval) = tokenManager.getDelegationTokens(
+   originalCredentials,
+   submissionSparkConf,
+   hadoopConf)
+ require(tokenData.nonEmpty, "Did not obtain any delegation tokens")
+ val currentTime = tokenManager.getCurrentTime
+ val initialTokenDataKeyName = 
s"$KERBEROS_SECRET_LABEL_PREFIX-$currentTime-$renewalInterval"
--- End diff --

> will always need to know the renewal interval

Since the renewal service *does not exist*, why do those values need to be 
in the config name? That is my question.

If the answer is "because the renewal service needs it", it means it should 
be added when the renewal service exists, and should be removed from here. And 
at that time we'll discuss the best way to do it.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...

2018-10-08 Thread ifilonenko

Github user ifilonenko commented on a diff in the pull request:

https://github.com/apache/spark/pull/21669#discussion_r223525420
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/hadooputils/HadoopKerberosLogin.scala
 ---
@@ -0,0 +1,75 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.features.hadooputils
+
+import scala.collection.JavaConverters._
+
+import io.fabric8.kubernetes.api.model.SecretBuilder
+import org.apache.commons.codec.binary.Base64
+
+import org.apache.spark.{SparkConf, SparkException}
+import org.apache.spark.deploy.SparkHadoopUtil
+import org.apache.spark.deploy.k8s.Constants._
+import 
org.apache.spark.deploy.k8s.security.KubernetesHadoopDelegationTokenManager
+
+ /**
+  * This logic does all the heavy lifting for Delegation Token creation. 
This step
+  * assumes that the job user has either specified a principal and keytab 
or ran
+  * $kinit before running spark-submit. By running UGI.getCurrentUser we 
are able
+  * to obtain the current user, either signed in via $kinit or keytab. 
With the
+  * Job User principal you then retrieve the delegation token from the 
NameNode
+  * and store values in DelegationToken. Lastly, the class puts the data 
into
+  * a secret. All this is defined in a KerberosConfigSpec.
+  */
+private[spark] object HadoopKerberosLogin {
+   def buildSpec(
+ submissionSparkConf: SparkConf,
+ kubernetesResourceNamePrefix : String,
+ tokenManager: KubernetesHadoopDelegationTokenManager): 
KerberosConfigSpec = {
+ val hadoopConf = 
SparkHadoopUtil.get.newConfiguration(submissionSparkConf)
+ if (!tokenManager.isSecurityEnabled) {
+   throw new SparkException("Hadoop not configured with Kerberos")
+ }
+ // The JobUserUGI will be taken fom the Local Ticket Cache or via 
keytab+principal
+ // The login happens in the SparkSubmit so login logic is not 
necessary to include
+ val jobUserUGI = tokenManager.getCurrentUser
+ val originalCredentials = jobUserUGI.getCredentials
+ val (tokenData, renewalInterval) = tokenManager.getDelegationTokens(
+   originalCredentials,
+   submissionSparkConf,
+   hadoopConf)
+ require(tokenData.nonEmpty, "Did not obtain any delegation tokens")
+ val currentTime = tokenManager.getCurrentTime
+ val initialTokenDataKeyName = 
s"$KERBEROS_SECRET_LABEL_PREFIX-$currentTime-$renewalInterval"
--- End diff --

> why this config name needs to reference those values?

Because, either way, an external renewal service will always need to know 
the renewal interval and current time to calculate when to do renewal.

> I'm sure we'll have discussions about how to make requests to the service 
and propagate any needed configuration.

Agreed, that will be the case in followup PRs



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22630: [SPARK-25497][SQL] Limit operation within whole s...

2018-10-08 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22630#discussion_r223525444
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala
 ---
@@ -452,46 +452,73 @@ case class RangeExec(range: 
org.apache.spark.sql.catalyst.plans.logical.Range)
 
 val localIdx = ctx.freshName("localIdx")
 val localEnd = ctx.freshName("localEnd")
-val range = ctx.freshName("range")
 val shouldStop = if (parent.needStopCheck) {
-  s"if (shouldStop()) { $number = $value + ${step}L; return; }"
+  s"if (shouldStop()) { $nextIndex = $value + ${step}L; return; }"
 } else {
   "// shouldStop check is eliminated"
 }
+val loopCondition = if (limitNotReachedChecks.isEmpty) {
+  "true"
+} else {
+  limitNotReachedChecks.mkString(" && ")
--- End diff --

This is whole-stage-codege. If bytecode overfolow happens, we will fallback


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22614: [SPARK-25561][SQL] Implement a new config to control par...

2018-10-08 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22614
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22614: [SPARK-25561][SQL] Implement a new config to control par...

2018-10-08 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22614
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97126/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22614: [SPARK-25561][SQL] Implement a new config to control par...

2018-10-08 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22614
  
**[Test build #97126 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97126/testReport)**
 for PR 22614 at commit 
[`01e2123`](https://github.com/apache/spark/commit/01e2123a3ddd0cf0c15d5e79c2beafad821a2cf6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22635: [SPARK-25591][PySpark][SQL] Avoid overwriting deserializ...

2018-10-08 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22635
  
Yea, same issue exists in Pandas UDFs too (quickly double checked). This PR 
fixes it. That code path is rather one same place FYI.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22615: [SPARK-25016][BUILD][CORE] Remove support for Hadoop 2.6

2018-10-08 Thread shaneknapp

Github user shaneknapp commented on the issue:

https://github.com/apache/spark/pull/22615
  
@vanzin i'm not opposed to hosting these configs somewhere else.  
@JoshRosen did this a few years back just to "get shit done"...

i'd be leery of putting this in to the main spark repo, however, as only a 
very, very, very small subset of people (consisting mostly of myself) should 
actually ever touch this stuff.  


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22635: [SPARK-25591][PySpark][SQL] Avoid overwriting deserializ...

2018-10-08 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22635
  
@AbdealiJK since RC3 is not cut, this will be in 2.4.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...

2018-10-08 Thread vanzin

Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21669#discussion_r223524734
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/hadooputils/HadoopKerberosLogin.scala
 ---
@@ -0,0 +1,75 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.features.hadooputils
+
+import scala.collection.JavaConverters._
+
+import io.fabric8.kubernetes.api.model.SecretBuilder
+import org.apache.commons.codec.binary.Base64
+
+import org.apache.spark.{SparkConf, SparkException}
+import org.apache.spark.deploy.SparkHadoopUtil
+import org.apache.spark.deploy.k8s.Constants._
+import 
org.apache.spark.deploy.k8s.security.KubernetesHadoopDelegationTokenManager
+
+ /**
+  * This logic does all the heavy lifting for Delegation Token creation. 
This step
+  * assumes that the job user has either specified a principal and keytab 
or ran
+  * $kinit before running spark-submit. By running UGI.getCurrentUser we 
are able
+  * to obtain the current user, either signed in via $kinit or keytab. 
With the
+  * Job User principal you then retrieve the delegation token from the 
NameNode
+  * and store values in DelegationToken. Lastly, the class puts the data 
into
+  * a secret. All this is defined in a KerberosConfigSpec.
+  */
+private[spark] object HadoopKerberosLogin {
+   def buildSpec(
+ submissionSparkConf: SparkConf,
+ kubernetesResourceNamePrefix : String,
+ tokenManager: KubernetesHadoopDelegationTokenManager): 
KerberosConfigSpec = {
+ val hadoopConf = 
SparkHadoopUtil.get.newConfiguration(submissionSparkConf)
+ if (!tokenManager.isSecurityEnabled) {
+   throw new SparkException("Hadoop not configured with Kerberos")
+ }
+ // The JobUserUGI will be taken fom the Local Ticket Cache or via 
keytab+principal
+ // The login happens in the SparkSubmit so login logic is not 
necessary to include
+ val jobUserUGI = tokenManager.getCurrentUser
+ val originalCredentials = jobUserUGI.getCredentials
+ val (tokenData, renewalInterval) = tokenManager.getDelegationTokens(
+   originalCredentials,
+   submissionSparkConf,
+   hadoopConf)
+ require(tokenData.nonEmpty, "Did not obtain any delegation tokens")
+ val currentTime = tokenManager.getCurrentTime
+ val initialTokenDataKeyName = 
s"$KERBEROS_SECRET_LABEL_PREFIX-$currentTime-$renewalInterval"
--- End diff --

Well, since for all practical purposes that renewal service does not exist, 
is there any reason why this config name needs to reference those values?

If at some point it does come into existence, I'm sure we'll have 
discussions about how to make requests to the service and propagate any needed 
configuration.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22615: [SPARK-25016][BUILD][CORE] Remove support for Hadoop 2.6

2018-10-08 Thread vanzin

Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/22615
  
Hmmm... just want to raise a possible issue that maybe, just maybe, we 
should be hosting those jenkins configs in a repository that is owned by the 
ASF and writable by all Spark committers.

Or even as a directory under the Spark repo itself (and use them always 
from master).

Just a thought.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...

2018-10-08 Thread ifilonenko

Github user ifilonenko commented on a diff in the pull request:

https://github.com/apache/spark/pull/21669#discussion_r223524110
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/hadooputils/HadoopKerberosLogin.scala
 ---
@@ -0,0 +1,75 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.features.hadooputils
+
+import scala.collection.JavaConverters._
+
+import io.fabric8.kubernetes.api.model.SecretBuilder
+import org.apache.commons.codec.binary.Base64
+
+import org.apache.spark.{SparkConf, SparkException}
+import org.apache.spark.deploy.SparkHadoopUtil
+import org.apache.spark.deploy.k8s.Constants._
+import 
org.apache.spark.deploy.k8s.security.KubernetesHadoopDelegationTokenManager
+
+ /**
+  * This logic does all the heavy lifting for Delegation Token creation. 
This step
+  * assumes that the job user has either specified a principal and keytab 
or ran
+  * $kinit before running spark-submit. By running UGI.getCurrentUser we 
are able
+  * to obtain the current user, either signed in via $kinit or keytab. 
With the
+  * Job User principal you then retrieve the delegation token from the 
NameNode
+  * and store values in DelegationToken. Lastly, the class puts the data 
into
+  * a secret. All this is defined in a KerberosConfigSpec.
+  */
+private[spark] object HadoopKerberosLogin {
+   def buildSpec(
+ submissionSparkConf: SparkConf,
+ kubernetesResourceNamePrefix : String,
+ tokenManager: KubernetesHadoopDelegationTokenManager): 
KerberosConfigSpec = {
+ val hadoopConf = 
SparkHadoopUtil.get.newConfiguration(submissionSparkConf)
+ if (!tokenManager.isSecurityEnabled) {
+   throw new SparkException("Hadoop not configured with Kerberos")
+ }
+ // The JobUserUGI will be taken fom the Local Ticket Cache or via 
keytab+principal
+ // The login happens in the SparkSubmit so login logic is not 
necessary to include
+ val jobUserUGI = tokenManager.getCurrentUser
+ val originalCredentials = jobUserUGI.getCredentials
+ val (tokenData, renewalInterval) = tokenManager.getDelegationTokens(
+   originalCredentials,
+   submissionSparkConf,
+   hadoopConf)
+ require(tokenData.nonEmpty, "Did not obtain any delegation tokens")
+ val currentTime = tokenManager.getCurrentTime
+ val initialTokenDataKeyName = 
s"$KERBEROS_SECRET_LABEL_PREFIX-$currentTime-$renewalInterval"
--- End diff --

We built a renewal service (a micro-service designed based on the design 
doc linked in the description), that used the `currentTime` and 
`renewalInteveral` to know when to update the secrets. It determined whether or 
not to "renew" secrets by using the `refresh-token` label. This was the first 
step in the organization of a separate renewal service, however in our company 
(and other companys' use cases) these renewal services should be arbitrary and 
pluggable. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...

2018-10-08 Thread vanzin

Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21669#discussion_r223524047
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/KerberosConfDriverFeatureStep.scala
 ---
@@ -0,0 +1,156 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.features
+
+import scala.collection.JavaConverters._
+
+import com.google.common.base.Charsets
+import com.google.common.io.Files
+import io.fabric8.kubernetes.api.model.{ConfigMapBuilder, HasMetadata}
+
+import org.apache.spark.deploy.SparkHadoopUtil
+import org.apache.spark.deploy.k8s.{KubernetesConf, KubernetesUtils, 
SparkPod}
+import org.apache.spark.deploy.k8s.Config._
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.KubernetesDriverSpecificConf
+import org.apache.spark.deploy.k8s.features.hadooputils._
+import org.apache.spark.internal.Logging
+
+ /**
+  * Runs the necessary Hadoop-based logic based on Kerberos configs and 
the presence of the
+  * HADOOP_CONF_DIR. This runs various bootstrap methods defined in 
HadoopBootstrapUtil.
+  */
+private[spark] class KerberosConfDriverFeatureStep(
+kubernetesConf: KubernetesConf[KubernetesDriverSpecificConf])
+extends KubernetesFeatureConfigStep with Logging {
+
+private val conf = kubernetesConf.sparkConf
+private val maybePrincipal = 
conf.get(org.apache.spark.internal.config.PRINCIPAL)
+private val maybeKeytab = 
conf.get(org.apache.spark.internal.config.KEYTAB)
+private val maybeExistingSecretName = 
conf.get(KUBERNETES_KERBEROS_DT_SECRET_NAME)
+private val maybeExistingSecretItemKey =
+  conf.get(KUBERNETES_KERBEROS_DT_SECRET_ITEM_KEY)
+private val maybeKrb5File =
+  conf.get(KUBERNETES_KERBEROS_KRB5_FILE)
+private val maybeKrb5CMap =
+  conf.get(KUBERNETES_KERBEROS_KRB5_CONFIG_MAP)
+private val kubeTokenManager = kubernetesConf.tokenManager(conf,
+  SparkHadoopUtil.get.newConfiguration(conf))
+private val isKerberosEnabled = kubeTokenManager.isSecurityEnabled
+
+require(maybeKeytab.isEmpty || isKerberosEnabled,
+  "You must enable Kerberos support if you are specifying a Kerberos 
Keytab")
+
+require(maybeExistingSecretName.isEmpty || isKerberosEnabled,
+  "You must enable Kerberos support if you are specifying a Kerberos 
Secret")
+
+   require((maybeKrb5File.isEmpty || maybeKrb5CMap.isEmpty) || 
isKerberosEnabled,
+"You must specify either a krb5 file location or a ConfigMap with a 
krb5 file")
+
+   KubernetesUtils.requireNandDefined(
+ maybeKrb5File,
+ maybeKrb5CMap,
+ "Do not specify both a Krb5 local file and the ConfigMap as the 
creation " +
+   "of an additional ConfigMap, when one is already specified, is 
extraneous")
+
+KubernetesUtils.requireBothOrNeitherDefined(
+  maybeKeytab,
+  maybePrincipal,
+  "If a Kerberos principal is specified you must also specify a 
Kerberos keytab",
+  "If a Kerberos keytab is specified you must also specify a Kerberos 
principal")
+
+KubernetesUtils.requireBothOrNeitherDefined(
+  maybeExistingSecretName,
+  maybeExistingSecretItemKey,
+  "If a secret data item-key where the data of the Kerberos Delegation 
Token is specified" +
+" you must also specify the name of the secret",
+  "If a secret storing a Kerberos Delegation Token is specified you 
must also" +
+" specify the item-key where the data is stored")
+
+require(kubernetesConf.hadoopConfDir.isDefined, "Ensure that 
HADOOP_CONF_DIR is defined")
+private val hadoopConfDir = kubernetesConf.hadoopConfDir.get
+private val hadoopConfigurationFiles = 
HadoopBootstrapUtil.getHadoopConfFiles(hadoopConfDir)
+
+// Either use pre-existing secret or login to create new Secret with 
DT

[GitHub] spark issue #22615: [SPARK-25016][BUILD][CORE] Remove support for Hadoop 2.6

2018-10-08 Thread shaneknapp

Github user shaneknapp commented on the issue:

https://github.com/apache/spark/pull/22615
  
https://github.com/databricks/spark-jenkins-configurations/pull/47


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

1 2 3 4 5 6 >

1 - 100 of 566 matches

Mail list logo