[GitHub] spark issue #15414: [SPARK-17848][ML] Move LabelCol datatype cast into Predi...
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/15414 @jkbradley @sethah I added a comment. Thanks for the reviews. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15673: [SPARK-17992][SQL] Return all partitions from HiveShim w...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15673 **[Test build #3382 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3382/consoleFull)** for PR 15673 at commit [`4c438c8`](https://github.com/apache/spark/commit/4c438c8b2575880379e2a9a872fe07018cb62402). * This patch passes all tests. * This patch **does not merge cleanly**. * This patch adds no public classes.
[GitHub] spark issue #15708: [SPARK-18167] [SQL] Retry when the SQLQuerySuite test fl...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15708 Merged build finished. Test PASSed.
[GitHub] spark issue #14444: [SPARK-16839] [SQL] redundant aliases after cleanupAlias...
Github user eyalfa commented on the issue: https://github.com/apache/spark/pull/1 @hvanhovell please have a look. BTW, for some reason Jenkins shows all test cases as 'sql', see [here](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67837/testReport/org.apache.spark.sql/SQLQueryTestSuite/)
[GitHub] spark issue #14444: [SPARK-16839] [SQL] redundant aliases after cleanupAlias...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/1 **[Test build #67869 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67869/consoleFull)** for PR 1 at commit [`9b89e31`](https://github.com/apache/spark/commit/9b89e315f83a792d62d02d56f46448d339a705e8).
[GitHub] spark issue #15541: [SPARK-17637][Scheduler]Packed scheduling for Spark task...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15541 **[Test build #67863 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67863/consoleFull)** for PR 15541 at commit [`a820e96`](https://github.com/apache/spark/commit/a820e96284f1d9108ef62cd3ef55171ebd47e08f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15673: [SPARK-17992][SQL] Return all partitions from HiveShim w...
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15673 @rxin I believe https://issues.apache.org/jira/browse/SPARK-18168 will need to be resolved before I can rebase this PR.
[GitHub] spark pull request #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pys...
Github user zjffdu commented on a diff in the pull request: https://github.com/apache/spark/pull/13599#discussion_r85877793 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala --- @@ -69,6 +84,66 @@ private[spark] class PythonWorkerFactory(pythonExec: String, envVars: Map[String } /** + * Create virtualenv using native virtualenv or conda + * + * Native Virtualenv: + * - Execute command: virtualenv -p pythonExec --no-site-packages virtualenvName + * - Execute command: python -m pip --cache-dir cache-dir install -r requirement_file + * + * Conda + * - Execute command: conda create --prefix prefix --file requirement_file -y + * + */ + def setupVirtualEnv(): Unit = { +logDebug("Start to setup virtualenv...") +logDebug("user.dir=" + System.getProperty("user.dir")) +logDebug("user.home=" + System.getProperty("user.home")) + +require(virtualEnvType == "native" || virtualEnvType == "conda", + s"VirtualEnvType: ${virtualEnvType} is not supported" ) +virtualEnvName = "virtualenv_" + conf.getAppId + "_" + VIRTUALENV_ID.getAndIncrement() +// use the absolute path when it is local mode otherwise just use filename as it would be +// fetched from FileServer +val pyspark_requirements = + if (Utils.isLocalMaster(conf)) { +conf.get("spark.pyspark.virtualenv.requirements") + } else { +conf.get("spark.pyspark.virtualenv.requirements").split("/").last + } + +val createEnvCommand = + if (virtualEnvType == "native") { +Arrays.asList(virtualEnvPath, + "-p", pythonExec, + "--no-site-packages", virtualEnvName) + } else { +Arrays.asList(virtualEnvPath, + "create", "--prefix", System.getProperty("user.dir") + "/" + virtualEnvName, + "--file", pyspark_requirements, "-y") + } +execCommand(createEnvCommand) +// virtualenv will be created in the working directory of Executor. 
+virtualPythonExec = virtualEnvName + "/bin/python" +if (virtualEnvType == "native") { + execCommand(Arrays.asList(virtualPythonExec, "-m", "pip", +"--cache-dir", System.getProperty("user.home"), +"install", "-r", pyspark_requirements)) +} + } + + def execCommand(commands: java.util.List[String]): Unit = { +logDebug("Running command:" + commands.asScala.mkString(" ")) +val pb = new ProcessBuilder(commands).inheritIO() +// pip internally use environment variable `HOME` +pb.environment().put("HOME", System.getProperty("user.home")) --- End diff -- For yarn mode, HOME is "/home/" which is not correct. So here I get it from system property user.home launch_container.sh ``` export HOME="/home/" ```
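A minimal sketch of the HOME override described in the comment above, assuming only the JDK's `ProcessBuilder` API. The names `HomeEnvSketch` and `withHome` are hypothetical and do not appear in the PR; the sketch only shows that the child process's `HOME` can be set explicitly from the JVM's `user.home` property rather than inherited from the container environment.

```scala
import java.util.{List => JList}

object HomeEnvSketch {
  /**
   * Builds (but does not start) a ProcessBuilder whose HOME variable is
   * taken from the JVM system property `user.home`, overriding whatever
   * value the parent environment (e.g. launch_container.sh) exported.
   */
  def withHome(commands: JList[String]): ProcessBuilder = {
    val pb = new ProcessBuilder(commands)
    // Hypothetical helper: mirrors the pb.environment().put(...) call in the diff.
    pb.environment().put("HOME", System.getProperty("user.home"))
    pb
  }
}
```

Calling `HomeEnvSketch.withHome(java.util.Arrays.asList("pip", "--version")).start()` would then launch the command with the overridden `HOME`; whether `user.home` itself is correct on a given YARN node is a separate question this sketch does not address.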
[GitHub] spark issue #15703: [SPARK-18186] Migrate HiveUDAFFunction to TypedImperativ...
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/15703 I can't reproduce those test failures when executing failed test cases individually. Seems that it's related to execution order. Still investigating.
[GitHub] spark issue #15703: [SPARK-18186] Migrate HiveUDAFFunction to TypedImperativ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15703 Merged build finished. Test FAILed.
[GitHub] spark issue #15703: [SPARK-18186] Migrate HiveUDAFFunction to TypedImperativ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15703 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67842/ Test FAILed.
[GitHub] spark issue #15703: [SPARK-18186] Migrate HiveUDAFFunction to TypedImperativ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15703 **[Test build #67842 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67842/consoleFull)** for PR 15703 at commit [`c0029f1`](https://github.com/apache/spark/commit/c0029f1a529935c263f9c83691cf84921b343e67). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #15705: [SPARK-18183] [SPARK-18184] Fix INSERT [INTO|OVER...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/15705#discussion_r85859910 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala --- @@ -345,18 +346,32 @@ case class BroadcastHint(child: LogicalPlan) extends UnaryNode { override lazy val statistics: Statistics = super.statistics.copy(isBroadcastable = true) } +/** + * Options for writing new data into a table. + * + * @param enabled whether to overwrite existing data in the table. --- End diff -- it's pretty confusing we call it `enabled`, can we just use `overwrite`?
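To illustrate the naming suggestion above, here is a small sketch under stated assumptions. `WriteOptions`, `Mode`, and `saveMode` are hypothetical stand-ins (the real class name and Spark's `SaveMode` are not reproduced here); the sketch only shows how a field named `overwrite` reads at the call site compared with `enabled`.

```scala
object OverwriteNamingSketch {
  // Stand-in for Spark's SaveMode so the sketch compiles without Spark.
  sealed trait Mode
  case object Overwrite extends Mode
  case object Append extends Mode

  // Hypothetical options class: the flag is called `overwrite`, not `enabled`.
  final case class WriteOptions(overwrite: Boolean)

  // With the renamed field, the condition states its own meaning.
  def saveMode(options: WriteOptions): Mode =
    if (options.overwrite) Overwrite else Append
}
```

With the original name, the same condition would read `options.enabled`, which does not say what is being enabled.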
[GitHub] spark issue #15703: [SPARK-18186] Migrate HiveUDAFFunction to TypedImperativ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15703 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67844/ Test FAILed.
[GitHub] spark pull request #11105: [SPARK-12469][CORE] Data Property accumulators fo...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/11105#discussion_r85860913 --- Diff: core/src/test/scala/org/apache/spark/DataPropertyAccumulatorSuite.scala --- @@ -0,0 +1,361 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark + +import scala.concurrent.ExecutionContext.Implicits.global +import scala.ref.WeakReference + +import org.scalatest.Matchers + +import org.apache.spark.scheduler._ + + +class DataPropertyAccumulatorSuite extends SparkFunSuite with Matchers with LocalSparkContext { + test("single partition") { +sc = new SparkContext("local[2]", "test") +val acc : Accumulator[Int] = sc.accumulator(0, dataProperty = true) + +val a = sc.parallelize(1 to 20, 1) +val b = a.map{x => acc += x; x} +b.cache() +b.count() +acc.value should be (210) + } + + test("adding only the first element per partition should work even if partition is empty") { +sc = new SparkContext("local[2]", "test") +val acc: Accumulator[Int] = sc.accumulator(0, dataProperty = true) +val a = sc.parallelize(1 to 20, 30) +val b = a.mapPartitions{itr => + acc += 1 + itr +} +b.count() +acc.value should be (30) + } + + test("shuffled (combineByKey)") { +sc = new SparkContext("local[2]", "test") +val a = sc.parallelize(1 to 40, 5) +val buckets = 4 +val b = a.map{x => ((x % buckets), x)} +val inputs = List(b, b.repartition(10), b.partitionBy(new HashPartitioner(5))).map(_.cache()) +val mapSideCombines = List(true, false) +inputs.foreach { input => + mapSideCombines.foreach { mapSideCombine => +val accs = (1 to 4).map(x => sc.accumulator(0, dataProperty = true)).toList +val raccs = (1 to 4).map(x => sc.accumulator(0, dataProperty = false)).toList +val List(acc, acc1, acc2, acc3) = accs +val List(racc, racc1, racc2, racc3) = raccs +val c = input.combineByKey( + (x: Int) => {acc1 += 1; acc += 1; racc1 += 1; racc += 1; x}, + {(a: Int, b: Int) => acc2 += 1; acc += 1; racc2 += 1; racc += 1; (a + b)}, + {(a: Int, b: Int) => acc3 += 1; acc += 1; racc3 += 1; racc += 1; (a + b)}, + new HashPartitioner(10), + mapSideCombine) +val d = input.combineByKey( + (x: Int) => {acc1 += 1; acc += 1; x}, + {(a: Int, b: Int) => acc2 += 1; acc += 1; (a + b)}, + {(a: Int, b: Int) => acc3 += 1; acc += 1; (a + b)}, + new 
HashPartitioner(2), + mapSideCombine) +val e = d.map{x => acc += 1; x} +c.count() +// If our partitioner is known then we should only create +// one combiner for each key value. Otherwise we should +// create at least that many combiners. +if (input.partitioner.isDefined) { + acc1.value should be (buckets) +} else { + acc1.value should be >= (buckets) +} +if (input.partitioner.isDefined) { + acc2.value should be > (0) +} else if (mapSideCombine) { + acc3.value should be > (0) +} else { + acc2.value should be > (0) + acc3.value should be (0) +} +acc.value should be (acc1.value + acc2.value + acc3.value) +val oldValues = accs.map(_.value) +// For one action the data property accumulators and regular should have the same value. +accs.map(_.value) should be (raccs.map(_.value)) +c.count() +accs.map(_.value) should be (oldValues) --- End diff -- @squito That was an implementation for testing and experimenting. It seems I didn't push it to the remote, and I can't find it now.
[GitHub] spark issue #15703: [SPARK-18186] Migrate HiveUDAFFunction to TypedImperativ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15703 **[Test build #67844 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67844/consoleFull)** for PR 15703 at commit [`5a23a97`](https://github.com/apache/spark/commit/5a23a979c5e6a61f847b146a1cb656418054d955). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15703: [SPARK-18186] Migrate HiveUDAFFunction to TypedImperativ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15703 Merged build finished. Test FAILed.
[GitHub] spark issue #15702: [SPARK-18124] Observed-delay based Event Time Watermarks
Github user ericl commented on the issue: https://github.com/apache/spark/pull/15702 I'm still trying to find a failure that includes https://github.com/apache/spark/pull/15701/files. Until then it's hard to debug.
[GitHub] spark issue #14547: [SPARK-16718][MLlib] gbm-style treeboost
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14547 **[Test build #67858 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67858/consoleFull)** for PR 14547 at commit [`5f54f4d`](https://github.com/apache/spark/commit/5f54f4dbf94addf8b4df1af13a417f0fd0971633).
[GitHub] spark pull request #15673: [SPARK-17992][SQL] Return all partitions from Hiv...
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/15673#discussion_r85865327 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala --- @@ -585,7 +586,31 @@ private[client] class Shim_v0_13 extends Shim_v0_12 { getAllPartitionsMethod.invoke(hive, table).asInstanceOf[JSet[Partition]] } else { logDebug(s"Hive metastore filter is '$filter'.") -getPartitionsByFilterMethod.invoke(hive, table, filter).asInstanceOf[JArrayList[Partition]] +val tryDirectSqlConfVar = HiveConf.ConfVars.METASTORE_TRY_DIRECT_SQL +val tryDirectSql = + hive.getConf.getBoolean(tryDirectSqlConfVar.varname, tryDirectSqlConfVar.defaultBoolVal) +try { + // Hive may throw an exception when calling this method in some circumstances, such as + // when filtering on a non-string partition column when the hive config key + // hive.metastore.try.direct.sql is false + getPartitionsByFilterMethod.invoke(hive, table, filter) +.asInstanceOf[JArrayList[Partition]] +} catch { + case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] && + !tryDirectSql => +logWarning("Caught Hive MetaException attempting to get partition metadata by " + + "filter from Hive. Falling back to fetching all partition metadata, which will " + + "degrade performance. Consider modifying your Hive metastore configuration to " + + s"set ${tryDirectSqlConfVar.varname} to true.", ex) +// HiveShim clients are expected to handle a superset of the requested partitions +getAllPartitionsMethod.invoke(hive, table).asInstanceOf[JSet[Partition]] + case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] && + tryDirectSql => +throw new RuntimeException("Caught Hive MetaException attempting to get partition " + + "metadata by filter from Hive. Set the Spark configuration setting " + --- End diff -- I made some revisions. LMK what you think. 
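The fallback logic in the diff above can be sketched without any Hive dependency. This is an illustration under stated assumptions, not the PR's code: `StubMetaException` is a hypothetical stand-in for Hive's `MetaException`, and `fetchPartitions` with its two function parameters is invented for the example. The shape is the same: a reflective call fails with an `InvocationTargetException`, and the handler either falls back to fetching everything or rethrows, depending on the direct-SQL setting.

```scala
import java.lang.reflect.InvocationTargetException

object PartitionFallbackSketch {
  // Hypothetical stand-in for Hive's MetaException, so no Hive jars are needed.
  final class StubMetaException(message: String) extends Exception(message)

  /**
   * Tries the filtered lookup first. If it fails with a wrapped
   * StubMetaException and direct SQL is disabled, falls back to the
   * unfiltered lookup (callers must tolerate a superset of partitions).
   * If direct SQL is enabled, the failure is unexpected and is rethrown.
   */
  def fetchPartitions(
      tryDirectSql: Boolean,
      byFilter: () => Seq[String],
      all: () => Seq[String]): Seq[String] = {
    try {
      byFilter()
    } catch {
      case ex: InvocationTargetException
          if ex.getCause.isInstanceOf[StubMetaException] && !tryDirectSql =>
        all()
      case ex: InvocationTargetException
          if ex.getCause.isInstanceOf[StubMetaException] && tryDirectSql =>
        throw new RuntimeException(
          "Filtered partition lookup failed although direct SQL is enabled", ex)
    }
  }
}
```

Any other exception type propagates unchanged, which matches the guarded `case` clauses in the diff.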
[GitHub] spark pull request #15705: [SPARK-18183] [SPARK-18184] Fix INSERT [INTO|OVER...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/15705#discussion_r85870248 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala --- @@ -173,12 +175,22 @@ case class DataSourceAnalysis(conf: CatalystConf) extends Rule[LogicalPlan] { case LogicalRelation(r: HadoopFsRelation, _, _) => r.location.rootPaths }.flatten - val mode = if (overwrite) SaveMode.Overwrite else SaveMode.Append - if (overwrite && inputPaths.contains(outputPath)) { + val mode = if (overwrite.enabled) SaveMode.Overwrite else SaveMode.Append + if (overwrite.enabled && inputPaths.contains(outputPath)) { throw new AnalysisException( "Cannot overwrite a path that is also being read from.") } + val overwritePartitionPath = if (overwrite.specificPartition.isDefined && --- End diff -- can we just pass the partition path as `outputPath` to `InsertIntoHadoopFsRelationCommand` and set partition columns to `Nil`, then we don't need to add an extra parameter to `InsertIntoHadoopFsRelationCommand`
[GitHub] spark issue #15675: [SPARK-18144][SQL] logging StreamingQueryListener$QueryS...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15675 Merged build finished. Test PASSed.
[GitHub] spark issue #15709: [SPARK-18190][Build][SparkR] Fix R version to not the la...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15709 Build started: [SparkR] `ALL` [![PR-15709](https://ci.appveyor.com/api/projects/status/github/spark-test/spark?branch=B0651ECA-1AB2-4452-89B7-A1BF7652113A=true)](https://ci.appveyor.com/project/spark-test/spark/branch/B0651ECA-1AB2-4452-89B7-A1BF7652113A) Diff: https://github.com/apache/spark/compare/master...spark-test:B0651ECA-1AB2-4452-89B7-A1BF7652113A
[GitHub] spark issue #15626: SPARK-17829 [SQL] Stable format for offset log
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15626 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67845/ Test FAILed.
[GitHub] spark issue #15626: SPARK-17829 [SQL] Stable format for offset log
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15626 Merged build finished. Test FAILed.
[GitHub] spark pull request #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pys...
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/13599#discussion_r85859164 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala --- @@ -46,6 +50,12 @@ private[spark] class PythonWorkerFactory(pythonExec: String, envVars: Map[String val daemonWorkers = new mutable.WeakHashMap[Socket, Int]() val idleWorkers = new mutable.Queue[Socket]() var lastActivity = 0L + val virtualEnvEnabled = conf.getBoolean("spark.pyspark.virtualenv.enabled", false) + val virtualEnvType = conf.get("spark.pyspark.virtualenv.type", "native") + val virtualEnvPath = conf.get("spark.pyspark.virtualenv.bin.path", "") + var virtualEnvName: String = _ + var virtualPythonExec: String = _ + --- End diff -- Make these private if not required outside of the class
[GitHub] spark pull request #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pys...
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/13599#discussion_r85859942 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala --- @@ -69,6 +84,66 @@ private[spark] class PythonWorkerFactory(pythonExec: String, envVars: Map[String } /** + * Create virtualenv using native virtualenv or conda + * + * Native Virtualenv: + * - Execute command: virtualenv -p pythonExec --no-site-packages virtualenvName + * - Execute command: python -m pip --cache-dir cache-dir install -r requirement_file + * + * Conda + * - Execute command: conda create --prefix prefix --file requirement_file -y + * + */ + def setupVirtualEnv(): Unit = { +logDebug("Start to setup virtualenv...") +logDebug("user.dir=" + System.getProperty("user.dir")) +logDebug("user.home=" + System.getProperty("user.home")) + +require(virtualEnvType == "native" || virtualEnvType == "conda", + s"VirtualEnvType: ${virtualEnvType} is not supported" ) +virtualEnvName = "virtualenv_" + conf.getAppId + "_" + VIRTUALENV_ID.getAndIncrement() +// use the absolute path when it is local mode otherwise just use filename as it would be +// fetched from FileServer +val pyspark_requirements = + if (Utils.isLocalMaster(conf)) { +conf.get("spark.pyspark.virtualenv.requirements") + } else { +conf.get("spark.pyspark.virtualenv.requirements").split("/").last + } + +val createEnvCommand = + if (virtualEnvType == "native") { +Arrays.asList(virtualEnvPath, + "-p", pythonExec, + "--no-site-packages", virtualEnvName) + } else { +Arrays.asList(virtualEnvPath, + "create", "--prefix", System.getProperty("user.dir") + "/" + virtualEnvName, + "--file", pyspark_requirements, "-y") + } +execCommand(createEnvCommand) +// virtualenv will be created in the working directory of Executor. 
+virtualPythonExec = virtualEnvName + "/bin/python" +if (virtualEnvType == "native") { + execCommand(Arrays.asList(virtualPythonExec, "-m", "pip", +"--cache-dir", System.getProperty("user.home"), +"install", "-r", pyspark_requirements)) +} + } + + def execCommand(commands: java.util.List[String]): Unit = { +logDebug("Running command:" + commands.asScala.mkString(" ")) +val pb = new ProcessBuilder(commands).inheritIO() +// pip internally use environment variable `HOME` +pb.environment().put("HOME", System.getProperty("user.home")) --- End diff -- This should implicitly be propagated, or is it for windows support ?
[GitHub] spark pull request #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pys...
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/13599#discussion_r85859906

--- Diff: core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala ---
@@ -69,6 +84,66 @@ private[spark] class PythonWorkerFactory(pythonExec: String, envVars: Map[String
   }
 
   /**
+   * Create virtualenv using native virtualenv or conda
+   *
+   * Native Virtualenv:
+   *   - Execute command: virtualenv -p pythonExec --no-site-packages virtualenvName
+   *   - Execute command: python -m pip --cache-dir cache-dir install -r requirement_file
+   *
+   * Conda
+   *   - Execute command: conda create --prefix prefix --file requirement_file -y
+   *
+   */
+  def setupVirtualEnv(): Unit = {
+    logDebug("Start to setup virtualenv...")
+    logDebug("user.dir=" + System.getProperty("user.dir"))
+    logDebug("user.home=" + System.getProperty("user.home"))
+
+    require(virtualEnvType == "native" || virtualEnvType == "conda",
+      s"VirtualEnvType: ${virtualEnvType} is not supported" )
+    virtualEnvName = "virtualenv_" + conf.getAppId + "_" + VIRTUALENV_ID.getAndIncrement()
+    // use the absolute path when it is local mode otherwise just use filename as it would be
+    // fetched from FileServer
+    val pyspark_requirements =
+      if (Utils.isLocalMaster(conf)) {
+        conf.get("spark.pyspark.virtualenv.requirements")
+      } else {
+        conf.get("spark.pyspark.virtualenv.requirements").split("/").last
+      }
+
+    val createEnvCommand =
+      if (virtualEnvType == "native") {
+        Arrays.asList(virtualEnvPath,
+          "-p", pythonExec,
+          "--no-site-packages", virtualEnvName)
+      } else {
+        Arrays.asList(virtualEnvPath,
+          "create", "--prefix", System.getProperty("user.dir") + "/" + virtualEnvName,
+          "--file", pyspark_requirements, "-y")
+      }
+    execCommand(createEnvCommand)
+    // virtualenv will be created in the working directory of Executor.
+    virtualPythonExec = virtualEnvName + "/bin/python"
--- End diff --

curious how this works under windows ... not supported ?
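For context on the Windows question: a native virtualenv places its interpreter under `Scripts\python.exe` on Windows rather than `bin/python`, so the hard-coded `"/bin/python"` suffix would not resolve there. A minimal Java sketch of a platform-aware lookup follows; the class `VirtualEnvPaths` and its methods are hypothetical and not part of the pull request, and conda environments are deliberately left out.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

// Hypothetical helper for illustration; not part of the pull request.
final class VirtualEnvPaths {

  // Relative path of the interpreter inside a native virtualenv.
  // Windows layout: <env>\Scripts\python.exe; POSIX layout: <env>/bin/python.
  static Path pythonExecutable(String virtualEnvName, boolean isWindows) {
    return isWindows
        ? Paths.get(virtualEnvName, "Scripts", "python.exe")
        : Paths.get(virtualEnvName, "bin", "python");
  }

  // Detects Windows from the standard os.name system property.
  static boolean runningOnWindows() {
    return System.getProperty("os.name").toLowerCase(Locale.ROOT).startsWith("windows");
  }

  public static void main(String[] args) {
    System.out.println(pythonExecutable("virtualenv_app_0", runningOnWindows()));
  }
}
```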
[GitHub] spark pull request #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pys...
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/13599#discussion_r85872283

--- Diff: core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala ---
@@ -307,6 +387,7 @@ private[spark] class PythonWorkerFactory(pythonExec: String, envVars: Map[String
 }
 
 private object PythonWorkerFactory {
+  val VIRTUALENV_ID = new AtomicInteger()
--- End diff --

More restrictive acl would be good
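One way to read the request for a tighter access level is to hide the counter and expose only the operation that needs it. In the Scala code under review that would presumably mean marking the `val` as `private`; the sketch below shows the same idea as a Java analogue. The class `VirtualEnvIds` and the method `nextName` are hypothetical names, not taken from the pull request.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical Java analogue for illustration; not part of the pull request.
final class VirtualEnvIds {

  // The counter is private, so no other class can reset or read it directly.
  private static final AtomicInteger NEXT_ID = new AtomicInteger();

  private VirtualEnvIds() {
  }

  // The only exposed operation: build a unique environment name,
  // following the "virtualenv_<appId>_<n>" pattern from the diff.
  static String nextName(String appId) {
    return "virtualenv_" + appId + "_" + NEXT_ID.getAndIncrement();
  }

  public static void main(String[] args) {
    System.out.println(nextName("app-123"));
    System.out.println(nextName("app-123"));
  }
}
```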
[GitHub] spark issue #15541: [SPARK-17637][Scheduler]Packed scheduling for Spark task...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15541 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67863/ Test PASSed.
[GitHub] spark issue #15541: [SPARK-17637][Scheduler]Packed scheduling for Spark task...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15541 Merged build finished. Test PASSed.
[GitHub] spark pull request #15707: [SPARK-18024][SQL] Introduce an internal commit p...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15707
[GitHub] spark issue #15633: [SPARK-18087] [SQL] Optimize insert to not require REPAI...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15633 **[Test build #3381 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3381/consoleFull)** for PR 15633 at commit [`4d96725`](https://github.com/apache/spark/commit/4d967251ce01794f7cdab9f84b70fa5393d1d1f2).
[GitHub] spark issue #15705: [SPARK-18183] [SPARK-18184] Fix INSERT [INTO|OVERWRITE] ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15705 **[Test build #67849 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67849/consoleFull)** for PR 15705 at commit [`07c6787`](https://github.com/apache/spark/commit/07c67876c372369def5128ce919cbb74e4f0d30d).
[GitHub] spark pull request #15702: [SPARK-18124] Observed-delay based Even Time Wate...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/15702#discussion_r85859357 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/types/CalendarInterval.java --- @@ -252,6 +252,10 @@ public static long parseSecondNano(String secondNano) throws IllegalArgumentExce public final int months; public final long microseconds; + public final long milliseconds() { + return this.microseconds / MICROS_PER_MILLI; --- End diff -- 2 space indent
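Beyond the indentation remark, it may help to see what the new accessor returns. Java's `long` division truncates toward zero, so sub-millisecond remainders are dropped and negative intervals round up toward zero. The sketch below is a stand-alone Java stand-in written with the 2-space indent the reviewer asks for. The class `IntervalMillis` is hypothetical, and `MICROS_PER_MILLI = 1000L` is an assumption about the constant's value, which the quoted diff does not show.

```java
// Hypothetical stand-in for the relevant part of CalendarInterval.
final class IntervalMillis {
  // Assumed value: 1000 microseconds per millisecond.
  static final long MICROS_PER_MILLI = 1000L;

  final long microseconds;

  IntervalMillis(long microseconds) {
    this.microseconds = microseconds;
  }

  // Same expression as in the diff; long division truncates toward zero.
  long milliseconds() {
    return this.microseconds / MICROS_PER_MILLI;
  }

  // Alternative that rounds toward negative infinity instead.
  long millisecondsFloor() {
    return Math.floorDiv(this.microseconds, MICROS_PER_MILLI);
  }

  public static void main(String[] args) {
    System.out.println(new IntervalMillis(1_500L).milliseconds());       // 1
    System.out.println(new IntervalMillis(-1_500L).milliseconds());      // -1
    System.out.println(new IntervalMillis(-1_500L).millisecondsFloor()); // -2
  }
}
```

Whether truncation or flooring is the right behaviour for negative intervals is a design choice the quoted diff does not discuss.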
[GitHub] spark issue #15626: SPARK-17829 [SQL] Stable format for offset log
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15626 **[Test build #67831 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67831/consoleFull)** for PR 15626 at commit [`d6fec94`](https://github.com/apache/spark/commit/d6fec9464e5a8638f0b9ac5dd1df289c30da132f). * This patch **fails from timeout after a configured wait of \`250m\`**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15626: SPARK-17829 [SQL] Stable format for offset log
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15626 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67831/ Test FAILed.
[GitHub] spark issue #15667: [SPARK-18107][SQL] Insert overwrite statement runs much ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15667 **[Test build #67853 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67853/consoleFull)** for PR 15667 at commit [`bd22150`](https://github.com/apache/spark/commit/bd22150823ff9ce6a0b80ae61fae6477ad135ef8).
[GitHub] spark issue #15696: [SPARK-18024][SQL] Introduce an internal commit protocol...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15696 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67841/ Test PASSed.
[GitHub] spark issue #15696: [SPARK-18024][SQL] Introduce an internal commit protocol...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15696 Merged build finished. Test PASSed.
[GitHub] spark issue #15696: [SPARK-18024][SQL] Introduce an internal commit protocol...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15696 **[Test build #67841 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67841/consoleFull)** for PR 15696 at commit [`2d7d373`](https://github.com/apache/spark/commit/2d7d373fe48d18037653c10424c8b1c978160958). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...
Github user mariusvniekerk commented on a diff in the pull request: https://github.com/apache/spark/pull/15666#discussion_r85865112

--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1700,19 +1700,34 @@ class SparkContext(config: SparkConf) extends Logging {
    * Adds a JAR dependency for all tasks to be executed on this SparkContext in the future.
    * The `path` passed can be either a local file, a file in HDFS (or other Hadoop-supported
    * filesystems), an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node.
+   * If addToCurrentClassLoader is true, attempt to add the new class to the current threads' class
+   * loader. In general adding to the current threads' class loader will impact all other
+   * application threads unless they have explicitly changed their class loader.
    */
   def addJar(path: String) {
+    addJar(path, false)
+  }
+
+  def addJar(path: String, addToCurrentClassLoader: Boolean) {
     if (path == null) {
       logWarning("null specified as parameter to addJar")
     } else {
       var key = ""
-      if (path.contains("\\")) {
+
+      val uri = if (path.contains("\\")) {
         // For local paths with backslashes on Windows, URI throws an exception
-        key = env.rpcEnv.fileServer.addJar(new File(path))
--- End diff --

So this change gets the URI for the windows URI which is used later on to construct a File instance. That should allow the windows special case to work.
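The special case exists because `java.net.URI` rejects backslashes as illegal characters, whereas `java.io.File.toURI()` always yields a valid `file:` URI. The Java sketch below illustrates that distinction only. The class `JarPathToUri` is hypothetical, and the non-backslash branch (parsing the string directly as a URI) is an assumption, since the quoted diff is cut off before that branch.

```java
import java.io.File;
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper for illustration; not part of the pull request.
final class JarPathToUri {

  // True when the raw string is accepted by the URI parser as-is.
  static boolean rawUriParses(String path) {
    try {
      new URI(path);
      return true;
    } catch (URISyntaxException e) {
      return false;
    }
  }

  // Backslash paths go through File, which escapes them into a file: URI.
  // Everything else is parsed directly (assumed branch, see lead-in).
  static URI toUri(String path) {
    if (path.contains("\\")) {
      return new File(path).toURI();
    }
    return URI.create(path);
  }

  public static void main(String[] args) {
    String windowsPath = "C:\\jars\\app.jar";
    System.out.println("raw parse ok: " + rawUriParses(windowsPath));
    System.out.println("via File:     " + toUri(windowsPath));
  }
}
```

On a non-Windows machine the `File` route still succeeds, but it treats the whole string as a relative file name, so the resulting URI is only meaningful on Windows.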
[GitHub] spark issue #15667: [SPARK-18107][SQL] Insert overwrite statement runs much ...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/15667 @ericl Dynamic partition would be more complicated. Should we do it in this PR or in a follow-up?
[GitHub] spark issue #15673: [SPARK-17992][SQL] Return all partitions from HiveShim w...
Github user ericl commented on the issue: https://github.com/apache/spark/pull/15673 This looks good to me. cc @cloud-fan
[GitHub] spark issue #15697: [SparkR][Test]:remove unnecessary suppressWarnings
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15697 It seems R 3.3.2 has been released but R 3.3.1 is not listed among the old releases yet (see https://cloud.r-project.org/bin/windows/base/old). @shivaram and @felixcheung, should we use R 3.3.0 just for safety?
[GitHub] spark issue #15541: [SPARK-17637][Scheduler]Packed scheduling for Spark task...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/15541 Sure, I will take a look in the next couple of days to get this into 2.1 if possible.
[GitHub] spark issue #15692: [SPARK-18177][ML][PYSPARK] Add missing 'subsamplingRate'...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15692 **[Test build #67862 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67862/consoleFull)** for PR 15692 at commit [`0651bb6`](https://github.com/apache/spark/commit/0651bb6daec336ed221522b59a9149187474cc4b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15692: [SPARK-18177][ML][PYSPARK] Add missing 'subsamplingRate'...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15692 Merged build finished. Test PASSed.
[GitHub] spark issue #15692: [SPARK-18177][ML][PYSPARK] Add missing 'subsamplingRate'...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15692 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67862/ Test PASSed.
[GitHub] spark issue #15705: [SPARK-18183] [SPARK-18184] Fix INSERT [INTO|OVERWRITE] ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15705 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67864/ Test PASSed.
[GitHub] spark issue #15705: [SPARK-18183] [SPARK-18184] Fix INSERT [INTO|OVERWRITE] ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15705 Merged build finished. Test PASSed.
[GitHub] spark issue #15705: [SPARK-18183] [SPARK-18184] Fix INSERT [INTO|OVERWRITE] ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15705 **[Test build #67864 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67864/consoleFull)** for PR 15705 at commit [`0daff74`](https://github.com/apache/spark/commit/0daff7475e456754538e65b9f324773218f4f943). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15707: [SPARK-18024][SQL] Introduce an internal commit protocol...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15707 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67865/ Test FAILed.
[GitHub] spark issue #15707: [SPARK-18024][SQL] Introduce an internal commit protocol...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15707 **[Test build #67865 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67865/consoleFull)** for PR 15707 at commit [`65ba5c1`](https://github.com/apache/spark/commit/65ba5c14ec976d79fe9ee118807663496d0b7845). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15707: [SPARK-18024][SQL] Introduce an internal commit protocol...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15707 Merged build finished. Test FAILed.
[GitHub] spark issue #15705: [SPARK-18183] [SPARK-18184] Fix INSERT [INTO|OVERWRITE] ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15705 **[Test build #67848 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67848/consoleFull)** for PR 15705 at commit [`fec7c9e`](https://github.com/apache/spark/commit/fec7c9e9df5fc7ceb1231fa71303fbf5a1a6b3d9).
[GitHub] spark issue #15706: [SPARK-18189] [Core] Fix serialization issue in KeyValue...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15706 Can one of the admins verify this patch?
[GitHub] spark pull request #15667: [SPARK-18107][SQL] Insert overwrite statement run...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15667#discussion_r85861722

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala ---
@@ -257,7 +258,31 @@ case class InsertIntoHiveTable(
             table.catalogTable.identifier.table,
             partitionSpec)
 
+        var doOverwrite = overwrite
+
         if (oldPart.isEmpty || !ifNotExists) {
+          // SPARK-18107: Insert overwrite runs much slower than hive-client.
+          // Newer Hive largely improves insert overwrite performance. As Spark uses older Hive
+          // version and we may not want to catch up new Hive version every time. We delete the
+          // Hive partition first and then load data file into the Hive partition.
+          if (oldPart.nonEmpty && overwrite) {
+            oldPart.get.storage.locationUri.map { uri =>
+              val partitionPath = new Path(uri)
+              val fs = partitionPath.getFileSystem(hadoopConf)
+              if (fs.exists(partitionPath)) {
+                val pathPermission = fs.getFileStatus(partitionPath).getPermission()
+                if (!fs.delete(partitionPath, true)) {
+                  throw new RuntimeException(
+                    "Cannot remove partition directory '" + partitionPath.toString)
+                } else {
+                  fs.mkdirs(partitionPath, pathPermission)
--- End diff --

I was thinking Hive would complain if the directory does not exist, but it looks like it won't. Let me remove this and see if the tests pass.
[GitHub] spark pull request #15667: [SPARK-18107][SQL] Insert overwrite statement run...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15667#discussion_r85861794

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala ---
@@ -257,7 +258,31 @@ case class InsertIntoHiveTable(
             table.catalogTable.identifier.table,
             partitionSpec)
 
+        var doOverwrite = overwrite
--- End diff --

ok. updated.
[GitHub] spark issue #15694: [SPARK-18179][SQL] Throws analysis exception with a prop...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15694 **[Test build #67856 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67856/consoleFull)** for PR 15694 at commit [`5f09859`](https://github.com/apache/spark/commit/5f0985932ae823635042a1f38258c51a4ae89710).
[GitHub] spark issue #15697: [SparkR][Test]:remove unnecessary suppressWarnings
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15697 Hm, yes it seems unrelated. I will look into this deeper.
[GitHub] spark issue #15673: [SPARK-17992][SQL] Return all partitions from HiveShim w...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15673 **[Test build #67859 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67859/consoleFull)** for PR 15673 at commit [`1ed3301`](https://github.com/apache/spark/commit/1ed3301ec4dcbcccde4cacd21909de4f97902e20).
[GitHub] spark issue #15666: [SPARK-11421] [Core][Python][R] Added ability for addJar...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15666 **[Test build #67860 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67860/consoleFull)** for PR 15666 at commit [`26b39de`](https://github.com/apache/spark/commit/26b39de51f9a76b121ebcb70079072dfcc9972bd).
[GitHub] spark issue #15705: [SPARK-18183] [SPARK-18184] Fix INSERT [INTO|OVERWRITE] ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15705 Merged build finished. Test FAILed.
[GitHub] spark issue #15705: [SPARK-18183] [SPARK-18184] Fix INSERT [INTO|OVERWRITE] ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15705 **[Test build #67864 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67864/consoleFull)** for PR 15705 at commit [`0daff74`](https://github.com/apache/spark/commit/0daff7475e456754538e65b9f324773218f4f943).
[GitHub] spark issue #15705: [SPARK-18183] [SPARK-18184] Fix INSERT [INTO|OVERWRITE] ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15705 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67849/ Test FAILed.
[GitHub] spark pull request #15671: [SPARK-14567][ML]Add instrumentation logs to ML t...
Github user zhengruifeng commented on a diff in the pull request: https://github.com/apache/spark/pull/15671#discussion_r85869367
--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/MultilayerPerceptronClassifier.scala ---
@@ -234,8 +234,14 @@ class MultilayerPerceptronClassifier @Since("1.5.0") (
    * @return Fitted model
    */
   override protected def train(dataset: Dataset[_]): MultilayerPerceptronClassificationModel = {
+    val instr = Instrumentation.create(this, dataset)
+    instr.logParams(params : _*)
--- End diff --
OK, I will update it here, and in the other algorithms which support an initialModel.
[GitHub] spark issue #15708: [SPARK-18167] [SQL] Retry when the SQLQuerySuite test fl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15708 **[Test build #67854 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67854/consoleFull)** for PR 15708 at commit [`641337b`](https://github.com/apache/spark/commit/641337bfda465afb385898aa5e09cbe72f41fc06). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15696: [SPARK-18024][SQL] Introduce an internal commit protocol...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15696 Merged build finished. Test PASSed.
[GitHub] spark issue #15696: [SPARK-18024][SQL] Introduce an internal commit protocol...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15696 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67852/ Test PASSed.
[GitHub] spark issue #15686: [MINOR][DOC] Remove spaces following slashs
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15686 Never mind, @HyukjinKwon. I was also curious about the AppVeyor failure. :)
[GitHub] spark issue #15704: [SPARK-17732][SQL] ALTER TABLE DROP PARTITION should sup...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15704 **[Test build #67857 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67857/consoleFull)** for PR 15704 at commit [`a3061e2`](https://github.com/apache/spark/commit/a3061e235cd0cf4c20e4480f89e3884b5372f991). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15709: [SPARK-18190][Build][SparkR] Fix R version to not the la...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15709 **[Test build #67867 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67867/consoleFull)** for PR 15709 at commit [`90fe001`](https://github.com/apache/spark/commit/90fe001145da62391c5a2a9efbdebc201e621e95).
[GitHub] spark issue #15709: [SPARK-18190][Build][SparkR] Fix R version to not the la...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15709 cc @felixcheung, @shivaram, @srowen and @wangmiao1981 (who I believe met this issue first).
[GitHub] spark issue #15172: [SPARK-13331] AES support for over-the-wire encryption
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15172 **[Test build #67870 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67870/consoleFull)** for PR 15172 at commit [`daed43c`](https://github.com/apache/spark/commit/daed43c6ee71270adaf57c404adcf41552d01036).
[GitHub] spark issue #15673: [SPARK-17992][SQL] Return all partitions from HiveShim w...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15673 **[Test build #67859 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67859/consoleFull)** for PR 15673 at commit [`1ed3301`](https://github.com/apache/spark/commit/1ed3301ec4dcbcccde4cacd21909de4f97902e20). * This patch passes all tests. * This patch **does not merge cleanly**. * This patch adds no public classes.
[GitHub] spark issue #15707: [SPARK-18024][SQL] Introduce an internal commit protocol...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15707 **[Test build #3384 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3384/consoleFull)** for PR 15707 at commit [`0177ded`](https://github.com/apache/spark/commit/0177ded3357a195f48e8e23923b763937ff60cac). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class HadoopCommitProtocolWrapper(path: String, isAppend: Boolean)`
[GitHub] spark issue #15710: [SPARK-18025] Use commit protocol API in structured stre...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15710 **[Test build #3387 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3387/consoleFull)** for PR 15710 at commit [`e9823e7`](https://github.com/apache/spark/commit/e9823e7fc65ab908456b93f5df1e3d54fa8a14dd).
[GitHub] spark issue #15673: [SPARK-17992][SQL] Return all partitions from HiveShim w...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15673 **[Test build #3382 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3382/consoleFull)** for PR 15673 at commit [`4c438c8`](https://github.com/apache/spark/commit/4c438c8b2575880379e2a9a872fe07018cb62402).
[GitHub] spark pull request #15702: [SPARK-18124] Observed-delay based Event Time Wat...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/15702#discussion_r85859683
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -536,6 +535,37 @@ class Dataset[T] private[sql](
   }
   /**
+   * Defines an event time watermark for this [[Dataset]]. This watermark tracks a point in time
--- End diff --
need a tag here for experimental
[GitHub] spark pull request #14803: [SPARK-17153][SQL] Should read partition data whe...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/14803#discussion_r85860069
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala ---
@@ -608,6 +614,81 @@ class FileStreamSourceSuite extends FileStreamSourceTest {
   // === other tests
+  test("read new files in partitioned table without globbing, should read partition data") {
+    withTempDirs { case (dir, tmp) =>
+      val partitionFooSubDir = new File(dir, "partition=foo")
+      val partitionBarSubDir = new File(dir, "partition=bar")
+
+      val schema = new StructType().add("value", StringType).add("partition", StringType)
+      val fileStream = createFileStream("json", s"${dir.getCanonicalPath}", Some(schema))
+      val filtered = fileStream.filter($"value" contains "keep")
+      testStream(filtered)(
+        // Create new partition=foo sub dir and write to it
+        AddTextFileData("{'value': 'drop1'}\n{'value': 'keep2'}", partitionFooSubDir, tmp),
+        CheckAnswer(("keep2", "foo")),
+
+        // Append to same partition=foo sub dir
+        AddTextFileData("{'value': 'keep3'}", partitionFooSubDir, tmp),
+        CheckAnswer(("keep2", "foo"), ("keep3", "foo")),
+
+        // Create new partition sub dir and write to it
+        AddTextFileData("{'value': 'keep4'}", partitionBarSubDir, tmp),
+        CheckAnswer(("keep2", "foo"), ("keep3", "foo"), ("keep4", "bar")),
+
+        // Append to same partition=bar sub dir
+        AddTextFileData("{'value': 'keep5'}", partitionBarSubDir, tmp),
+        CheckAnswer(("keep2", "foo"), ("keep3", "foo"), ("keep4", "bar"), ("keep5", "bar"))
+      )
+    }
+  }
+
+  test("when schema inference is turned on, should read partition data") {
+    def createFile(content: String, src: File, tmp: File): Unit = {
+      val tempFile = Utils.tempFileWith(new File(tmp, "text"))
+      val finalFile = new File(src, tempFile.getName)
+      src.mkdirs()
+      require(stringToFile(tempFile, content).renameTo(finalFile))
+    }
+
+    withSQLConf(SQLConf.STREAMING_SCHEMA_INFERENCE.key -> "true") {
+      withTempDirs { case (dir, tmp) =>
+        val partitionFooSubDir = new File(dir, "partition=foo")
+        val partitionBarSubDir = new File(dir, "partition=bar")
+
+        // Create file in partition, so we can infer the schema.
+        createFile("{'value': 'drop0'}", partitionFooSubDir, tmp)
+
+        val fileStream = createFileStream("json", s"${dir.getCanonicalPath}")
+        val filtered = fileStream.filter($"value" contains "keep")
+        testStream(filtered)(
+          // Append to same partition=foo sub dir
+          AddTextFileData("{'value': 'drop1'}\n{'value': 'keep2'}", partitionFooSubDir, tmp),
+          CheckAnswer(("keep2", "foo")),
+
+          // Append to same partition=foo sub dir
+          AddTextFileData("{'value': 'keep3'}", partitionFooSubDir, tmp),
+          CheckAnswer(("keep2", "foo"), ("keep3", "foo")),
+
+          // Create new partition sub dir and write to it
+          AddTextFileData("{'value': 'keep4'}", partitionBarSubDir, tmp),
+          CheckAnswer(("keep2", "foo"), ("keep3", "foo"), ("keep4", "bar")),
+
+          // Append to same partition=bar sub dir
+          AddTextFileData("{'value': 'keep5'}", partitionBarSubDir, tmp),
+          CheckAnswer(("keep2", "foo"), ("keep3", "foo"), ("keep4", "bar"), ("keep5", "bar")),
+
+          // Delete the two partition dirs
+          DeleteFile(partitionFooSubDir),
--- End diff --
@zsxwing As I recall, it is used to simulate a partition being deleted and its data being re-inserted. Thanks for fixing this!
[GitHub] spark issue #15702: [SPARK-18124] Observed-delay based Event Time Watermarks
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15702 @ekl - flaky test... Should we turn it off for now? retest this please
[GitHub] spark pull request #15673: [SPARK-17992][SQL] Return all partitions from Hiv...
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/15673#discussion_r85864458
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -585,7 +586,31 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
           getAllPartitionsMethod.invoke(hive, table).asInstanceOf[JSet[Partition]]
         } else {
           logDebug(s"Hive metastore filter is '$filter'.")
-          getPartitionsByFilterMethod.invoke(hive, table, filter).asInstanceOf[JArrayList[Partition]]
+          val tryDirectSqlConfVar = HiveConf.ConfVars.METASTORE_TRY_DIRECT_SQL
+          val tryDirectSql =
+            hive.getConf.getBoolean(tryDirectSqlConfVar.varname, tryDirectSqlConfVar.defaultBoolVal)
+          try {
+            // Hive may throw an exception when calling this method in some circumstances, such as
+            // when filtering on a non-string partition column when the hive config key
+            // hive.metastore.try.direct.sql is false
+            getPartitionsByFilterMethod.invoke(hive, table, filter)
+              .asInstanceOf[JArrayList[Partition]]
+          } catch {
+            case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] &&
+                !tryDirectSql =>
+              logWarning("Caught Hive MetaException attempting to get partition metadata by " +
+                "filter from Hive. Falling back to fetching all partition metadata, which will " +
+                "degrade performance. Consider modifying your Hive metastore configuration to " +
+                s"set ${tryDirectSqlConfVar.varname} to true.", ex)
+              // HiveShim clients are expected to handle a superset of the requested partitions
+              getAllPartitionsMethod.invoke(hive, table).asInstanceOf[JSet[Partition]]
+            case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] &&
+                tryDirectSql =>
+              throw new RuntimeException("Caught Hive MetaException attempting to get partition " +
+                "metadata by filter from Hive. Set the Spark configuration setting " +
--- End diff --
Good point.
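Note: the fallback in the diff above can be read as the following pattern. This is only a minimal, dependency-free sketch: `FakeMetaException`, `getAllPartitions`, `getPartitionsByFilter` and `partitionsFor` are hypothetical stand-ins invented for illustration, not the Hive or Spark APIs, and the filtered call is hard-wired to fail so that both catch branches can be exercised.

```scala
import java.lang.reflect.InvocationTargetException

// Hypothetical stand-in for Hive's MetaException; not the real class.
class FakeMetaException(msg: String) extends Exception(msg)

object PartitionFallbackSketch {
  // Hypothetical stand-in for the reflective "get all partitions" call.
  def getAllPartitions(): Seq[String] = Seq("p=1", "p=2", "p=3")

  // Hypothetical stand-in for the filtered call. It always fails here,
  // wrapped the way a reflective Method.invoke would wrap the failure.
  def getPartitionsByFilter(filter: String): Seq[String] =
    throw new InvocationTargetException(new FakeMetaException("filter pushdown failed"))

  def partitionsFor(filter: String, tryDirectSql: Boolean): Seq[String] =
    try {
      getPartitionsByFilter(filter)
    } catch {
      case ex: InvocationTargetException
          if ex.getCause.isInstanceOf[FakeMetaException] && !tryDirectSql =>
        // Fall back to every partition; callers must tolerate a superset.
        getAllPartitions()
      case ex: InvocationTargetException
          if ex.getCause.isInstanceOf[FakeMetaException] && tryDirectSql =>
        // With direct SQL enabled the failure is unexpected, so surface it.
        throw new RuntimeException("Partition filtering failed in the metastore", ex)
    }
}
```

With `tryDirectSql = false` the sketch returns all three partitions; with `tryDirectSql = true` it rethrows, mirroring the two `case` branches in the patch.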
[GitHub] spark issue #15541: [SPARK-17637][Scheduler]Packed scheduling for Spark task...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15541 retest this please
[GitHub] spark issue #15692: [SPARK-18177][ML][PYSPARK] Add missing 'subsamplingRate'...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15692 **[Test build #67862 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67862/consoleFull)** for PR 15692 at commit [`0651bb6`](https://github.com/apache/spark/commit/0651bb6daec336ed221522b59a9149187474cc4b).
[GitHub] spark issue #15541: [SPARK-17637][Scheduler]Packed scheduling for Spark task...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15541 **[Test build #67863 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67863/consoleFull)** for PR 15541 at commit [`a820e96`](https://github.com/apache/spark/commit/a820e96284f1d9108ef62cd3ef55171ebd47e08f).
[GitHub] spark issue #15705: [SPARK-18183] [SPARK-18184] Fix INSERT [INTO|OVERWRITE] ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15705 **[Test build #67849 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67849/consoleFull)** for PR 15705 at commit [`07c6787`](https://github.com/apache/spark/commit/07c67876c372369def5128ce919cbb74e4f0d30d). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15667: [SPARK-18107][SQL] Insert overwrite statement runs much ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15667 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67853/ Test PASSed.
[GitHub] spark issue #15667: [SPARK-18107][SQL] Insert overwrite statement runs much ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15667 Merged build finished. Test PASSed.
[GitHub] spark issue #15633: [SPARK-18087] [SQL] Optimize insert to not require REPAI...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/15633 Merging in master.
[GitHub] spark issue #15667: [SPARK-18107][SQL] Insert overwrite statement runs much ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15667 **[Test build #67853 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67853/consoleFull)** for PR 15667 at commit [`bd22150`](https://github.com/apache/spark/commit/bd22150823ff9ce6a0b80ae61fae6477ad135ef8). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15414: [SPARK-17848][ML] Move LabelCol datatype cast into Predi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15414 **[Test build #67861 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67861/consoleFull)** for PR 15414 at commit [`810c973`](https://github.com/apache/spark/commit/810c973d7394263a047318d7c0ab82cf6814ee7e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #15705: [SPARK-18183] [SPARK-18184] Fix INSERT [INTO|OVER...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/15705#discussion_r85869751
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/CatalogFileIndex.scala ---
@@ -67,7 +67,10 @@ class CatalogFileIndex(
       val selectedPartitions = sparkSession.sessionState.catalog.listPartitionsByFilter(
         table.identifier, filters)
       val partitions = selectedPartitions.map { p =>
-        PartitionPath(p.toRow(partitionSchema), p.storage.locationUri.get)
+        val path = new Path(p.storage.locationUri.get)
+        val fs = path.getFileSystem(hadoopConf)
+        PartitionPath(
+          p.toRow(partitionSchema), path.makeQualified(fs.getUri, fs.getWorkingDirectory))
--- End diff --
Why this change? Doesn't `new Path` qualify the path string?
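Note: the difference between a bare location and a qualified one can be illustrated with plain `java.net.URI`. This is only an analogy under stated assumptions, not Hadoop's `Path` implementation: `QualifySketch`, `defaultFs` and `qualify` are hypothetical names, and `hdfs://namenode:8020/` is a made-up default file system standing in for `fs.getUri`.

```scala
import java.net.URI

object QualifySketch {
  // Hypothetical default file system URI, standing in for fs.getUri.
  val defaultFs: URI = URI.create("hdfs://namenode:8020/")

  // Resolve a location against the default file system. A location that
  // already carries a scheme and authority is returned unchanged.
  def qualify(location: String): URI = defaultFs.resolve(location)
}
```

Parsing a string such as `/warehouse/t/p=1` on its own yields a URI with no scheme; only resolving it against a known file system fills in the scheme and authority, which is the step `makeQualified` performs in the patch above.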
[GitHub] spark issue #15414: [SPARK-17848][ML] Move LabelCol datatype cast into Predi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15414 Merged build finished. Test PASSed.
[GitHub] spark issue #15686: [MINOR][DOC] Remove spaces following slashs
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15686 @dongjoon-hyun I am sorry for the unrelated comments here. None of these comments are related to this PR. @shivaram Sure, let me try to create a JIRA. I will cc you. We might be able to talk more there.
[GitHub] spark issue #15414: [SPARK-17848][ML] Move LabelCol datatype cast into Predi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15414 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67861/ Test PASSed.
[GitHub] spark issue #15696: [SPARK-18024][SQL] Introduce an internal commit protocol...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15696 **[Test build #67852 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67852/consoleFull)** for PR 15696 at commit [`cd23d2f`](https://github.com/apache/spark/commit/cd23d2f7bdf7a3ef9b93e77a3ae540d553398267). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #15707: [SPARK-18024][SQL] Introduce an internal commit p...
Github user ericl commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15707#discussion_r85870354

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/WriteOutput.scala ---
    @@ -133,7 +133,7 @@ object WriteOutput extends Logging {
                 sparkAttemptNumber = taskContext.attemptNumber(),
                 committer,
                 iterator = iter)
    -        }).flatten.distinct
    +        })
    --- End diff --

    Move the distinct to updatedPartitions?
[GitHub] spark issue #15593: [SPARK-18060][ML] Avoid unnecessary computation for MLOR
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/15593 @sethah I have been busy with company work recently. I will start working on open source code reviews soon, this week.
[GitHub] spark pull request #15708: [SPARK-18167] [SQL] Retry when the SQLQuerySuite ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15708