date:20161031

[GitHub] spark issue #15414: [SPARK-17848][ML] Move LabelCol datatype cast into Predi...

2016-10-31 Thread zhengruifeng

Github user zhengruifeng commented on the issue:

https://github.com/apache/spark/pull/15414
  
@jkbradley @sethah I added a comment. Thanks for the reviews.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #15673: [SPARK-17992][SQL] Return all partitions from HiveShim w...

2016-10-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15673
  
**[Test build #3382 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3382/consoleFull)** for PR 15673 at commit [`4c438c8`](https://github.com/apache/spark/commit/4c438c8b2575880379e2a9a872fe07018cb62402).
 * This patch passes all tests.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.



[GitHub] spark issue #15708: [SPARK-18167] [SQL] Retry when the SQLQuerySuite test fl...

2016-10-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15708
  
Merged build finished. Test PASSed.



[GitHub] spark issue #14444: [SPARK-16839] [SQL] redundant aliases after cleanupAlias...

2016-10-31 Thread eyalfa

Github user eyalfa commented on the issue:

https://github.com/apache/spark/pull/14444
  
@hvanhovell please have a look.
BTW, for some reason Jenkins shows all test cases as 'sql', see 
[here](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67837/testReport/org.apache.spark.sql/SQLQueryTestSuite/)



[GitHub] spark issue #14444: [SPARK-16839] [SQL] redundant aliases after cleanupAlias...

2016-10-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14444
  
**[Test build #67869 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67869/consoleFull)** for PR 14444 at commit [`9b89e31`](https://github.com/apache/spark/commit/9b89e315f83a792d62d02d56f46448d339a705e8).



[GitHub] spark issue #15541: [SPARK-17637][Scheduler]Packed scheduling for Spark task...

2016-10-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15541
  
**[Test build #67863 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67863/consoleFull)** for PR 15541 at commit [`a820e96`](https://github.com/apache/spark/commit/a820e96284f1d9108ef62cd3ef55171ebd47e08f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #15673: [SPARK-17992][SQL] Return all partitions from HiveShim w...

2016-10-31 Thread mallman

Github user mallman commented on the issue:

https://github.com/apache/spark/pull/15673
  
@rxin I believe https://issues.apache.org/jira/browse/SPARK-18168 will need 
to be resolved before I can rebase this PR.



[GitHub] spark pull request #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pys...

2016-10-31 Thread zjffdu

Github user zjffdu commented on a diff in the pull request:

https://github.com/apache/spark/pull/13599#discussion_r85877793
  
--- Diff: core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala ---
@@ -69,6 +84,66 @@ private[spark] class PythonWorkerFactory(pythonExec: String, envVars: Map[String
   }
 
   /**
+   * Create virtualenv using native virtualenv or conda
+   *
+   * Native Virtualenv:
+   *   -  Execute command: virtualenv -p pythonExec --no-site-packages virtualenvName
+   *   -  Execute command: python -m pip --cache-dir cache-dir install -r requirement_file
+   *
+   * Conda
+   *   -  Execute command: conda create --prefix prefix --file requirement_file -y
+   *
+   */
+  def setupVirtualEnv(): Unit = {
+    logDebug("Start to setup virtualenv...")
+    logDebug("user.dir=" + System.getProperty("user.dir"))
+    logDebug("user.home=" + System.getProperty("user.home"))
+
+    require(virtualEnvType == "native" || virtualEnvType == "conda",
+      s"VirtualEnvType: ${virtualEnvType} is not supported" )
+    virtualEnvName = "virtualenv_" + conf.getAppId + "_" + VIRTUALENV_ID.getAndIncrement()
+    // use the absolute path when it is local mode otherwise just use filename as it would be
+    // fetched from FileServer
+    val pyspark_requirements =
+      if (Utils.isLocalMaster(conf)) {
+        conf.get("spark.pyspark.virtualenv.requirements")
+      } else {
+        conf.get("spark.pyspark.virtualenv.requirements").split("/").last
+      }
+
+    val createEnvCommand =
+      if (virtualEnvType == "native") {
+        Arrays.asList(virtualEnvPath,
+          "-p", pythonExec,
+          "--no-site-packages", virtualEnvName)
+      } else {
+        Arrays.asList(virtualEnvPath,
+          "create", "--prefix", System.getProperty("user.dir") + "/" + virtualEnvName,
+          "--file", pyspark_requirements, "-y")
+      }
+    execCommand(createEnvCommand)
+    // virtualenv will be created in the working directory of Executor.
+    virtualPythonExec = virtualEnvName + "/bin/python"
+    if (virtualEnvType == "native") {
+      execCommand(Arrays.asList(virtualPythonExec, "-m", "pip",
+        "--cache-dir", System.getProperty("user.home"),
+        "install", "-r", pyspark_requirements))
+    }
+  }
+
+  def execCommand(commands: java.util.List[String]): Unit = {
+    logDebug("Running command:" + commands.asScala.mkString(" "))
+    val pb = new ProcessBuilder(commands).inheritIO()
+    // pip internally use environment variable `HOME`
+    pb.environment().put("HOME", System.getProperty("user.home"))
--- End diff --

In YARN mode, HOME is set to "/home/", which is not correct, so here I take it
from the system property `user.home`. From launch_container.sh:
```
export HOME="/home/"
```
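
The point above can be sketched in isolation: a child process sees whatever `HOME` the parent places in the `ProcessBuilder` environment map, so overwriting it with the JVM's `user.home` property changes what tools such as pip read. The names `HomeOverrideSketch` and `withJvmHome` are hypothetical and not part of the PR; this is a minimal illustration, not the PR's implementation.

```scala
import java.util.{List => JList}

// Hypothetical helper (not from the PR): builds a ProcessBuilder whose HOME
// comes from the JVM system property `user.home` instead of the inherited
// environment, which a launcher script may have overwritten.
object HomeOverrideSketch {
  def withJvmHome(command: JList[String]): ProcessBuilder = {
    val pb = new ProcessBuilder(command)
    // environment() returns a mutable map used only for the child process.
    pb.environment().put("HOME", System.getProperty("user.home"))
    pb
  }
}
```

The builder is returned without being started, so the override can be inspected before any command runs.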



[GitHub] spark issue #15703: [SPARK-18186] Migrate HiveUDAFFunction to TypedImperativ...

2016-10-31 Thread liancheng

Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/15703
  
I can't reproduce those test failures when running the failed test cases 
individually. It seems to be related to execution order. Still investigating.



[GitHub] spark issue #15703: [SPARK-18186] Migrate HiveUDAFFunction to TypedImperativ...

2016-10-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15703
  
Merged build finished. Test FAILed.



[GitHub] spark issue #15703: [SPARK-18186] Migrate HiveUDAFFunction to TypedImperativ...

2016-10-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15703
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67842/
Test FAILed.



[GitHub] spark issue #15703: [SPARK-18186] Migrate HiveUDAFFunction to TypedImperativ...

2016-10-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15703
  
**[Test build #67842 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67842/consoleFull)** for PR 15703 at commit [`c0029f1`](https://github.com/apache/spark/commit/c0029f1a529935c263f9c83691cf84921b343e67).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request #15705: [SPARK-18183] [SPARK-18184] Fix INSERT [INTO|OVER...

2016-10-31 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/15705#discussion_r85859910
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---
@@ -345,18 +346,32 @@ case class BroadcastHint(child: LogicalPlan) extends UnaryNode {
   override lazy val statistics: Statistics = super.statistics.copy(isBroadcastable = true)
 }
 
+/**
+ * Options for writing new data into a table.
+ *
+ * @param enabled whether to overwrite existing data in the table.
--- End diff --

It's pretty confusing that we call it `enabled`. Can we just use `overwrite`?
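
A sketch of what the suggested rename could look like, assuming the options type is a case class with a Boolean flag. The names `OverwriteOptionsSketch` and `OverwriteSketch` and the type of `specificPartition` are assumptions for illustration; only the field name `overwrite` comes from the comment.

```scala
// Hypothetical stand-in for the options class under review; the real class
// name and field types are not visible in the quoted diff.
final case class OverwriteOptionsSketch(
    overwrite: Boolean, // named `enabled` in the diff under review
    specificPartition: Option[Map[String, String]] = None) // assumed type

object OverwriteSketch {
  // With the rename, the call site states directly what the flag controls.
  def saveModeName(opts: OverwriteOptionsSketch): String =
    if (opts.overwrite) "Overwrite" else "Append"
}
```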



[GitHub] spark issue #15703: [SPARK-18186] Migrate HiveUDAFFunction to TypedImperativ...

2016-10-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15703
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67844/
Test FAILed.



[GitHub] spark pull request #11105: [SPARK-12469][CORE] Data Property accumulators fo...

2016-10-31 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/11105#discussion_r85860913
  
--- Diff: core/src/test/scala/org/apache/spark/DataPropertyAccumulatorSuite.scala ---
@@ -0,0 +1,361 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark
+
+import scala.concurrent.ExecutionContext.Implicits.global
+import scala.ref.WeakReference
+
+import org.scalatest.Matchers
+
+import org.apache.spark.scheduler._
+
+
+class DataPropertyAccumulatorSuite extends SparkFunSuite with Matchers with LocalSparkContext {
+  test("single partition") {
+    sc = new SparkContext("local[2]", "test")
+    val acc : Accumulator[Int] = sc.accumulator(0, dataProperty = true)
+
+    val a = sc.parallelize(1 to 20, 1)
+    val b = a.map{x => acc += x; x}
+    b.cache()
+    b.count()
+    acc.value should be (210)
+  }
+
+  test("adding only the first element per partition should work even if partition is empty") {
+    sc = new SparkContext("local[2]", "test")
+    val acc: Accumulator[Int] = sc.accumulator(0, dataProperty = true)
+    val a = sc.parallelize(1 to 20, 30)
+    val b = a.mapPartitions{itr =>
+      acc += 1
+      itr
+    }
+    b.count()
+    acc.value should be (30)
+  }
+
+  test("shuffled (combineByKey)") {
+    sc = new SparkContext("local[2]", "test")
+    val a = sc.parallelize(1 to 40, 5)
+    val buckets = 4
+    val b = a.map{x => ((x % buckets), x)}
+    val inputs = List(b, b.repartition(10), b.partitionBy(new HashPartitioner(5))).map(_.cache())
+    val mapSideCombines = List(true, false)
+    inputs.foreach { input =>
+      mapSideCombines.foreach { mapSideCombine =>
+        val accs = (1 to 4).map(x => sc.accumulator(0, dataProperty = true)).toList
+        val raccs = (1 to 4).map(x => sc.accumulator(0, dataProperty = false)).toList
+        val List(acc, acc1, acc2, acc3) = accs
+        val List(racc, racc1, racc2, racc3) = raccs
+        val c = input.combineByKey(
+          (x: Int) => {acc1 += 1; acc += 1; racc1 += 1; racc += 1; x},
+          {(a: Int, b: Int) => acc2 += 1; acc += 1; racc2 += 1; racc += 1; (a + b)},
+          {(a: Int, b: Int) => acc3 += 1; acc += 1; racc3 += 1; racc += 1; (a + b)},
+          new HashPartitioner(10),
+          mapSideCombine)
+        val d = input.combineByKey(
+          (x: Int) => {acc1 += 1; acc += 1; x},
+          {(a: Int, b: Int) => acc2 += 1; acc += 1; (a + b)},
+          {(a: Int, b: Int) => acc3 += 1; acc += 1; (a + b)},
+          new HashPartitioner(2),
+          mapSideCombine)
+        val e = d.map{x => acc += 1; x}
+        c.count()
+        // If our partitioner is known then we should only create
+        // one combiner for each key value. Otherwise we should
+        // create at least that many combiners.
+        if (input.partitioner.isDefined) {
+          acc1.value should be (buckets)
+        } else {
+          acc1.value should be >= (buckets)
+        }
+        if (input.partitioner.isDefined) {
+          acc2.value should be > (0)
+        } else if (mapSideCombine) {
+          acc3.value should be > (0)
+        } else {
+          acc2.value should be > (0)
+          acc3.value should be (0)
+        }
+        acc.value should be (acc1.value + acc2.value + acc3.value)
+        val oldValues = accs.map(_.value)
+        // For one action the data property accumulators and regular should have the same value.
+        accs.map(_.value) should be (raccs.map(_.value))
+        c.count()
+        accs.map(_.value) should be (oldValues)
--- End diff --

@squito That was an experimental implementation I used for testing. It seems I 
didn't push it to the remote, and I can't find it now.



[GitHub] spark issue #15703: [SPARK-18186] Migrate HiveUDAFFunction to TypedImperativ...

2016-10-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15703
  
**[Test build #67844 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67844/consoleFull)** for PR 15703 at commit [`5a23a97`](https://github.com/apache/spark/commit/5a23a979c5e6a61f847b146a1cb656418054d955).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #15703: [SPARK-18186] Migrate HiveUDAFFunction to TypedImperativ...

2016-10-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15703
  
Merged build finished. Test FAILed.



[GitHub] spark issue #15702: [SPARK-18124] Observed-delay based Event Time Watermarks

2016-10-31 Thread ericl

Github user ericl commented on the issue:

https://github.com/apache/spark/pull/15702
  
I'm still trying to find a failure that includes 
https://github.com/apache/spark/pull/15701/files. Until then it's hard to debug.



[GitHub] spark issue #14547: [SPARK-16718][MLlib] gbm-style treeboost

2016-10-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14547
  
**[Test build #67858 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67858/consoleFull)** for PR 14547 at commit [`5f54f4d`](https://github.com/apache/spark/commit/5f54f4dbf94addf8b4df1af13a417f0fd0971633).



[GitHub] spark pull request #15673: [SPARK-17992][SQL] Return all partitions from Hiv...

2016-10-31 Thread mallman

Github user mallman commented on a diff in the pull request:

https://github.com/apache/spark/pull/15673#discussion_r85865327
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -585,7 +586,31 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
         getAllPartitionsMethod.invoke(hive, table).asInstanceOf[JSet[Partition]]
       } else {
         logDebug(s"Hive metastore filter is '$filter'.")
-        getPartitionsByFilterMethod.invoke(hive, table, filter).asInstanceOf[JArrayList[Partition]]
+        val tryDirectSqlConfVar = HiveConf.ConfVars.METASTORE_TRY_DIRECT_SQL
+        val tryDirectSql =
+          hive.getConf.getBoolean(tryDirectSqlConfVar.varname, tryDirectSqlConfVar.defaultBoolVal)
+        try {
+          // Hive may throw an exception when calling this method in some circumstances, such as
+          // when filtering on a non-string partition column when the hive config key
+          // hive.metastore.try.direct.sql is false
+          getPartitionsByFilterMethod.invoke(hive, table, filter)
+            .asInstanceOf[JArrayList[Partition]]
+        } catch {
+          case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] &&
+              !tryDirectSql =>
+            logWarning("Caught Hive MetaException attempting to get partition metadata by " +
+              "filter from Hive. Falling back to fetching all partition metadata, which will " +
+              "degrade performance. Consider modifying your Hive metastore configuration to " +
+              s"set ${tryDirectSqlConfVar.varname} to true.", ex)
+            // HiveShim clients are expected to handle a superset of the requested partitions
+            getAllPartitionsMethod.invoke(hive, table).asInstanceOf[JSet[Partition]]
+          case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] &&
+              tryDirectSql =>
+            throw new RuntimeException("Caught Hive MetaException attempting to get partition " +
+              "metadata by filter from Hive. Set the Spark configuration setting " +
--- End diff --

I made some revisions. LMK what you think.
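
For readers following the quoted diff, the fallback pattern it uses can be sketched on its own: a reflective call is attempted, and if it fails with an `InvocationTargetException` whose cause is a particular exception type, the code either falls back to a broader call or rethrows, depending on a flag. Everything named here (`FakeMetaException`, `Backend`, `FallbackSketch`) is hypothetical and stands in for the Hive classes; it is not the PR's code.

```scala
import java.lang.reflect.InvocationTargetException
import java.util.{Arrays, List => JList}

// Stand-in for Hive's MetaException (assumption for illustration).
class FakeMetaException(msg: String) extends Exception(msg)

// Stand-in for the Hive client: the filtered call may throw, the full
// listing never does.
class Backend(failOnFilter: Boolean) {
  def byFilter(filter: String): JList[String] = {
    if (failOnFilter) throw new FakeMetaException("filter not supported")
    Arrays.asList("p=1")
  }
  def all(): JList[String] = Arrays.asList("p=1", "p=2")
}

object FallbackSketch {
  def partitions(b: Backend, filter: String, tryDirectSql: Boolean): JList[String] = {
    val byFilter = classOf[Backend].getMethod("byFilter", classOf[String])
    try {
      // Reflection wraps anything the target throws in InvocationTargetException.
      byFilter.invoke(b, filter).asInstanceOf[JList[String]]
    } catch {
      case ex: InvocationTargetException if isMeta(ex) && !tryDirectSql =>
        // Fall back to a superset of the requested partitions.
        b.all()
      case ex: InvocationTargetException if isMeta(ex) =>
        throw new RuntimeException("filter call failed with direct SQL enabled", ex)
    }
  }

  private def isMeta(ex: InvocationTargetException): Boolean =
    ex.getCause.isInstanceOf[FakeMetaException]
}
```

Callers of the fallback branch receive more partitions than they asked for, so they must be prepared to filter the result themselves, as the comment in the diff notes.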



[GitHub] spark pull request #15705: [SPARK-18183] [SPARK-18184] Fix INSERT [INTO|OVER...

2016-10-31 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/15705#discussion_r85870248
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala ---
@@ -173,12 +175,22 @@ case class DataSourceAnalysis(conf: CatalystConf) extends Rule[LogicalPlan] {
         case LogicalRelation(r: HadoopFsRelation, _, _) => r.location.rootPaths
       }.flatten
 
-      val mode = if (overwrite) SaveMode.Overwrite else SaveMode.Append
-      if (overwrite && inputPaths.contains(outputPath)) {
+      val mode = if (overwrite.enabled) SaveMode.Overwrite else SaveMode.Append
+      if (overwrite.enabled && inputPaths.contains(outputPath)) {
         throw new AnalysisException(
           "Cannot overwrite a path that is also being read from.")
       }
 
+      val overwritePartitionPath = if (overwrite.specificPartition.isDefined &&
--- End diff --

Can we just pass the partition path as `outputPath` to 
`InsertIntoHadoopFsRelationCommand` and set the partition columns to `Nil`? Then we 
wouldn't need to add an extra parameter to `InsertIntoHadoopFsRelationCommand`.



[GitHub] spark issue #15675: [SPARK-18144][SQL] logging StreamingQueryListener$QueryS...

2016-10-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15675
  
Merged build finished. Test PASSed.



[GitHub] spark issue #15709: [SPARK-18190][Build][SparkR] Fix R version to not the la...

2016-10-31 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/15709
  
Build started: [SparkR] `ALL` 
[![PR-15709](https://ci.appveyor.com/api/projects/status/github/spark-test/spark?branch=B0651ECA-1AB2-4452-89B7-A1BF7652113A=true)](https://ci.appveyor.com/project/spark-test/spark/branch/B0651ECA-1AB2-4452-89B7-A1BF7652113A)
Diff: 
https://github.com/apache/spark/compare/master...spark-test:B0651ECA-1AB2-4452-89B7-A1BF7652113A



[GitHub] spark issue #15626: SPARK-17829 [SQL] Stable format for offset log

2016-10-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15626
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67845/
Test FAILed.



[GitHub] spark issue #15626: SPARK-17829 [SQL] Stable format for offset log

2016-10-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15626
  
Merged build finished. Test FAILed.



[GitHub] spark pull request #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pys...

2016-10-31 Thread mridulm

Github user mridulm commented on a diff in the pull request:

https://github.com/apache/spark/pull/13599#discussion_r85859164
  
--- Diff: core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala ---
@@ -46,6 +50,12 @@ private[spark] class PythonWorkerFactory(pythonExec: String, envVars: Map[String
   val daemonWorkers = new mutable.WeakHashMap[Socket, Int]()
   val idleWorkers = new mutable.Queue[Socket]()
   var lastActivity = 0L
+  val virtualEnvEnabled = conf.getBoolean("spark.pyspark.virtualenv.enabled", false)
+  val virtualEnvType = conf.get("spark.pyspark.virtualenv.type", "native")
+  val virtualEnvPath = conf.get("spark.pyspark.virtualenv.bin.path", "")
+  var virtualEnvName: String = _
+  var virtualPythonExec: String = _
+
--- End diff --

Make these private if they are not required outside of the class.



[GitHub] spark pull request #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pys...

2016-10-31 Thread mridulm

Github user mridulm commented on a diff in the pull request:

https://github.com/apache/spark/pull/13599#discussion_r85859942
  
--- Diff: core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala ---
@@ -69,6 +84,66 @@ private[spark] class PythonWorkerFactory(pythonExec: String, envVars: Map[String
   }
 
   /**
+   * Create virtualenv using native virtualenv or conda
+   *
+   * Native Virtualenv:
+   *   -  Execute command: virtualenv -p pythonExec --no-site-packages virtualenvName
+   *   -  Execute command: python -m pip --cache-dir cache-dir install -r requirement_file
+   *
+   * Conda
+   *   -  Execute command: conda create --prefix prefix --file requirement_file -y
+   *
+   */
+  def setupVirtualEnv(): Unit = {
+    logDebug("Start to setup virtualenv...")
+    logDebug("user.dir=" + System.getProperty("user.dir"))
+    logDebug("user.home=" + System.getProperty("user.home"))
+
+    require(virtualEnvType == "native" || virtualEnvType == "conda",
+      s"VirtualEnvType: ${virtualEnvType} is not supported" )
+    virtualEnvName = "virtualenv_" + conf.getAppId + "_" + VIRTUALENV_ID.getAndIncrement()
+    // use the absolute path when it is local mode otherwise just use filename as it would be
+    // fetched from FileServer
+    val pyspark_requirements =
+      if (Utils.isLocalMaster(conf)) {
+        conf.get("spark.pyspark.virtualenv.requirements")
+      } else {
+        conf.get("spark.pyspark.virtualenv.requirements").split("/").last
+      }
+
+    val createEnvCommand =
+      if (virtualEnvType == "native") {
+        Arrays.asList(virtualEnvPath,
+          "-p", pythonExec,
+          "--no-site-packages", virtualEnvName)
+      } else {
+        Arrays.asList(virtualEnvPath,
+          "create", "--prefix", System.getProperty("user.dir") + "/" + virtualEnvName,
+          "--file", pyspark_requirements, "-y")
+      }
+    execCommand(createEnvCommand)
+    // virtualenv will be created in the working directory of Executor.
+    virtualPythonExec = virtualEnvName + "/bin/python"
+    if (virtualEnvType == "native") {
+      execCommand(Arrays.asList(virtualPythonExec, "-m", "pip",
+        "--cache-dir", System.getProperty("user.home"),
+        "install", "-r", pyspark_requirements))
+    }
+  }
+
+  def execCommand(commands: java.util.List[String]): Unit = {
+    logDebug("Running command:" + commands.asScala.mkString(" "))
+    val pb = new ProcessBuilder(commands).inheritIO()
+    // pip internally use environment variable `HOME`
+    pb.environment().put("HOME", System.getProperty("user.home"))
--- End diff --

This should be propagated implicitly. Or is it for Windows support?
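
For reference, a minimal Scala sketch of the behaviour this question relies on: the map returned by `ProcessBuilder.environment()` starts out as a copy of the parent JVM's environment, so a variable such as `HOME` is inherited by the child unless it is overridden. An explicit `put` only changes anything when the value should differ from the inherited one. The `EnvInheritance` object and its method names are hypothetical and not part of the PR; the command `true` is a placeholder and is never started.

```scala
object EnvInheritance {
  // Value a child process would see for `name` when nothing is overridden.
  def inherited(name: String): Option[String] = {
    val pb = new ProcessBuilder("true") // placeholder command, never started
    Option(pb.environment().get(name))
  }

  // Value a child process would see after an explicit override.
  def overridden(name: String, value: String): Option[String] = {
    val pb = new ProcessBuilder("true") // placeholder command, never started
    pb.environment().put(name, value)
    Option(pb.environment().get(name))
  }
}
```

If the `user.home` system property and the `HOME` variable agree, the explicit `put` in the diff is redundant; it would only matter where they differ or where `HOME` is unset.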



[GitHub] spark pull request #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pys...

2016-10-31 Thread mridulm

Github user mridulm commented on a diff in the pull request:

https://github.com/apache/spark/pull/13599#discussion_r85859906
  
--- Diff: 
core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala ---
@@ -69,6 +84,66 @@ private[spark] class PythonWorkerFactory(pythonExec: String, envVars: Map[String
   }
 
   /**
+   * Create virtualenv using native virtualenv or conda
+   *
+   * Native Virtualenv:
+   *   -  Execute command: virtualenv -p pythonExec --no-site-packages virtualenvName
+   *   -  Execute command: python -m pip --cache-dir cache-dir install -r requirement_file
+   *
+   * Conda
+   *   -  Execute command: conda create --prefix prefix --file requirement_file -y
+   *
+   */
+  def setupVirtualEnv(): Unit = {
+    logDebug("Start to setup virtualenv...")
+    logDebug("user.dir=" + System.getProperty("user.dir"))
+    logDebug("user.home=" + System.getProperty("user.home"))
+
+    require(virtualEnvType == "native" || virtualEnvType == "conda",
+      s"VirtualEnvType: ${virtualEnvType} is not supported" )
+    virtualEnvName = "virtualenv_" + conf.getAppId + "_" + VIRTUALENV_ID.getAndIncrement()
+    // use the absolute path when it is local mode otherwise just use filename as it would be
+    // fetched from FileServer
+    val pyspark_requirements =
+      if (Utils.isLocalMaster(conf)) {
+        conf.get("spark.pyspark.virtualenv.requirements")
+      } else {
+        conf.get("spark.pyspark.virtualenv.requirements").split("/").last
+      }
+
+    val createEnvCommand =
+      if (virtualEnvType == "native") {
+        Arrays.asList(virtualEnvPath,
+          "-p", pythonExec,
+          "--no-site-packages", virtualEnvName)
+      } else {
+        Arrays.asList(virtualEnvPath,
+          "create", "--prefix", System.getProperty("user.dir") + "/" + virtualEnvName,
+          "--file", pyspark_requirements, "-y")
+      }
+    execCommand(createEnvCommand)
+    // virtualenv will be created in the working directory of Executor.
+    virtualPythonExec = virtualEnvName + "/bin/python"
--- End diff --

Curious how this works under Windows. Is it not supported?
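
To make the concern concrete: a native virtualenv on Windows places the interpreter under `Scripts\python.exe` rather than `bin/python`, so the hard-coded suffix in the diff would not resolve there. Below is a minimal Scala sketch of an OS-aware lookup. `VirtualEnvPaths` and `pythonExec` are hypothetical names that are not in the PR, and the sketch covers only the native virtualenv layout, not conda.

```scala
import java.util.Locale

object VirtualEnvPaths {
  // Hypothetical helper: picks the interpreter path inside a native virtualenv
  // directory based on the OS name (defaults to the JVM's "os.name" property).
  def pythonExec(envDir: String, osName: String = System.getProperty("os.name")): String = {
    if (osName.toLowerCase(Locale.ROOT).startsWith("windows")) {
      envDir + "\\Scripts\\python.exe"
    } else {
      envDir + "/bin/python"
    }
  }
}
```

Passing the OS name as a parameter keeps the helper testable on any platform.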



[GitHub] spark pull request #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pys...

2016-10-31 Thread mridulm

Github user mridulm commented on a diff in the pull request:

https://github.com/apache/spark/pull/13599#discussion_r85872283
  
--- Diff: 
core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala ---
@@ -307,6 +387,7 @@ private[spark] class PythonWorkerFactory(pythonExec: 
String, envVars: Map[String
 }
 
 private object PythonWorkerFactory {
+  val VIRTUALENV_ID = new AtomicInteger()
--- End diff --

A more restrictive ACL (access modifier) would be good.
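
One way the suggestion could look, as a minimal Scala sketch: a `private` member of a companion object stays visible to its companion class, so the counter can be hidden from everything else without changing how the class uses it. `EnvNamer` is a hypothetical stand-in for `PythonWorkerFactory`; only the `VIRTUALENV_ID` name and the `"virtualenv_" + appId + "_" + id` naming scheme come from the diff.

```scala
import java.util.concurrent.atomic.AtomicInteger

class EnvNamer(appId: String) {
  // private: not readable outside EnvNamer and its companion
  private val prefix = "virtualenv_" + appId + "_"

  // The companion's private counter is still accessible from here.
  def nextName(): String = prefix + EnvNamer.VIRTUALENV_ID.getAndIncrement()
}

object EnvNamer {
  // private: hidden from all code except the companion class
  private val VIRTUALENV_ID = new AtomicInteger()
}
```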



[GitHub] spark issue #15541: [SPARK-17637][Scheduler]Packed scheduling for Spark task...

2016-10-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15541
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67863/
Test PASSed.



[GitHub] spark issue #15541: [SPARK-17637][Scheduler]Packed scheduling for Spark task...

2016-10-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15541
  
Merged build finished. Test PASSed.



[GitHub] spark pull request #15707: [SPARK-18024][SQL] Introduce an internal commit p...

2016-10-31 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/15707



[GitHub] spark issue #15633: [SPARK-18087] [SQL] Optimize insert to not require REPAI...

2016-10-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15633
  
**[Test build #3381 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3381/consoleFull)**
 for PR 15633 at commit 
[`4d96725`](https://github.com/apache/spark/commit/4d967251ce01794f7cdab9f84b70fa5393d1d1f2).



[GitHub] spark issue #15705: [SPARK-18183] [SPARK-18184] Fix INSERT [INTO|OVERWRITE] ...

2016-10-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15705
  
**[Test build #67849 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67849/consoleFull)**
 for PR 15705 at commit 
[`07c6787`](https://github.com/apache/spark/commit/07c67876c372369def5128ce919cbb74e4f0d30d).



[GitHub] spark pull request #15702: [SPARK-18124] Observed-delay based Even Time Wate...

2016-10-31 Thread rxin

Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/15702#discussion_r85859357
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/types/CalendarInterval.java 
---
@@ -252,6 +252,10 @@ public static long parseSecondNano(String secondNano) throws IllegalArgumentExce
   public final int months;
   public final long microseconds;
 
+  public final long milliseconds() {
+  return this.microseconds / MICROS_PER_MILLI;
--- End diff --

Use a 2-space indent here.
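
As an aside on the method under review: the conversion is plain integer division, so sub-millisecond remainders are dropped and negative intervals truncate toward zero. A minimal Scala sketch, assuming `MICROS_PER_MILLI` is 1000 (the constant name comes from the diff; the `IntervalMath` object and the constant's value here are assumptions for illustration):

```scala
object IntervalMath {
  // Assumed value of the MICROS_PER_MILLI constant referenced in the diff.
  val MicrosPerMilli: Long = 1000L

  // Long division truncates toward zero, for both positive and negative inputs.
  def toMillis(microseconds: Long): Long = microseconds / MicrosPerMilli
}
```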



[GitHub] spark issue #15626: SPARK-17829 [SQL] Stable format for offset log

2016-10-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15626
  
**[Test build #67831 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67831/consoleFull)**
 for PR 15626 at commit 
[`d6fec94`](https://github.com/apache/spark/commit/d6fec9464e5a8638f0b9ac5dd1df289c30da132f).
 * This patch **fails from timeout after a configured wait of \`250m\`**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #15626: SPARK-17829 [SQL] Stable format for offset log

2016-10-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15626
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67831/
Test FAILed.



[GitHub] spark issue #15667: [SPARK-18107][SQL] Insert overwrite statement runs much ...

2016-10-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15667
  
**[Test build #67853 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67853/consoleFull)**
 for PR 15667 at commit 
[`bd22150`](https://github.com/apache/spark/commit/bd22150823ff9ce6a0b80ae61fae6477ad135ef8).



[GitHub] spark issue #15696: [SPARK-18024][SQL] Introduce an internal commit protocol...

2016-10-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15696
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67841/
Test PASSed.



[GitHub] spark issue #15696: [SPARK-18024][SQL] Introduce an internal commit protocol...

2016-10-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15696
  
Merged build finished. Test PASSed.



[GitHub] spark issue #15696: [SPARK-18024][SQL] Introduce an internal commit protocol...

2016-10-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15696
  
**[Test build #67841 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67841/consoleFull)**
 for PR 15696 at commit 
[`2d7d373`](https://github.com/apache/spark/commit/2d7d373fe48d18037653c10424c8b1c978160958).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...

2016-10-31 Thread mariusvniekerk

Github user mariusvniekerk commented on a diff in the pull request:

https://github.com/apache/spark/pull/15666#discussion_r85865112
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1700,19 +1700,34 @@ class SparkContext(config: SparkConf) extends Logging {
    * Adds a JAR dependency for all tasks to be executed on this SparkContext in the future.
    * The `path` passed can be either a local file, a file in HDFS (or other Hadoop-supported
    * filesystems), an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node.
+   * If addToCurrentClassLoader is true, attempt to add the new class to the current threads' class
+   * loader. In general adding to the current threads' class loader will impact all other
+   * application threads unless they have explicitly changed their class loader.
    */
   def addJar(path: String) {
+    addJar(path, false)
+  }
+
+  def addJar(path: String, addToCurrentClassLoader: Boolean) {
     if (path == null) {
       logWarning("null specified as parameter to addJar")
     } else {
       var key = ""
-      if (path.contains("\\")) {
+
+      val uri = if (path.contains("\\")) {
         // For local paths with backslashes on Windows, URI throws an exception
-        key = env.rpcEnv.fileServer.addJar(new File(path))
--- End diff --

So this change gets the URI for the Windows path, which is used later on to
construct a File instance. That should allow the Windows special case to work.
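
A minimal Scala sketch of the special case being discussed: `new URI(...)` rejects a backslash-separated Windows path, while going through `java.io.File` yields a valid `file:` URI. `JarPaths`, `toUri`, and `parsesAsUri` are hypothetical names; the actual branch bodies in the PR are not shown in the diff, so this is an illustration of the idea rather than the PR's code.

```scala
import java.io.File
import java.net.URI
import scala.util.Try

object JarPaths {
  // True if the string can be parsed directly as a URI.
  def parsesAsUri(path: String): Boolean = Try(new URI(path)).isSuccess

  // Backslash paths go through File, which escapes illegal characters;
  // everything else is parsed as a URI directly.
  def toUri(path: String): URI =
    if (path.contains("\\")) new File(path).toURI else new URI(path)
}
```

With a single `URI` value in hand, the later code can handle the local-file and remote cases uniformly.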
 



[GitHub] spark issue #15667: [SPARK-18107][SQL] Insert overwrite statement runs much ...

2016-10-31 Thread viirya

Github user viirya commented on the issue:

https://github.com/apache/spark/pull/15667
  
@ericl Dynamic partitions would be more complicated. Should we do it in this 
PR or in a follow-up?



[GitHub] spark issue #15673: [SPARK-17992][SQL] Return all partitions from HiveShim w...

2016-10-31 Thread ericl

Github user ericl commented on the issue:

https://github.com/apache/spark/pull/15673
  
This looks good to me. cc @cloud-fan 



[GitHub] spark issue #15697: [SparkR][Test]:remove unnecessary suppressWarnings

2016-10-31 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/15697
  
It seems R 3.3.2 has been released, but R 3.3.1 is not yet listed among the old 
releases (see https://cloud.r-project.org/bin/windows/base/old). @shivaram and 
@felixcheung, should we use R 3.3.0 just for safety?



[GitHub] spark issue #15541: [SPARK-17637][Scheduler]Packed scheduling for Spark task...

2016-10-31 Thread rxin

Github user rxin commented on the issue:

https://github.com/apache/spark/pull/15541
  
Sure, I will take a look in the next couple of days to get this into 2.1 if 
possible.




[GitHub] spark issue #15692: [SPARK-18177][ML][PYSPARK] Add missing 'subsamplingRate'...

2016-10-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15692
  
**[Test build #67862 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67862/consoleFull)**
 for PR 15692 at commit 
[`0651bb6`](https://github.com/apache/spark/commit/0651bb6daec336ed221522b59a9149187474cc4b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #15692: [SPARK-18177][ML][PYSPARK] Add missing 'subsamplingRate'...

2016-10-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15692
  
Merged build finished. Test PASSed.



[GitHub] spark issue #15692: [SPARK-18177][ML][PYSPARK] Add missing 'subsamplingRate'...

2016-10-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15692
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67862/
Test PASSed.



[GitHub] spark issue #15705: [SPARK-18183] [SPARK-18184] Fix INSERT [INTO|OVERWRITE] ...

2016-10-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15705
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67864/
Test PASSed.



[GitHub] spark issue #15705: [SPARK-18183] [SPARK-18184] Fix INSERT [INTO|OVERWRITE] ...

2016-10-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15705
  
Merged build finished. Test PASSed.



[GitHub] spark issue #15705: [SPARK-18183] [SPARK-18184] Fix INSERT [INTO|OVERWRITE] ...

2016-10-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15705
  
**[Test build #67864 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67864/consoleFull)**
 for PR 15705 at commit 
[`0daff74`](https://github.com/apache/spark/commit/0daff7475e456754538e65b9f324773218f4f943).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #15707: [SPARK-18024][SQL] Introduce an internal commit protocol...

2016-10-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15707
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67865/
Test FAILed.



[GitHub] spark issue #15707: [SPARK-18024][SQL] Introduce an internal commit protocol...

2016-10-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15707
  
**[Test build #67865 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67865/consoleFull)**
 for PR 15707 at commit 
[`65ba5c1`](https://github.com/apache/spark/commit/65ba5c14ec976d79fe9ee118807663496d0b7845).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #15707: [SPARK-18024][SQL] Introduce an internal commit protocol...

2016-10-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15707
  
Merged build finished. Test FAILed.



[GitHub] spark issue #15705: [SPARK-18183] [SPARK-18184] Fix INSERT [INTO|OVERWRITE] ...

2016-10-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15705
  
**[Test build #67848 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67848/consoleFull)**
 for PR 15705 at commit 
[`fec7c9e`](https://github.com/apache/spark/commit/fec7c9e9df5fc7ceb1231fa71303fbf5a1a6b3d9).



[GitHub] spark issue #15706: [SPARK-18189] [Core] Fix serialization issue in KeyValue...

2016-10-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15706
  
Can one of the admins verify this patch?



[GitHub] spark pull request #15667: [SPARK-18107][SQL] Insert overwrite statement run...

2016-10-31 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/15667#discussion_r85861722
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala
 ---
@@ -257,7 +258,31 @@ case class InsertIntoHiveTable(
             table.catalogTable.identifier.table,
             partitionSpec)
 
+        var doOverwrite = overwrite
+
         if (oldPart.isEmpty || !ifNotExists) {
+          // SPARK-18107: Insert overwrite runs much slower than hive-client.
+          // Newer Hive largely improves insert overwrite performance. As Spark uses older Hive
+          // version and we may not want to catch up new Hive version every time. We delete the
+          // Hive partition first and then load data file into the Hive partition.
+          if (oldPart.nonEmpty && overwrite) {
+            oldPart.get.storage.locationUri.map { uri =>
+              val partitionPath = new Path(uri)
+              val fs = partitionPath.getFileSystem(hadoopConf)
+              if (fs.exists(partitionPath)) {
+                val pathPermission = fs.getFileStatus(partitionPath).getPermission()
+                if (!fs.delete(partitionPath, true)) {
+                  throw new RuntimeException(
+                    "Cannot remove partition directory '" + partitionPath.toString)
+                } else {
+                  fs.mkdirs(partitionPath, pathPermission)
--- End diff --

I was thinking Hive would complain if the directory did not exist, but it looks 
like it won't. Let me remove this and see if the tests pass.



[GitHub] spark pull request #15667: [SPARK-18107][SQL] Insert overwrite statement run...

2016-10-31 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/15667#discussion_r85861794
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala ---
@@ -257,7 +258,31 @@ case class InsertIntoHiveTable(
 table.catalogTable.identifier.table,
 partitionSpec)
 
+var doOverwrite = overwrite
--- End diff --

ok. updated.



[GitHub] spark issue #15694: [SPARK-18179][SQL] Throws analysis exception with a prop...

2016-10-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15694
  
**[Test build #67856 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67856/consoleFull)**
 for PR 15694 at commit 
[`5f09859`](https://github.com/apache/spark/commit/5f0985932ae823635042a1f38258c51a4ae89710).



[GitHub] spark issue #15697: [SparkR][Test]:remove unnecessary suppressWarnings

2016-10-31 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/15697
  
Hm, yes it seems unrelated. I will look into this deeper.



[GitHub] spark issue #15673: [SPARK-17992][SQL] Return all partitions from HiveShim w...

2016-10-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15673
  
**[Test build #67859 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67859/consoleFull)**
 for PR 15673 at commit 
[`1ed3301`](https://github.com/apache/spark/commit/1ed3301ec4dcbcccde4cacd21909de4f97902e20).



[GitHub] spark issue #15666: [SPARK-11421] [Core][Python][R] Added ability for addJar...

2016-10-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15666
  
**[Test build #67860 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67860/consoleFull)**
 for PR 15666 at commit 
[`26b39de`](https://github.com/apache/spark/commit/26b39de51f9a76b121ebcb70079072dfcc9972bd).



[GitHub] spark issue #15705: [SPARK-18183] [SPARK-18184] Fix INSERT [INTO|OVERWRITE] ...

2016-10-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15705
  
Merged build finished. Test FAILed.



[GitHub] spark issue #15705: [SPARK-18183] [SPARK-18184] Fix INSERT [INTO|OVERWRITE] ...

2016-10-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15705
  
**[Test build #67864 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67864/consoleFull)**
 for PR 15705 at commit 
[`0daff74`](https://github.com/apache/spark/commit/0daff7475e456754538e65b9f324773218f4f943).



[GitHub] spark issue #15705: [SPARK-18183] [SPARK-18184] Fix INSERT [INTO|OVERWRITE] ...

2016-10-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15705
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67849/
Test FAILed.



[GitHub] spark pull request #15671: [SPARK-14567][ML]Add instrumentation logs to ML t...

2016-10-31 Thread zhengruifeng

Github user zhengruifeng commented on a diff in the pull request:

https://github.com/apache/spark/pull/15671#discussion_r85869367
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/MultilayerPerceptronClassifier.scala ---
@@ -234,8 +234,14 @@ class MultilayerPerceptronClassifier @Since("1.5.0") (
    * @return Fitted model
    */
   override protected def train(dataset: Dataset[_]): MultilayerPerceptronClassificationModel = {
+    val instr = Instrumentation.create(this, dataset)
+    instr.logParams(params : _*)
--- End diff --

OK, I will update it here and in the other algorithms that support an initialModel.



[GitHub] spark issue #15708: [SPARK-18167] [SQL] Retry when the SQLQuerySuite test fl...

2016-10-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15708
  
**[Test build #67854 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67854/consoleFull)**
 for PR 15708 at commit 
[`641337b`](https://github.com/apache/spark/commit/641337bfda465afb385898aa5e09cbe72f41fc06).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #15696: [SPARK-18024][SQL] Introduce an internal commit protocol...

2016-10-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15696
  
Merged build finished. Test PASSed.



[GitHub] spark issue #15696: [SPARK-18024][SQL] Introduce an internal commit protocol...

2016-10-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15696
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67852/
Test PASSed.



[GitHub] spark issue #15686: [MINOR][DOC] Remove spaces following slashs

2016-10-31 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/15686
  
Never mind, @HyunjinKwon.
I was also curious about the AppVeyor failure. :)



[GitHub] spark issue #15704: [SPARK-17732][SQL] ALTER TABLE DROP PARTITION should sup...

2016-10-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15704
  
**[Test build #67857 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67857/consoleFull)**
 for PR 15704 at commit 
[`a3061e2`](https://github.com/apache/spark/commit/a3061e235cd0cf4c20e4480f89e3884b5372f991).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #15709: [SPARK-18190][Build][SparkR] Fix R version to not the la...

2016-10-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15709
  
**[Test build #67867 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67867/consoleFull)**
 for PR 15709 at commit 
[`90fe001`](https://github.com/apache/spark/commit/90fe001145da62391c5a2a9efbdebc201e621e95).



[GitHub] spark issue #15709: [SPARK-18190][Build][SparkR] Fix R version to not the la...

2016-10-31 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/15709
  
cc @felixcheung, @shivaram, @srowen and @wangmiao1981 (who I believe ran 
into this issue first).



[GitHub] spark issue #15172: [SPARK-13331] AES support for over-the-wire encryption

2016-10-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15172
  
**[Test build #67870 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67870/consoleFull)**
 for PR 15172 at commit 
[`daed43c`](https://github.com/apache/spark/commit/daed43c6ee71270adaf57c404adcf41552d01036).



[GitHub] spark issue #15673: [SPARK-17992][SQL] Return all partitions from HiveShim w...

2016-10-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15673
  
**[Test build #67859 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67859/consoleFull)**
 for PR 15673 at commit 
[`1ed3301`](https://github.com/apache/spark/commit/1ed3301ec4dcbcccde4cacd21909de4f97902e20).
 * This patch passes all tests.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.



[GitHub] spark issue #15707: [SPARK-18024][SQL] Introduce an internal commit protocol...

2016-10-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15707
  
**[Test build #3384 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3384/consoleFull)**
 for PR 15707 at commit 
[`0177ded`](https://github.com/apache/spark/commit/0177ded3357a195f48e8e23923b763937ff60cac).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class HadoopCommitProtocolWrapper(path: String, isAppend: Boolean)`



[GitHub] spark issue #15710: [SPARK-18025] Use commit protocol API in structured stre...

2016-10-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15710
  
**[Test build #3387 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3387/consoleFull)**
 for PR 15710 at commit 
[`e9823e7`](https://github.com/apache/spark/commit/e9823e7fc65ab908456b93f5df1e3d54fa8a14dd).



[GitHub] spark issue #15673: [SPARK-17992][SQL] Return all partitions from HiveShim w...

2016-10-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15673
  
**[Test build #3382 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3382/consoleFull)**
 for PR 15673 at commit 
[`4c438c8`](https://github.com/apache/spark/commit/4c438c8b2575880379e2a9a872fe07018cb62402).



[GitHub] spark pull request #15702: [SPARK-18124] Observed-delay based Event Time Wat...

2016-10-31 Thread rxin

Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/15702#discussion_r85859683
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -536,6 +535,37 @@ class Dataset[T] private[sql](
   }
 
   /**
+   * Defines an event time watermark for this [[Dataset]]. This watermark tracks a point in time
--- End diff --

need a tag here for experimental



[GitHub] spark pull request #14803: [SPARK-17153][SQL] Should read partition data whe...

2016-10-31 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/14803#discussion_r85860069
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala ---
@@ -608,6 +614,81 @@ class FileStreamSourceSuite extends FileStreamSourceTest {
 
   // === other tests 
 
+  test("read new files in partitioned table without globbing, should read partition data") {
+    withTempDirs { case (dir, tmp) =>
+      val partitionFooSubDir = new File(dir, "partition=foo")
+      val partitionBarSubDir = new File(dir, "partition=bar")
+
+      val schema = new StructType().add("value", StringType).add("partition", StringType)
+      val fileStream = createFileStream("json", s"${dir.getCanonicalPath}", Some(schema))
+      val filtered = fileStream.filter($"value" contains "keep")
+      testStream(filtered)(
+        // Create new partition=foo sub dir and write to it
+        AddTextFileData("{'value': 'drop1'}\n{'value': 'keep2'}", partitionFooSubDir, tmp),
+        CheckAnswer(("keep2", "foo")),
+
+        // Append to same partition=foo sub dir
+        AddTextFileData("{'value': 'keep3'}", partitionFooSubDir, tmp),
+        CheckAnswer(("keep2", "foo"), ("keep3", "foo")),
+
+        // Create new partition sub dir and write to it
+        AddTextFileData("{'value': 'keep4'}", partitionBarSubDir, tmp),
+        CheckAnswer(("keep2", "foo"), ("keep3", "foo"), ("keep4", "bar")),
+
+        // Append to same partition=bar sub dir
+        AddTextFileData("{'value': 'keep5'}", partitionBarSubDir, tmp),
+        CheckAnswer(("keep2", "foo"), ("keep3", "foo"), ("keep4", "bar"), ("keep5", "bar"))
+      )
+    }
+  }
+
+  test("when schema inference is turned on, should read partition data") {
+    def createFile(content: String, src: File, tmp: File): Unit = {
+      val tempFile = Utils.tempFileWith(new File(tmp, "text"))
+      val finalFile = new File(src, tempFile.getName)
+      src.mkdirs()
+      require(stringToFile(tempFile, content).renameTo(finalFile))
+    }
+
+    withSQLConf(SQLConf.STREAMING_SCHEMA_INFERENCE.key -> "true") {
+      withTempDirs { case (dir, tmp) =>
+        val partitionFooSubDir = new File(dir, "partition=foo")
+        val partitionBarSubDir = new File(dir, "partition=bar")
+
+        // Create file in partition, so we can infer the schema.
+        createFile("{'value': 'drop0'}", partitionFooSubDir, tmp)
+
+        val fileStream = createFileStream("json", s"${dir.getCanonicalPath}")
+        val filtered = fileStream.filter($"value" contains "keep")
+        testStream(filtered)(
+          // Append to same partition=foo sub dir
+          AddTextFileData("{'value': 'drop1'}\n{'value': 'keep2'}", partitionFooSubDir, tmp),
+          CheckAnswer(("keep2", "foo")),
+
+          // Append to same partition=foo sub dir
+          AddTextFileData("{'value': 'keep3'}", partitionFooSubDir, tmp),
+          CheckAnswer(("keep2", "foo"), ("keep3", "foo")),
+
+          // Create new partition sub dir and write to it
+          AddTextFileData("{'value': 'keep4'}", partitionBarSubDir, tmp),
+          CheckAnswer(("keep2", "foo"), ("keep3", "foo"), ("keep4", "bar")),
+
+          // Append to same partition=bar sub dir
+          AddTextFileData("{'value': 'keep5'}", partitionBarSubDir, tmp),
+          CheckAnswer(("keep2", "foo"), ("keep3", "foo"), ("keep4", "bar"), ("keep5", "bar")),
+
+          // Delete the two partition dirs
+          DeleteFile(partitionFooSubDir),
--- End diff --

@zsxwing As I recall, it is used to simulate a partition being deleted and 
data being re-inserted. Thanks for fixing this!
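
The expected answers in the test above pair each value with the name of the partition directory it was written to. A minimal sketch of how a partition column value could be derived from `key=value` directory segments is shown below. `PartitionPathSketch` and `partitionValues` are hypothetical names, and this is an illustration of the idea only, not Spark's actual partition-discovery code.

```scala
object PartitionPathSketch {
  // Extract key=value pairs from the directories between `base` and the file name.
  def partitionValues(base: String, file: String): Map[String, String] = {
    val relative = file.stripPrefix(base).stripPrefix("/")
    relative
      .split("/")
      .dropRight(1)            // drop the file name itself
      .map(_.split("=", 2))    // split each directory segment at the first '='
      .collect { case Array(k, v) => k -> v }
      .toMap
  }
}
```

Under these assumptions, a file at `/data/partition=foo/part-0.json` with base `/data` maps to `partition -> foo`, which mirrors the `("keep2", "foo")` rows checked in the test.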



[GitHub] spark issue #15702: [SPARK-18124] Observed-delay based Event Time Watermarks

2016-10-31 Thread marmbrus

Github user marmbrus commented on the issue:

https://github.com/apache/spark/pull/15702
  
@ekl - flaky test...  Should we turn it off for now?

retest this please



[GitHub] spark pull request #15673: [SPARK-17992][SQL] Return all partitions from Hiv...

2016-10-31 Thread mallman

Github user mallman commented on a diff in the pull request:

https://github.com/apache/spark/pull/15673#discussion_r85864458
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -585,7 +586,31 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
         getAllPartitionsMethod.invoke(hive, table).asInstanceOf[JSet[Partition]]
       } else {
         logDebug(s"Hive metastore filter is '$filter'.")
-        getPartitionsByFilterMethod.invoke(hive, table, filter).asInstanceOf[JArrayList[Partition]]
+        val tryDirectSqlConfVar = HiveConf.ConfVars.METASTORE_TRY_DIRECT_SQL
+        val tryDirectSql =
+          hive.getConf.getBoolean(tryDirectSqlConfVar.varname, tryDirectSqlConfVar.defaultBoolVal)
+        try {
+          // Hive may throw an exception when calling this method in some circumstances, such as
+          // when filtering on a non-string partition column when the hive config key
+          // hive.metastore.try.direct.sql is false
+          getPartitionsByFilterMethod.invoke(hive, table, filter)
+            .asInstanceOf[JArrayList[Partition]]
+        } catch {
+          case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] &&
+              !tryDirectSql =>
+            logWarning("Caught Hive MetaException attempting to get partition metadata by " +
+              "filter from Hive. Falling back to fetching all partition metadata, which will " +
+              "degrade performance. Consider modifying your Hive metastore configuration to " +
+              s"set ${tryDirectSqlConfVar.varname} to true.", ex)
+            // HiveShim clients are expected to handle a superset of the requested partitions
+            getAllPartitionsMethod.invoke(hive, table).asInstanceOf[JSet[Partition]]
+          case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] &&
+              tryDirectSql =>
+            throw new RuntimeException("Caught Hive MetaException attempting to get partition " +
+              "metadata by filter from Hive. Set the Spark configuration setting " +
--- End diff --

Good point.
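
The fallback behaviour in the diff above can be sketched in isolation as follows. This is a simplified illustration under stated assumptions: `PartitionFetchSketch`, `Partition`, `MetaException`, and `getPartitions` are stand-ins defined here (not the Hive or Spark classes), the fetch functions are passed in as plain Scala functions, and the reflection wrapper (`InvocationTargetException`) used in the patch is left out.

```scala
object PartitionFetchSketch {
  // Stand-in types for illustration only.
  final case class Partition(spec: Map[String, String])
  final class MetaException(msg: String) extends Exception(msg)

  def getPartitions(
      fetchByFilter: String => Seq[Partition],
      fetchAll: () => Seq[Partition],
      filter: String,
      tryDirectSql: Boolean): Seq[Partition] = {
    if (filter.isEmpty) {
      fetchAll()
    } else {
      try {
        fetchByFilter(filter)
      } catch {
        case ex: MetaException if !tryDirectSql =>
          // Filtered fetch failed and direct SQL is off: fall back to all
          // partitions. Callers must tolerate a superset of what they asked for.
          Console.err.println(s"Falling back to fetching all partitions: ${ex.getMessage}")
          fetchAll()
        case ex: MetaException =>
          // Direct SQL is on, so the failure is unexpected: surface it.
          throw new RuntimeException("Could not get partition metadata by filter", ex)
      }
    }
  }
}
```

The design choice illustrated here is that the fallback trades performance for availability only when the configuration makes the failure expected; otherwise the error is rethrown with its cause attached.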



[GitHub] spark issue #15541: [SPARK-17637][Scheduler]Packed scheduling for Spark task...

2016-10-31 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/15541
  
retest this please



[GitHub] spark issue #15692: [SPARK-18177][ML][PYSPARK] Add missing 'subsamplingRate'...

2016-10-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15692
  
**[Test build #67862 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67862/consoleFull)**
 for PR 15692 at commit 
[`0651bb6`](https://github.com/apache/spark/commit/0651bb6daec336ed221522b59a9149187474cc4b).



[GitHub] spark issue #15541: [SPARK-17637][Scheduler]Packed scheduling for Spark task...

2016-10-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15541
  
**[Test build #67863 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67863/consoleFull)**
 for PR 15541 at commit 
[`a820e96`](https://github.com/apache/spark/commit/a820e96284f1d9108ef62cd3ef55171ebd47e08f).



[GitHub] spark issue #15705: [SPARK-18183] [SPARK-18184] Fix INSERT [INTO|OVERWRITE] ...

2016-10-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15705
  
**[Test build #67849 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67849/consoleFull)**
 for PR 15705 at commit 
[`07c6787`](https://github.com/apache/spark/commit/07c67876c372369def5128ce919cbb74e4f0d30d).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #15667: [SPARK-18107][SQL] Insert overwrite statement runs much ...

2016-10-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15667
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67853/
Test PASSed.



[GitHub] spark issue #15667: [SPARK-18107][SQL] Insert overwrite statement runs much ...

2016-10-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15667
  
Merged build finished. Test PASSed.



[GitHub] spark issue #15633: [SPARK-18087] [SQL] Optimize insert to not require REPAI...

2016-10-31 Thread rxin

Github user rxin commented on the issue:

https://github.com/apache/spark/pull/15633
  
Merging in master.




[GitHub] spark issue #15667: [SPARK-18107][SQL] Insert overwrite statement runs much ...

2016-10-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15667
  
**[Test build #67853 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67853/consoleFull)** for PR 15667 at commit [`bd22150`](https://github.com/apache/spark/commit/bd22150823ff9ce6a0b80ae61fae6477ad135ef8).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #15414: [SPARK-17848][ML] Move LabelCol datatype cast into Predi...

2016-10-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15414
  
**[Test build #67861 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67861/consoleFull)** for PR 15414 at commit [`810c973`](https://github.com/apache/spark/commit/810c973d7394263a047318d7c0ab82cf6814ee7e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request #15705: [SPARK-18183] [SPARK-18184] Fix INSERT [INTO|OVER...

2016-10-31 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/15705#discussion_r85869751
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/CatalogFileIndex.scala ---
@@ -67,7 +67,10 @@ class CatalogFileIndex(
   val selectedPartitions = sparkSession.sessionState.catalog.listPartitionsByFilter(
     table.identifier, filters)
   val partitions = selectedPartitions.map { p =>
-    PartitionPath(p.toRow(partitionSchema), p.storage.locationUri.get)
+    val path = new Path(p.storage.locationUri.get)
+    val fs = path.getFileSystem(hadoopConf)
+    PartitionPath(
+      p.toRow(partitionSchema), path.makeQualified(fs.getUri, fs.getWorkingDirectory))
--- End diff --

Why this change? Doesn't `new Path` already qualify the path string?



[GitHub] spark issue #15414: [SPARK-17848][ML] Move LabelCol datatype cast into Predi...

2016-10-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15414
  
Merged build finished. Test PASSed.



[GitHub] spark issue #15686: [MINOR][DOC] Remove spaces following slashs

2016-10-31 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/15686
  
@dongjoon-hyun I am sorry for the unrelated comments here. None of them 
are related to this PR.

@shivaram Sure, let me try to create a JIRA. I will cc you. We might be 
able to talk more there.



[GitHub] spark issue #15414: [SPARK-17848][ML] Move LabelCol datatype cast into Predi...

2016-10-31 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15414
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67861/
Test PASSed.



[GitHub] spark issue #15696: [SPARK-18024][SQL] Introduce an internal commit protocol...

2016-10-31 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15696
  
**[Test build #67852 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67852/consoleFull)** for PR 15696 at commit [`cd23d2f`](https://github.com/apache/spark/commit/cd23d2f7bdf7a3ef9b93e77a3ae540d553398267).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request #15707: [SPARK-18024][SQL] Introduce an internal commit p...

2016-10-31 Thread ericl

Github user ericl commented on a diff in the pull request:

https://github.com/apache/spark/pull/15707#discussion_r85870354
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/WriteOutput.scala ---
@@ -133,7 +133,7 @@ object WriteOutput extends Logging {
   sparkAttemptNumber = taskContext.attemptNumber(),
   committer,
   iterator = iter)
-  }).flatten.distinct
+  })
--- End diff --

Move the distinct to updatedPartitions?



[GitHub] spark issue #15593: [SPARK-18060][ML] Avoid unnecessary computation for MLOR

2016-10-31 Thread dbtsai

Github user dbtsai commented on the issue:

https://github.com/apache/spark/pull/15593
  
@sethah I have been busy with company work recently. I will start on open 
source code review later this week.



[GitHub] spark pull request #15708: [SPARK-18167] [SQL] Retry when the SQLQuerySuite ...

2016-10-31 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/15708


