[GitHub] spark issue #21754: [SPARK-24705][SQL] ExchangeCoordinator broken when dupli...

2018-08-01 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21754
  
> For example, in the test of this pr, it sets 3 in ExchangeCoordinator;

How can this happen? A join has 2 children, so `ExchangeCoordinator` can have at most 2 exchanges.
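
For reference, a minimal sketch of the expected shape (hypothetical example; with adaptive execution enabled, both shuffle exchanges under a sort-merge join share one coordinator):

```
// A two-table equi-join: the join node has exactly two children, each a
// ShuffleExchangeExec, so a coordinator attached to this join registers
// at most two exchanges.
val left = spark.range(100).toDF("id")
val right = spark.range(100).toDF("id")
left.join(right, "id").explain()
// The plan shows two Exchange nodes, one per join side, under the join.
```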


---




[GitHub] spark pull request #21942: [SPARK-24283][ML] Make ml.StandardScaler skip con...

2018-08-01 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request:

https://github.com/apache/spark/pull/21942#discussion_r207103066
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StandardScaler.scala ---
@@ -160,15 +160,89 @@ class StandardScalerModel private[ml] (
   @Since("2.0.0")
   override def transform(dataset: Dataset[_]): DataFrame = {
     transformSchema(dataset.schema, logging = true)
-    val scaler = new feature.StandardScalerModel(std, mean, $(withStd), $(withMean))
-
-    // TODO: Make the transformer natively in ml framework to avoid extra conversion.
-    val transformer: Vector => Vector = v => scaler.transform(OldVectors.fromML(v)).asML
+    val transformer: Vector => Vector = v => transform(v)
 
     val scale = udf(transformer)
     dataset.withColumn($(outputCol), scale(col($(inputCol))))
   }
 
+  /**
+   * Since `shift` will be only used in `withMean` branch, we have it as
+   * `lazy val` so it will be evaluated in that branch. Note that we don't
+   * want to create this array multiple times in `transform` function.
+   */
+  private lazy val shift: Array[Double] = mean.toArray
+
+  /**
+   * Applies standardization transformation on a vector.
+   *
+   * @param vector Vector to be standardized.
+   * @return Standardized vector. If the std of a column is zero, it will return default `0.0`
+   *         for the column with zero std.
+   */
+  @Since("2.3.0")
+  def transform(vector: Vector): Vector = {
--- End diff --

private[spark]?


---




[GitHub] spark pull request #21927: [SPARK-24820][SPARK-24821][Core] Fail fast when s...

2018-08-01 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/21927#discussion_r207107889
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
@@ -340,6 +340,22 @@ class DAGScheduler(
     }
   }
 
+  /**
+   * Check to make sure we don't launch a barrier stage with unsupported RDD chain pattern. The
+   * following patterns are not supported:
+   * 1. Ancestor RDDs that have different number of partitions from the resulting RDD (eg.
+   * union()/coalesce()/first()/PartitionPruningRDD);
--- End diff --

`coalesce()` is not safe when shuffle is false, because it may cause the number of tasks to not match the number of partitions of the RDD that uses barrier mode.
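
A minimal sketch of the failure mode, assuming the barrier-mode RDD API introduced by this line of work (`RDD.barrier().mapPartitions`):

```
// 8 barrier partitions, but coalesce(2) with shuffle = false folds them into
// 2 tasks, so the stage can never collect the 8 expected barrier() sync
// requests and the global sync would hang until it times out.
val barriered = sc.parallelize(1 to 100, 8)
  .barrier()
  .mapPartitions(iter => iter)
val broken = barriered.coalesce(2)  // unsupported: 2 tasks vs. 8 partitions
```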


---




[GitHub] spark pull request #21898: [SPARK-24817][Core] Implement BarrierTaskContext....

2018-08-01 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21898#discussion_r207107551
  
--- Diff: core/src/main/scala/org/apache/spark/BarrierCoordinator.scala ---
@@ -0,0 +1,199 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark
+
+import java.util.{Timer, TimerTask}
+import java.util.concurrent.ConcurrentHashMap
+import java.util.concurrent.atomic.AtomicInteger
+
+import scala.collection.mutable.ArrayBuffer
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.rpc.{RpcCallContext, RpcEnv, ThreadSafeRpcEndpoint}
+import org.apache.spark.scheduler.{LiveListenerBus, SparkListener, SparkListenerStageCompleted}
+
+/**
+ * A coordinator that handles all global sync requests from BarrierTaskContext. Each global sync
+ * request is generated by `BarrierTaskContext.barrier()` and identified by
+ * stageId + stageAttemptId + barrierEpoch. It replies to all the blocking global sync requests
+ * once it has received all the requests for a group of `barrier()` calls. If the coordinator
+ * doesn't collect enough global sync requests within a configured time, it fails all the
+ * requests due to timeout.
+ */
+private[spark] class BarrierCoordinator(
+    timeout: Int,
+    listenerBus: LiveListenerBus,
+    override val rpcEnv: RpcEnv) extends ThreadSafeRpcEndpoint with Logging {
+
+  private val timer = new Timer("BarrierCoordinator barrier epoch increment timer")
+
+  private val listener = new SparkListener {
+    override def onStageCompleted(stageCompleted: SparkListenerStageCompleted): Unit = {
+      val stageInfo = stageCompleted.stageInfo
+      // Remove internal data from a finished stage attempt.
+      cleanupSyncRequests(stageInfo.stageId, stageInfo.attemptNumber)
+      barrierEpochByStageIdAndAttempt.remove((stageInfo.stageId, stageInfo.attemptNumber))
+    }
+  }
+
+  // Epoch counter for each barrier (stage, attempt).
+  private val barrierEpochByStageIdAndAttempt = new ConcurrentHashMap[(Int, Int), AtomicInteger]
+
+  // Remember all the blocking global sync requests for each barrier (stage, attempt).
+  private val syncRequestsByStageIdAndAttempt =
+    new ConcurrentHashMap[(Int, Int), ArrayBuffer[RpcCallContext]]
+
+  override def onStart(): Unit = {
+    super.onStart()
+    listenerBus.addToStatusQueue(listener)
+  }
+
+  /**
+   * Get the array of [[RpcCallContext]]s that correspond to a barrier sync request from a stage
+   * attempt.
+   */
+  private def getOrInitSyncRequests(
+      stageId: Int,
+      stageAttemptId: Int,
+      numTasks: Int = 0): ArrayBuffer[RpcCallContext] = {
+    val requests = syncRequestsByStageIdAndAttempt.putIfAbsent((stageId, stageAttemptId),
+      new ArrayBuffer[RpcCallContext](numTasks))
+    if (requests == null) {
+      syncRequestsByStageIdAndAttempt.get((stageId, stageAttemptId))
+    } else {
+      requests
+    }
+  }
+
+  /**
+   * Clean up the array of [[RpcCallContext]]s that correspond to a barrier sync request from a
+   * stage attempt.
+   */
+  private def cleanupSyncRequests(stageId: Int, stageAttemptId: Int): Unit = {
+    val requests = syncRequestsByStageIdAndAttempt.remove((stageId, stageAttemptId))
+    if (requests != null) {
+      requests.clear()
+    }
+    logInfo(s"Removed all the pending barrier sync requests from Stage $stageId (Attempt " +
+      s"$stageAttemptId).")
+  }
+
+  /**
+   * Get the barrier epoch that corresponds to a barrier sync request from a stage attempt.
+   */
+  private def getOrInitBarrierEpoch(stageId: Int, stageAttemptId: Int): AtomicInteger = {
+    val barrierEpoch = barrierEpochByStageIdAndAttempt.putIfAbsent((stageId, stageAttemptId),
+      new AtomicInteger(0))
+    if (barrierEpoch == null) {

[GitHub] spark issue #21943: [SPARK-24795][Core][FOLLOWUP] Kill all running tasks whe...

2018-08-01 Thread mengxr
Github user mengxr commented on the issue:

https://github.com/apache/spark/pull/21943
  
LGTM


---




[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21469
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21469
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93927/
Test PASSed.


---




[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

2018-08-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21469
  
**[Test build #93927 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93927/testReport)**
 for PR 21469 at commit 
[`ed072fc`](https://github.com/apache/spark/commit/ed072fcf057f982275d0daf69787ed812f03e87b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21960: [SPARK-23698] Remove unused definitions of long and unic...

2018-08-01 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21960
  
Please close this and proceed in #20838. I already approved your PR roughly 
a week ago.


---




[GitHub] spark pull request #21754: [SPARK-24705][SQL] ExchangeCoordinator broken whe...

2018-08-01 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21754#discussion_r207106829
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/ExchangeCoordinatorSuite.scala ---
@@ -278,6 +278,25 @@ class ExchangeCoordinatorSuite extends SparkFunSuite with BeforeAndAfterAll {
     try f(spark) finally spark.stop()
   }
 
+  def withSparkSession(pairs: (String, String)*)(f: SparkSession => Unit): Unit = {
--- End diff --

why do we need it? we can still call the old `withSparkSession` and set 
confs in the body.
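
For illustration, a sketch of that alternative (assuming the suite's existing `withSparkSession(f: SparkSession => Unit)` helper; the conf keys are examples only):

```
withSparkSession { spark =>
  // set the confs inside the test body instead of adding a new overload
  spark.conf.set("spark.sql.adaptive.enabled", "true")
  spark.conf.set("spark.sql.shuffle.partitions", "5")
  // ... run the rest of the test against `spark` ...
}
```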


---




[GitHub] spark issue #21961: Spark 20597

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21961
  
Can one of the admins verify this patch?


---




[GitHub] spark pull request #21754: [SPARK-24705][SQL] ExchangeCoordinator broken whe...

2018-08-01 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21754#discussion_r207106352
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/Exchange.scala ---
@@ -52,6 +52,14 @@ case class ReusedExchangeExec(override val output: Seq[Attribute], child: Exchange)
   // Ignore this wrapper for canonicalizing.
   override def doCanonicalize(): SparkPlan = child.canonicalized
 
+  override protected def doPrepare(): Unit = {
+    child match {
+      case shuffleExchange @ ShuffleExchangeExec(_, _, Some(coordinator)) =>
+        coordinator.registerExchange(shuffleExchange)
--- End diff --

updated


---




[GitHub] spark issue #21951: [SPARK-24957][SQL][FOLLOW-UP] Clean the code for AVERAGE

2018-08-01 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21951
  
LGTM. shall we create a JIRA ticket to apply this to other 
`DeclarativeAggregate`s?


---




[GitHub] spark pull request #21898: [SPARK-24817][Core] Implement BarrierTaskContext....

2018-08-01 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21898#discussion_r207106350
  
--- Diff: core/src/main/scala/org/apache/spark/BarrierCoordinator.scala ---
@@ -0,0 +1,199 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark
+
+import java.util.{Timer, TimerTask}
+import java.util.concurrent.ConcurrentHashMap
+import java.util.concurrent.atomic.AtomicInteger
+
+import scala.collection.mutable.ArrayBuffer
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.rpc.{RpcCallContext, RpcEnv, ThreadSafeRpcEndpoint}
+import org.apache.spark.scheduler.{LiveListenerBus, SparkListener, SparkListenerStageCompleted}
+
+/**
+ * A coordinator that handles all global sync requests from BarrierTaskContext. Each global sync
+ * request is generated by `BarrierTaskContext.barrier()` and identified by
+ * stageId + stageAttemptId + barrierEpoch. It replies to all the blocking global sync requests
+ * once it has received all the requests for a group of `barrier()` calls. If the coordinator
+ * doesn't collect enough global sync requests within a configured time, it fails all the
+ * requests due to timeout.
+ */
+private[spark] class BarrierCoordinator(
+    timeout: Int,
+    listenerBus: LiveListenerBus,
+    override val rpcEnv: RpcEnv) extends ThreadSafeRpcEndpoint with Logging {
+
+  private val timer = new Timer("BarrierCoordinator barrier epoch increment timer")
+
+  private val listener = new SparkListener {
+    override def onStageCompleted(stageCompleted: SparkListenerStageCompleted): Unit = {
+      val stageInfo = stageCompleted.stageInfo
+      // Remove internal data from a finished stage attempt.
+      cleanupSyncRequests(stageInfo.stageId, stageInfo.attemptNumber)
+      barrierEpochByStageIdAndAttempt.remove((stageInfo.stageId, stageInfo.attemptNumber))
+    }
+  }
+
+  // Epoch counter for each barrier (stage, attempt).
+  private val barrierEpochByStageIdAndAttempt = new ConcurrentHashMap[(Int, Int), AtomicInteger]
+
+  // Remember all the blocking global sync requests for each barrier (stage, attempt).
+  private val syncRequestsByStageIdAndAttempt =
+    new ConcurrentHashMap[(Int, Int), ArrayBuffer[RpcCallContext]]
+
+  override def onStart(): Unit = {
+    super.onStart()
+    listenerBus.addToStatusQueue(listener)
+  }
+
+  /**
+   * Get the array of [[RpcCallContext]]s that correspond to a barrier sync request from a stage
+   * attempt.
+   */
+  private def getOrInitSyncRequests(
+      stageId: Int,
+      stageAttemptId: Int,
+      numTasks: Int = 0): ArrayBuffer[RpcCallContext] = {
+    val requests = syncRequestsByStageIdAndAttempt.putIfAbsent((stageId, stageAttemptId),
+      new ArrayBuffer[RpcCallContext](numTasks))
+    if (requests == null) {
+      syncRequestsByStageIdAndAttempt.get((stageId, stageAttemptId))
+    } else {
+      requests
+    }
+  }
+
+  /**
+   * Clean up the array of [[RpcCallContext]]s that correspond to a barrier sync request from a
+   * stage attempt.
+   */
+  private def cleanupSyncRequests(stageId: Int, stageAttemptId: Int): Unit = {
+    val requests = syncRequestsByStageIdAndAttempt.remove((stageId, stageAttemptId))
+    if (requests != null) {
+      requests.clear()
+    }
+    logInfo(s"Removed all the pending barrier sync requests from Stage $stageId (Attempt " +
+      s"$stageAttemptId).")
+  }
+
+  /**
+   * Get the barrier epoch that corresponds to a barrier sync request from a stage attempt.
+   */
+  private def getOrInitBarrierEpoch(stageId: Int, stageAttemptId: Int): AtomicInteger = {
+    val barrierEpoch = barrierEpochByStageIdAndAttempt.putIfAbsent((stageId, stageAttemptId),
+      new AtomicInteger(0))
+    if (barrierEpoch == null) {

[GitHub] spark issue #21961: Spark 20597

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21961
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #21959: [SPARK-23698] Define xrange() for Python 3 in dumpdata_s...

2018-08-01 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21959
  
I am going to review and merge that one soon. There's no need to open multiple PRs.


---




[GitHub] spark issue #21961: Spark 20597

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21961
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #21951: [SPARK-24957][SQL][FOLLOW-UP] Clean the code for AVERAGE

2018-08-01 Thread rednaxelafx
Github user rednaxelafx commented on the issue:

https://github.com/apache/spark/pull/21951
  
LGTM as well. Thanks a lot!


---




[GitHub] spark pull request #21961: Spark 20597

2018-08-01 Thread Satyajitv
GitHub user Satyajitv opened a pull request:

https://github.com/apache/spark/pull/21961

Spark 20597

## What changes were proposed in this pull request?

The default topic is determined by first checking whether TOPIC_OPTION_KEY is provided:
if TOPIC_OPTION_KEY is provided, then
    defaultTopic = TOPIC_OPTION_KEY
else (TOPIC_OPTION_KEY is not provided), then
    defaultTopic = PATH_OPTION_KEY
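
A minimal sketch of that fallback (the option-key constants and the exact plumbing inside KafkaSourceProvider are assumptions here):

```
// Sketch only: fall back on the `path` option as a synonym for `topic`
// (SPARK-20597). `parameters` is the case-insensitive options map that the
// source provider receives; the two keys are assumed constants.
val TOPIC_OPTION_KEY = "topic"
val PATH_OPTION_KEY = "path"
def defaultTopic(parameters: Map[String, String]): Option[String] =
  parameters.get(TOPIC_OPTION_KEY).orElse(parameters.get(PATH_OPTION_KEY))
```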
## How was this patch tested?
I have tested the code locally, but will start writing tests once the approach is confirmed by @jaceklaskowski, as I am expecting change requests.

Please find more details at https://issues.apache.org/jira/browse/SPARK-20597

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Satyajitv/spark SPARK-20597

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21961.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21961


commit d4e1ed0c25121ad5bf24cfe137e2ee1bff430c94
Author: Satyajit Vegesna 
Date:   2018-08-02T05:12:57Z

SPARK-20597 KafkaSourceProvider falls back on path as synonym for topic

commit 381e66fa0bdd14b5754d8d81710021714e5fc031
Author: Satyajit Vegesna 
Date:   2018-08-02T05:22:07Z

Added parameters that were mistakenly taken out in previous commit

commit 6dc893a681721b51e61a9df099ae8f2c865c38c1
Author: Satyajit Vegesna 
Date:   2018-08-02T05:24:13Z

Added parameters that were mistakenly taken out in previous commit




---




[GitHub] spark issue #21961: Spark 20597

2018-08-01 Thread holdensmagicalunicorn
Github user holdensmagicalunicorn commented on the issue:

https://github.com/apache/spark/pull/21961
  
@Satyajitv, thanks! I am a bot who has found some folks who might be able to help with the review: @tdas, @zsxwing and @cloud-fan


---




[GitHub] spark pull request #21898: [SPARK-24817][Core] Implement BarrierTaskContext....

2018-08-01 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21898#discussion_r207106000
  
--- Diff: core/src/main/scala/org/apache/spark/BarrierCoordinator.scala ---
@@ -0,0 +1,199 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark
+
+import java.util.{Timer, TimerTask}
+import java.util.concurrent.ConcurrentHashMap
+import java.util.concurrent.atomic.AtomicInteger
+
+import scala.collection.mutable.ArrayBuffer
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.rpc.{RpcCallContext, RpcEnv, ThreadSafeRpcEndpoint}
+import org.apache.spark.scheduler.{LiveListenerBus, SparkListener, SparkListenerStageCompleted}
+
+/**
+ * A coordinator that handles all global sync requests from BarrierTaskContext. Each global sync
+ * request is generated by `BarrierTaskContext.barrier()` and identified by
+ * stageId + stageAttemptId + barrierEpoch. It replies to all the blocking global sync requests
+ * once it has received all the requests for a group of `barrier()` calls. If the coordinator
+ * doesn't collect enough global sync requests within a configured time, it fails all the
+ * requests due to timeout.
+ */
+private[spark] class BarrierCoordinator(
+    timeout: Int,
+    listenerBus: LiveListenerBus,
+    override val rpcEnv: RpcEnv) extends ThreadSafeRpcEndpoint with Logging {
+
+  private val timer = new Timer("BarrierCoordinator barrier epoch increment timer")
+
+  private val listener = new SparkListener {
+    override def onStageCompleted(stageCompleted: SparkListenerStageCompleted): Unit = {
+      val stageInfo = stageCompleted.stageInfo
+      // Remove internal data from a finished stage attempt.
+      cleanupSyncRequests(stageInfo.stageId, stageInfo.attemptNumber)
+      barrierEpochByStageIdAndAttempt.remove((stageInfo.stageId, stageInfo.attemptNumber))
+    }
+  }
+
+  // Epoch counter for each barrier (stage, attempt).
+  private val barrierEpochByStageIdAndAttempt = new ConcurrentHashMap[(Int, Int), AtomicInteger]
+
+  // Remember all the blocking global sync requests for each barrier (stage, attempt).
+  private val syncRequestsByStageIdAndAttempt =
+    new ConcurrentHashMap[(Int, Int), ArrayBuffer[RpcCallContext]]
+
+  override def onStart(): Unit = {
+    super.onStart()
+    listenerBus.addToStatusQueue(listener)
+  }
+
+  /**
+   * Get the array of [[RpcCallContext]]s that correspond to a barrier sync request from a stage
+   * attempt.
+   */
+  private def getOrInitSyncRequests(
+      stageId: Int,
+      stageAttemptId: Int,
+      numTasks: Int = 0): ArrayBuffer[RpcCallContext] = {
+    val requests = syncRequestsByStageIdAndAttempt.putIfAbsent((stageId, stageAttemptId),
+      new ArrayBuffer[RpcCallContext](numTasks))
+    if (requests == null) {
+      syncRequestsByStageIdAndAttempt.get((stageId, stageAttemptId))
+    } else {
+      requests
+    }
+  }
+
+  /**
+   * Clean up the array of [[RpcCallContext]]s that correspond to a barrier sync request from a
+   * stage attempt.
+   */
+  private def cleanupSyncRequests(stageId: Int, stageAttemptId: Int): Unit = {
+    val requests = syncRequestsByStageIdAndAttempt.remove((stageId, stageAttemptId))
+    if (requests != null) {
+      requests.clear()
+    }
+    logInfo(s"Removed all the pending barrier sync requests from Stage $stageId (Attempt " +
+      s"$stageAttemptId).")
+  }
+
+  /**
+   * Get the barrier epoch that corresponds to a barrier sync request from a stage attempt.
+   */
+  private def getOrInitBarrierEpoch(stageId: Int, stageAttemptId: Int): AtomicInteger = {
+    val barrierEpoch = barrierEpochByStageIdAndAttempt.putIfAbsent((stageId, stageAttemptId),
+      new AtomicInteger(0))
+    if (barrierEpoch == null) {
+      barrierEpochByStageIdAndAttempt.get((stageId,

[GitHub] spark issue #17185: [SPARK-19602][SQL] Support column resolution of fully qu...

2018-08-01 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/17185
  
overall LGTM, my major concern is how to do an O(1) lookup for the 3-part name
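
For illustration, one common way to get O(1) lookups, sketched with a simplified stand-in for `Attribute` (not the PR's implementation):

```
// Index each attribute under every name form it can be referenced by, so a
// 1-, 2- or 3-part name resolves with a single hash lookup instead of a scan.
case class Attr(name: String, qualifier: Seq[String])
val attrs = Seq(Attr("col", Seq("db1", "t1")))
val index: Map[Seq[String], Attr] = attrs.flatMap { a =>
  (0 to a.qualifier.length).map(i => (a.qualifier.takeRight(i) :+ a.name) -> a)
}.toMap
index(Seq("db1", "t1", "col"))  // O(1); likewise Seq("t1", "col") or Seq("col")
```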


---




[GitHub] spark pull request #17185: [SPARK-19602][SQL] Support column resolution of f...

2018-08-01 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17185#discussion_r207105608
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala ---
@@ -71,19 +71,27 @@ trait NamedExpression extends Expression {
    * multiple qualifiers, it is possible that there are other possible ways to refer to this
    * attribute.
    */
-  def qualifiedName: String = (qualifier.toSeq :+ name).mkString(".")
+  def qualifiedName: String = {
+    if (qualifier.isDefined) {
+      (qualifier.get :+ name).mkString(".")
+    } else {
+      name
+    }
+  }
 
   /**
    * Optional qualifier for the expression.
+   * Qualifier can also contain the fully qualified information, e.g., a sequence of strings
+   * containing the database and the table name.
    *
    * For now, since we do not allow using the original table name to qualify a column name once the
    * table is aliased, this can only be:
    *
    * 1. Empty Seq: when an attribute doesn't have a qualifier,
    *    e.g. top level attributes aliased in the SELECT clause, or column from a LocalRelation.
-   * 2. Single element: either the table name or the alias name of the table.
+   * 2. Seq with a single element: either the table name or the alias name of the table.
--- End diff --

3. a seq of 2 elements: database name and table name.
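
For clarity, the three qualifier shapes side by side (hypothetical names, for an attribute `col` in database `db1`, table `t1`):

```
val noQualifier = Seq.empty[String]   // e.g. a top-level alias in the SELECT clause
val tableOnly   = Seq("t1")           // table name or table alias
val dbAndTable  = Seq("db1", "t1")    // fully qualified
// qualifiedName then renders "col", "t1.col" and "db1.t1.col" respectively.
```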


---




[GitHub] spark pull request #21898: [SPARK-24817][Core] Implement BarrierTaskContext....

2018-08-01 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21898#discussion_r207105381
  
--- Diff: core/src/main/scala/org/apache/spark/BarrierCoordinator.scala ---
@@ -0,0 +1,199 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark
+
+import java.util.{Timer, TimerTask}
+import java.util.concurrent.ConcurrentHashMap
+import java.util.concurrent.atomic.AtomicInteger
+
+import scala.collection.mutable.ArrayBuffer
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.rpc.{RpcCallContext, RpcEnv, ThreadSafeRpcEndpoint}
+import org.apache.spark.scheduler.{LiveListenerBus, SparkListener, SparkListenerStageCompleted}
+
+/**
+ * A coordinator that handles all global sync requests from BarrierTaskContext. Each global sync
+ * request is generated by `BarrierTaskContext.barrier()` and identified by
+ * stageId + stageAttemptId + barrierEpoch. It replies to all the blocking global sync requests
+ * once it has received all the requests for a group of `barrier()` calls. If the coordinator
+ * doesn't collect enough global sync requests within a configured time, it fails all the
+ * requests due to timeout.
+ */
+private[spark] class BarrierCoordinator(
+    timeout: Int,
+    listenerBus: LiveListenerBus,
+    override val rpcEnv: RpcEnv) extends ThreadSafeRpcEndpoint with Logging {
+
+  private val timer = new Timer("BarrierCoordinator barrier epoch increment timer")
+
+  private val listener = new SparkListener {
+    override def onStageCompleted(stageCompleted: SparkListenerStageCompleted): Unit = {
+      val stageInfo = stageCompleted.stageInfo
+      // Remove internal data from a finished stage attempt.
+      cleanupSyncRequests(stageInfo.stageId, stageInfo.attemptNumber)
+      barrierEpochByStageIdAndAttempt.remove((stageInfo.stageId, stageInfo.attemptNumber))
+    }
+  }
+
+  // Epoch counter for each barrier (stage, attempt).
+  private val barrierEpochByStageIdAndAttempt = new ConcurrentHashMap[(Int, Int), AtomicInteger]
+
+  // Remember all the blocking global sync requests for each barrier (stage, attempt).
+  private val syncRequestsByStageIdAndAttempt =
+    new ConcurrentHashMap[(Int, Int), ArrayBuffer[RpcCallContext]]
+
+  override def onStart(): Unit = {
+    super.onStart()
+    listenerBus.addToStatusQueue(listener)
+  }
+
+  /**
+   * Get the array of [[RpcCallContext]]s that correspond to a barrier sync request from a stage
+   * attempt.
+   */
+  private def getOrInitSyncRequests(
+      stageId: Int,
+      stageAttemptId: Int,
+      numTasks: Int = 0): ArrayBuffer[RpcCallContext] = {
+    val requests = syncRequestsByStageIdAndAttempt.putIfAbsent((stageId, stageAttemptId),
+      new ArrayBuffer[RpcCallContext](numTasks))
+    if (requests == null) {
+      syncRequestsByStageIdAndAttempt.get((stageId, stageAttemptId))
+    } else {
+      requests
+    }
+  }
+
+  /**
+   * Clean up the array of [[RpcCallContext]]s that correspond to a barrier sync request from a
+   * stage attempt.
+   */
+  private def cleanupSyncRequests(stageId: Int, stageAttemptId: Int): Unit = {
+    val requests = syncRequestsByStageIdAndAttempt.remove((stageId, stageAttemptId))
+    if (requests != null) {
+      requests.clear()
+    }
+    logInfo(s"Removed all the pending barrier sync requests from Stage $stageId (Attempt " +
+      s"$stageAttemptId).")
+  }
+
+  /**
+   * Get the barrier epoch that corresponds to a barrier sync request from a stage attempt.
+   */
+  private def getOrInitBarrierEpoch(stageId: Int, stageAttemptId: Int): AtomicInteger = {
+    val barrierEpoch = barrierEpochByStageIdAndAttempt.putIfAbsent((stageId, stageAttemptId),
+      new AtomicInteger(0))
+    if (barrierEpoch == null) {

[GitHub] spark issue #21941: [SPARK-24966][SQL] Implement precedence rules for set op...

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21941
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21960: [SPARK-23698] Remove unused definitions of long and unic...

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21960
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #21754: [SPARK-24705][SQL] Cannot reuse an exchange operator wit...

2018-08-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21754
  
**[Test build #93955 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93955/testReport)**
 for PR 21754 at commit 
[`8ae2865`](https://github.com/apache/spark/commit/8ae286560b6fe167fe8deb8f4a30c70cacd2c4a6).


---




[GitHub] spark issue #21941: [SPARK-24966][SQL] Implement precedence rules for set op...

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21941
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93936/
Test FAILed.


---




[GitHub] spark issue #21754: [SPARK-24705][SQL] Cannot reuse an exchange operator wit...

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21754
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21754: [SPARK-24705][SQL] Cannot reuse an exchange operator wit...

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21754
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1618/
Test PASSed.


---




[GitHub] spark issue #21941: [SPARK-24966][SQL] Implement precedence rules for set op...

2018-08-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21941
  
**[Test build #93936 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93936/testReport)**
 for PR 21941 at commit 
[`e7d69db`](https://github.com/apache/spark/commit/e7d69db7cd0c23d6ee9012b5f48b17e5aeac8d66).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21960: [SPARK-23698] Remove unused definitions of long and unic...

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21960
  
Can one of the admins verify this patch?


---







[GitHub] spark pull request #21754: [SPARK-24705][SQL] Cannot reuse an exchange opera...

2018-08-01 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21754#discussion_r207105078
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/Exchange.scala ---
@@ -52,6 +52,14 @@ case class ReusedExchangeExec(override val output: Seq[Attribute], child: Exchange)
   // Ignore this wrapper for canonicalizing.
   override def doCanonicalize(): SparkPlan = child.canonicalized
 
+  override protected def doPrepare(): Unit = {
+    child match {
+      case shuffleExchange @ ShuffleExchangeExec(_, _, Some(coordinator)) =>
+        coordinator.registerExchange(shuffleExchange)
--- End diff --

Sorry to confuse you, but I'm working on the issue only in this PR. The title is probably unclear, so I'll update it soon.


---




[GitHub] spark pull request #21898: [SPARK-24817][Core] Implement BarrierTaskContext....

2018-08-01 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21898#discussion_r207105004
  
--- Diff: core/src/main/scala/org/apache/spark/BarrierCoordinator.scala ---
@@ -0,0 +1,199 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark
+
+import java.util.{Timer, TimerTask}
+import java.util.concurrent.ConcurrentHashMap
+import java.util.concurrent.atomic.AtomicInteger
+
+import scala.collection.mutable.ArrayBuffer
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.rpc.{RpcCallContext, RpcEnv, ThreadSafeRpcEndpoint}
+import org.apache.spark.scheduler.{LiveListenerBus, SparkListener, SparkListenerStageCompleted}
+
+/**
+ * A coordinator that handles all global sync requests from BarrierTaskContext. Each global sync
+ * request is generated by `BarrierTaskContext.barrier()` and identified by
+ * stageId + stageAttemptId + barrierEpoch. It replies to all the blocking global sync requests
+ * once it has received all the requests for a group of `barrier()` calls. If the coordinator
+ * doesn't collect enough global sync requests within a configured time, it fails all the
+ * requests due to timeout.
+ */
+private[spark] class BarrierCoordinator(
+    timeout: Int,
+    listenerBus: LiveListenerBus,
+    override val rpcEnv: RpcEnv) extends ThreadSafeRpcEndpoint with Logging {
+
+  private val timer = new Timer("BarrierCoordinator barrier epoch increment timer")
+
+  private val listener = new SparkListener {
+    override def onStageCompleted(stageCompleted: SparkListenerStageCompleted): Unit = {
+      val stageInfo = stageCompleted.stageInfo
+      // Remove internal data from a finished stage attempt.
+      cleanupSyncRequests(stageInfo.stageId, stageInfo.attemptNumber)
+      barrierEpochByStageIdAndAttempt.remove((stageInfo.stageId, stageInfo.attemptNumber))
+    }
+  }
+
+  // Epoch counter for each barrier (stage, attempt).
+  private val barrierEpochByStageIdAndAttempt = new ConcurrentHashMap[(Int, Int), AtomicInteger]
+
+  // Remember all the blocking global sync requests for each barrier (stage, attempt).
+  private val syncRequestsByStageIdAndAttempt =
+    new ConcurrentHashMap[(Int, Int), ArrayBuffer[RpcCallContext]]
+
+  override def onStart(): Unit = {
+    super.onStart()
+    listenerBus.addToStatusQueue(listener)
+  }
+
+  /**
+   * Get the array of [[RpcCallContext]]s that correspond to a barrier sync request from a stage
+   * attempt.
+   */
+  private def getOrInitSyncRequests(
+      stageId: Int,
+      stageAttemptId: Int,
+      numTasks: Int = 0): ArrayBuffer[RpcCallContext] = {
+    val requests = syncRequestsByStageIdAndAttempt.putIfAbsent((stageId, stageAttemptId),
+      new ArrayBuffer[RpcCallContext](numTasks))
+    if (requests == null) {
+      syncRequestsByStageIdAndAttempt.get((stageId, stageAttemptId))
+    } else {
+      requests
+    }
+  }
+
+  /**
+   * Clean up the array of [[RpcCallContext]]s that correspond to a barrier sync request from a
+   * stage attempt.
+   */
+  private def cleanupSyncRequests(stageId: Int, stageAttemptId: Int): Unit = {
+    val requests = syncRequestsByStageIdAndAttempt.remove((stageId, stageAttemptId))
+    if (requests != null) {
+      requests.clear()
--- End diff --

This is just to be safe: in case the requests are held in other places, we can still GC the `RpcCallContext`s.
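
A small generic sketch of that rationale (plain Scala, not the PR's code):

```
import scala.collection.mutable.ArrayBuffer

// Even if another component still holds a reference to the buffer after it
// is removed from the map, clear() drops the buffer's references to each
// element, so the elements themselves stay eligible for GC.
val requests = ArrayBuffer[AnyRef](new Object, new Object)
val alias = requests  // hypothetical second holder of the same buffer
requests.clear()      // the two elements are now unreachable via either reference
```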


---




[GitHub] spark pull request #21960: [SPARK-23698] Remove unused definitions of long a...

2018-08-01 Thread cclauss
GitHub user cclauss opened a pull request:

https://github.com/apache/spark/pull/21960

[SPARK-23698] Remove unused definitions of long and unicode

__intlike__ and __unicode__ were defined but not used in the existing code.  __basestring()__ was removed in Python 3 in favor of __str()__ because all str are Unicode.  This simple change resolves an Undefined Name (__long__) and was originally suggested in #20838, which is currently mired in 50+ comments on unrelated modifications.  @HyukjinKwon @holdenk

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cclauss/spark patch-2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21960.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21960


commit 20d072e6e19114e485711b882ff5f48e48a884a1
Author: cclauss 
Date:   2018-08-02T05:24:03Z

[SPARK-23698] Remove unused definitions of long and unicode

__intlike__ and __unicode__ were defined but not used in the existing code.  __basestring()__ was removed in Python 3 in favor of __str()__ because all str are Unicode.  This simple change resolves an Undefined Name (__long__) and was originally suggested in #20838, which is currently mired in 50+ comments on unrelated modifications.  @HyukjinKwon @holdenk




---




[GitHub] spark issue #21960: [SPARK-23698] Remove unused definitions of long and unic...

2018-08-01 Thread holdensmagicalunicorn
Github user holdensmagicalunicorn commented on the issue:

https://github.com/apache/spark/pull/21960
  
@cclauss, thanks! I am a bot who has found some folks who might be able to help with the review: @tdas, @zsxwing and @marmbrus


---




[GitHub] spark issue #21959: [SPARK-23698] Define xrange() for Python 3 in dumpdata_s...

2018-08-01 Thread cclauss
Github user cclauss commented on the issue:

https://github.com/apache/spark/pull/21959
  
As it says in the commit message, these changes are already in #20838 but 
that PR has been open for 139 days and has 50+ comments.  The only way that I 
seem to make progress is by opening separate PRs that are easier to review and 
approve.


---




[GitHub] spark issue #21935: [SPARK-24773] Avro: support logical timestamp type with ...

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21935
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93920/
Test PASSed.


---




[GitHub] spark issue #21935: [SPARK-24773] Avro: support logical timestamp type with ...

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21935
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21754: [SPARK-24705][SQL] Cannot reuse an exchange operator wit...

2018-08-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21754
  
**[Test build #93954 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93954/testReport)**
 for PR 21754 at commit 
[`ad63270`](https://github.com/apache/spark/commit/ad63270cbd55bb22be6df4c15c22fc3a493464f3).


---




[GitHub] spark issue #21754: [SPARK-24705][SQL] Cannot reuse an exchange operator wit...

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21754
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21754: [SPARK-24705][SQL] Cannot reuse an exchange operator wit...

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21754
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1617/
Test PASSed.


---




[GitHub] spark issue #21935: [SPARK-24773] Avro: support logical timestamp type with ...

2018-08-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21935
  
**[Test build #93920 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93920/testReport)**
 for PR 21935 at commit 
[`09ad6e9`](https://github.com/apache/spark/commit/09ad6e9f022740182312b29e20d5ff52778f63ed).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #17185: [SPARK-19602][SQL] Support column resolution of f...

2018-08-01 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17185#discussion_r207104518
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala ---
@@ -262,17 +262,47 @@ abstract class Star extends LeafExpression with NamedExpression {
  */
 case class UnresolvedStar(target: Option[Seq[String]]) extends Star with Unevaluable {
 
-  override def expand(input: LogicalPlan, resolver: Resolver): Seq[NamedExpression] = {
+  /**
+   * Returns true if the nameParts match the qualifier of the attribute.
+   *
+   * There are two checks: i) Check if the nameParts match the qualifier fully.
+   * E.g. SELECT db1.t1.* FROM db1.t1: nameParts is Seq("db1", "t1") and the
+   * qualifier of the attribute is Seq("db1", "t1").
+   * ii) If (i) is not true, then check if nameParts is only a single element and it
+   * matches the table portion of the qualifier.
+   * E.g. SELECT t1.* FROM db1.t1: nameParts is Seq("t1") and the
+   * qualifier is Seq("db1", "t1").
+   * SELECT a.* FROM db1.t1 AS a: nameParts is Seq("a") and the qualifier for the
+   * attribute is Seq("a").
+   */
+  private def matchedQualifier(
+      attribute: Attribute,
+      nameParts: Seq[String],
+      resolver: Resolver): Boolean = {
+    val qualifierList = attribute.qualifier.getOrElse(Seq.empty)
+
+    // match the qualifiers and nameParts
+    val matched = nameParts.corresponds(qualifierList)(resolver) match {
+      case true => true
+      case false if (nameParts.length == 1 && qualifierList.nonEmpty) =>
+        // check if it matches the table portion of the qualifier
--- End diff --

IIRC users can do `select structCol.* from tbl`; does that mean `nameParts.head` may not be a table name?
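
For context, a sketch of the case being asked about (hypothetical schema):

```
// `structCol` is a struct column, not a table, yet it can appear before `.*`
// exactly like a table qualifier, so nameParts.head need not be a table name.
spark.sql("CREATE TABLE tbl (structCol STRUCT<a: INT, b: STRING>) USING parquet")
spark.sql("SELECT structCol.* FROM tbl")  // expands to columns a and b
// Here nameParts is Seq("structCol"), the name of a column.
```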


---




[GitHub] spark pull request #17185: [SPARK-19602][SQL] Support column resolution of f...

2018-08-01 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17185#discussion_r207104432
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala ---
@@ -262,17 +262,47 @@ abstract class Star extends LeafExpression with NamedExpression {
  */
 case class UnresolvedStar(target: Option[Seq[String]]) extends Star with Unevaluable {
 
-  override def expand(input: LogicalPlan, resolver: Resolver): Seq[NamedExpression] = {
+  /**
+   * Returns true if the nameParts match the qualifier of the attribute.
+   *
+   * There are two checks: i) Check if the nameParts match the qualifier fully.
+   * E.g. SELECT db1.t1.* FROM db1.t1: nameParts is Seq("db1", "t1") and the
+   * qualifier of the attribute is Seq("db1", "t1").
+   * ii) If (i) is not true, then check if nameParts is only a single element and it
+   * matches the table portion of the qualifier.
+   * E.g. SELECT t1.* FROM db1.t1: nameParts is Seq("t1") and the
+   * qualifier is Seq("db1", "t1").
+   * SELECT a.* FROM db1.t1 AS a: nameParts is Seq("a") and the qualifier for the
+   * attribute is Seq("a").
+   */
+  private def matchedQualifier(
+      attribute: Attribute,
+      nameParts: Seq[String],
+      resolver: Resolver): Boolean = {
+    val qualifierList = attribute.qualifier.getOrElse(Seq.empty)
+
+    // match the qualifiers and nameParts
+    val matched = nameParts.corresponds(qualifierList)(resolver) match {
--- End diff --

it's weird to use a pattern match like this, maybe better to write
```
nameParts.corresponds(qualifierList)(resolver) || {
  if (nameParts.length == 1 && qualifierList.nonEmpty) {
    resolver(nameParts.head, qualifierList.last)
  } else {
    false
  }
}
```


---




[GitHub] spark pull request #21754: [SPARK-24705][SQL] Cannot reuse an exchange opera...

2018-08-01 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21754#discussion_r207104309
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/Exchange.scala ---
@@ -89,23 +97,42 @@ case class ReuseExchange(conf: SQLConf) extends Rule[SparkPlan] {
     if (!conf.exchangeReuseEnabled) {
       return plan
     }
+
     // Build a hash map using schema of exchanges to avoid O(N*N) sameResult calls.
     val exchanges = mutable.HashMap[StructType, ArrayBuffer[Exchange]]()
+
+    def tryReuseExchange(exchange: Exchange, filterCondition: Exchange => Boolean): SparkPlan = {
+      // the exchanges that have same results usually also have same schemas (same column names).
+      val sameSchema = exchanges.getOrElseUpdate(exchange.schema, ArrayBuffer[Exchange]())
+      val samePlan = sameSchema.filter(filterCondition).find { e =>
+        exchange.sameResult(e)
+      }
+      if (samePlan.isDefined) {
+        // Keep the output of this exchange, the following plans require that to resolve
+        // attributes.
+        ReusedExchangeExec(exchange.output, samePlan.get)
+      } else {
+        sameSchema += exchange
+        exchange
+      }
+    }
+
     plan.transformUp {
+      // For coordinated exchange
+      case exchange @ ShuffleExchangeExec(_, _, Some(coordinator)) =>
+        tryReuseExchange(exchange, {
+          // We can reuse an exchange with the same coordinator only
+          case ShuffleExchangeExec(_, _, Some(c)) => coordinator == c

I checked again and I found we didn't need this change (`sameResult` has 
already handled this case correctly), so I'll drop this. Sorry to bother you.


---




[GitHub] spark issue #21951: [SPARK-24957][SQL][FOLLOW-UP] Clean the code for AVERAGE

2018-08-01 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21951
  
LGTM.

On Thu, Aug 2, 2018 at 1:14 AM Xiao Li  wrote:

> This will simplify the code and improve the readability. We can do the
> same in the other expression.



---




[GitHub] spark issue #21943: [SPARK-24795][Core][FOLLOWUP] Kill all running tasks whe...

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21943
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21943: [SPARK-24795][Core][FOLLOWUP] Kill all running tasks whe...

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21943
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1616/
Test PASSed.


---




[GitHub] spark pull request #17185: [SPARK-19602][SQL] Support column resolution of f...

2018-08-01 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17185#discussion_r207104014
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala ---
@@ -262,17 +262,47 @@ abstract class Star extends LeafExpression with NamedExpression {
  */
 case class UnresolvedStar(target: Option[Seq[String]]) extends Star with Unevaluable {
 
-  override def expand(input: LogicalPlan, resolver: Resolver): Seq[NamedExpression] = {
+  /**
+   * Returns true if the nameParts match the qualifier of the attribute.
+   *
+   * There are two checks: i) Check if the nameParts match the qualifier fully.
+   * E.g. SELECT db1.t1.* FROM db1.t1: nameParts is Seq("db1", "t1") and the
+   * qualifier of the attribute is Seq("db1", "t1").
+   * ii) If (i) is not true, then check if nameParts is only a single element and it
+   * matches the table portion of the qualifier.
+   * E.g. SELECT t1.* FROM db1.t1: nameParts is Seq("t1") and the
+   * qualifier is Seq("db1", "t1").
+   * SELECT a.* FROM db1.t1 AS a: nameParts is Seq("a") and the qualifier for the
+   * attribute is Seq("a").
+   */
+  private def matchedQualifier(
+      attribute: Attribute,
--- End diff --

do we assume the attribute always carries the full qualifier? i.e., will the analyzer always include the current database in the qualifier?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21959: [SPARK-23698] Define xrange() for Python 3 in dumpdata_s...

2018-08-01 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21959
  
I think you can push the changes in 
https://github.com/apache/spark/pull/20838 and close this one. They address 
virtually the same issue.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21955: [SPARK-18057][FOLLOW-UP][SS] Update Kafka client version...

2018-08-01 Thread ijuma
Github user ijuma commented on the issue:

https://github.com/apache/spark/pull/21955
  
Looks like some test code is using internal Kafka classes that have changed 
or have been removed:

> [error] 
/home/jenkins/workspace/SparkPullRequestBuilder/external/kafka-0-10/src/test/scala/org/apache/spark/streaming/kafka010/mocks/MockTime.scala:22:
 object Time is not a member of package kafka.utils
> [error] import kafka.utils.Time
> [error]^
> [error] 
/home/jenkins/workspace/SparkPullRequestBuilder/external/kafka-0-10/src/test/scala/org/apache/spark/streaming/kafka010/mocks/MockTime.scala:33:
 not found: type Time
> [error] private[kafka010] class MockTime(@volatile private var currentMs: 
Long) extends Time {
> [error]   
  ^
> [error] 
/home/jenkins/workspace/SparkPullRequestBuilder/external/kafka-0-10/src/test/scala/org/apache/spark/streaming/kafka010/KafkaRDDSuite.scala:83:
 not enough arguments for constructor Log: (dir: java.io.File, config: 
kafka.log.LogConfig, logStartOffset: Long, recoveryPoint: Long, scheduler: 
kafka.utils.Scheduler, brokerTopicStats: kafka.server.BrokerTopicStats, time: 
org.apache.kafka.common.utils.Time, maxProducerIdExpirationMs: Int, 
producerIdExpirationCheckIntervalMs: Int, topicPartition: 
org.apache.kafka.common.TopicPartition, producerStateManager: 
kafka.log.ProducerStateManager, logDirFailureChannel: 
kafka.server.LogDirFailureChannel)kafka.log.Log.
> [error] Unspecified value parameters brokerTopicStats, time, 
maxProducerIdExpirationMs...
> [error] val log = new Log(
> [error]   ^
> [error] 
/home/jenkins/workspace/SparkPullRequestBuilder/external/kafka-0-10/src/test/scala/org/apache/spark/streaming/kafka010/KafkaRDDSuite.scala:91:
 not found: type ByteBufferMessageSet
> [error]   val msg = new ByteBufferMessageSet(
> [error] ^
> [error] 
/home/jenkins/workspace/SparkPullRequestBuilder/external/kafka-0-10/src/test/scala/org/apache/spark/streaming/kafka010/KafkaRDDSuite.scala:93:
 not found: type Message
> [error] new Message(v.getBytes, k.getBytes, Message.NoTimestamp, 
Message.CurrentMagicValue))
> [error] ^
> [error] 
/home/jenkins/workspace/SparkPullRequestBuilder/external/kafka-0-10/src/test/scala/org/apache/spark/streaming/kafka010/KafkaRDDSuite.scala:93:
 not found: value Message
> [error] new Message(v.getBytes, k.getBytes, Message.NoTimestamp, 
Message.CurrentMagicValue))
> [error] ^
> [error] 
/home/jenkins/workspace/SparkPullRequestBuilder/external/kafka-0-10/src/test/scala/org/apache/spark/streaming/kafka010/KafkaRDDSuite.scala:93:
 not found: value Message
> [error] new Message(v.getBytes, k.getBytes, Message.NoTimestamp, 
Message.CurrentMagicValue))
> [error]  ^
> [error] 
/home/jenkins/workspace/SparkPullRequestBuilder/external/kafka-0-10/src/test/scala/org/apache/spark/streaming/kafka010/KafkaRDDSuite.scala:99:
 not enough arguments for constructor LogCleaner: (initialConfig: 
kafka.log.CleanerConfig, logDirs: Seq[java.io.File], logs: 
kafka.utils.Pool[org.apache.kafka.common.TopicPartition,kafka.log.Log], 
logDirFailureChannel: kafka.server.LogDirFailureChannel, time: 
org.apache.kafka.common.utils.Time)kafka.log.LogCleaner.
> [error] Unspecified value parameter logDirFailureChannel.
> [error] val cleaner = new LogCleaner(CleanerConfig(), logDirs = 
Array(dir), logs = logs)
> [error]   ^
> [error] 
/home/jenkins/workspace/SparkPullRequestBuilder/external/kafka-0-10/src/test/scala/org/apache/spark/streaming/kafka010/mocks/MockScheduler.scala:24:
 object Time is not a member of package kafka.utils
> [error] import kafka.utils.{Scheduler, Time}
> [error]^
> [error] 
/home/jenkins/workspace/SparkPullRequestBuilder/external/kafka-0-10/src/test/scala/org/apache/spark/streaming/kafka010/mocks/MockScheduler.scala:41:
 not found: type Time
> [error] private[kafka010] class MockScheduler(val time: Time) extends 
Scheduler {
> [error] ^


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20146: [SPARK-11215][ML] Add multiple columns support to String...

2018-08-01 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/20146
  
ping @jkbradley @dbtsai shall we consider including this in 2.4?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21943: [SPARK-24795][Core][FOLLOWUP] Kill all running ta...

2018-08-01 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21943#discussion_r207103696
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/TaskScheduler.scala ---
@@ -51,7 +51,7 @@ private[spark] trait TaskScheduler {
   // Submit a sequence of tasks to run.
   def submitTasks(taskSet: TaskSet): Unit
 
-  // Cancel a stage.
+  // Kill all the tasks in a stage and fail the stage and all the jobs 
that depend on the stage.
--- End diff --

Updated the comment to note that if the backend doesn't support killing a 
task, the method shall throw `UnsupportedOperationException`.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19449: [SPARK-22219][SQL] Refactor code to get a value for "spa...

2018-08-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19449
  
**[Test build #93953 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93953/testReport)**
 for PR 19449 at commit 
[`253bc19`](https://github.com/apache/spark/commit/253bc19af270185e6d419a9ed0261917f84688c1).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21889
  
**[Test build #93952 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93952/testReport)**
 for PR 21889 at commit 
[`be71cd7`](https://github.com/apache/spark/commit/be71cd77eec172d207bcbc034635826c2ebba9be).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21959: [SPARK-23698] Define xrange() for Python 3 in dumpdata_s...

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21959
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21943: [SPARK-24795][Core][FOLLOWUP] Kill all running tasks whe...

2018-08-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21943
  
**[Test build #93951 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93951/testReport)**
 for PR 21943 at commit 
[`7cca33f`](https://github.com/apache/spark/commit/7cca33f5f64fa1a03816860299ce124730abfd0a).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19449: [SPARK-22219][SQL] Refactor code to get a value for "spa...

2018-08-01 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19449
  
ok to test


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21951: [SPARK-24957][SQL][FOLLOW-UP] Clean the code for AVERAGE

2018-08-01 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21951
  
This will simplify the code and improve readability. We can do the same 
in the other expressions. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21959: [SPARK-23698] Define xrange() for Python 3 in dumpdata_s...

2018-08-01 Thread holdensmagicalunicorn
Github user holdensmagicalunicorn commented on the issue:

https://github.com/apache/spark/pull/21959
  
@cclauss, thanks! I am a bot who has found some folks who might be able to 
help with the review: @pwendell and @HyukjinKwon


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21959: [SPARK-23698] Define xrange() for Python 3 in dum...

2018-08-01 Thread cclauss
GitHub user cclauss opened a pull request:

https://github.com/apache/spark/pull/21959

[SPARK-23698]  Define xrange() for Python 3 in dumpdata_script.py

__xrange()__ was removed in Python 3 in favor of __range()__.  This simple 
change removes three undefined names and was originally suggested in #20838, 
which is currently mired in 50+ comments on unrelated modifications.  
@HyukjinKwon @holdenk

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cclauss/spark patch-1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21959.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21959


commit 15676ab875838cc99f71fab6f8618c886b294b1a
Author: cclauss 
Date:   2018-08-02T05:13:37Z

[SPARK-23698]  Define xrange() for Python 3 in dumpdata_script.py

__xrange()__ was removed in Python 3 in favor of __range()__.  This simple 
change removes three undefined names and was originally suggested in #20838, 
which is currently mired in 50+ comments on unrelated modifications.  
@HyukjinKwon @holdenk




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21754: [SPARK-24705][SQL] Cannot reuse an exchange opera...

2018-08-01 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21754#discussion_r207103198
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/Exchange.scala 
---
@@ -52,6 +52,14 @@ case class ReusedExchangeExec(override val output: 
Seq[Attribute], child: Exchan
   // Ignore this wrapper for canonicalizing.
   override def doCanonicalize(): SparkPlan = child.canonicalized
 
+  override protected def doPrepare(): Unit = {
+child match {
+  case shuffleExchange @ ShuffleExchangeExec(_, _, Some(coordinator)) 
=>
+coordinator.registerExchange(shuffleExchange)
--- End diff --

they are 2 different bugs, aren't they?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21754: [SPARK-24705][SQL] Cannot reuse an exchange opera...

2018-08-01 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21754#discussion_r207102873
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/Exchange.scala 
---
@@ -52,6 +52,14 @@ case class ReusedExchangeExec(override val output: 
Seq[Attribute], child: Exchan
   // Ignore this wrapper for canonicalizing.
   override def doCanonicalize(): SparkPlan = child.canonicalized
 
+  override protected def doPrepare(): Unit = {
+child match {
+  case shuffleExchange @ ShuffleExchangeExec(_, _, Some(coordinator)) 
=>
+coordinator.registerExchange(shuffleExchange)
--- End diff --

Is it bad to fix this in this pr?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21632: [SPARK-19591][ML][MLlib] Add sample weights to decision ...

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21632
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21632: [SPARK-19591][ML][MLlib] Add sample weights to decision ...

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21632
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93948/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21632: [SPARK-19591][ML][MLlib] Add sample weights to decision ...

2018-08-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21632
  
**[Test build #93948 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93948/testReport)**
 for PR 21632 at commit 
[`3189259`](https://github.com/apache/spark/commit/31892591b89da36a936009553a762b8453ff483e).
 * This patch **fails to generate documentation**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21954: [SPARK-23908][SQL] Add transform function.

2018-08-01 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21954#discussion_r207102738
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala
 ---
@@ -0,0 +1,325 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import java.util.concurrent.atomic.AtomicReference
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.codegen._
+import org.apache.spark.sql.catalyst.expressions.codegen.Block._
+import org.apache.spark.sql.catalyst.util.{ArrayData, GenericArrayData}
+import org.apache.spark.sql.types._
+
+/**
+ * A named lambda variable.
+ */
+case class NamedLambdaVariable(
+name: String,
+dataType: DataType,
+nullable: Boolean,
+value: AtomicReference[Any] = new AtomicReference(),
+exprId: ExprId = NamedExpression.newExprId)
+  extends LeafExpression
+  with NamedExpression {
+
+  override def qualifier: Option[String] = None
+
+  override def newInstance(): NamedExpression =
+copy(value = new AtomicReference(), exprId = NamedExpression.newExprId)
+
+  override def toAttribute: Attribute = {
+AttributeReference(name, dataType, nullable, Metadata.empty)(exprId, 
None)
+  }
+
+  override def eval(input: InternalRow): Any = value.get
+
+  override def genCode(ctx: CodegenContext): ExprCode = {
+val suffix = "_lambda_variable_" + exprId.id
+ExprCode(
+  if (nullable) JavaCode.isNullVariable(s"isNull_${name}$suffix") else 
FalseLiteral,
+  JavaCode.variable(s"value_${name}$suffix", dataType))
+  }
+
+  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
+throw new IllegalStateException("NamedLambdaVariable.doGenCode should 
not be called.")
+  }
+
+  override def toString: String = s"lambda $name#${exprId.id}$typeSuffix"
+
+  override def simpleString: String = s"lambda $name#${exprId.id}: 
${dataType.simpleString}"
+}
+
+/**
+ * A lambda function and its arguments. A lambda function can be hidden 
when a user wants to
+ * process a completely independent expression in a 
[[HigherOrderFunction]], the lambda function
+ * and its variables are then only used for internal bookkeeping within 
the higher order function.
+ */
+case class LambdaFunction(
+function: Expression,
+arguments: Seq[NamedExpression],
+hidden: Boolean = false)
+  extends Expression {
+
+  override def children: Seq[Expression] = function +: arguments
+  override def dataType: DataType = function.dataType
+  override def nullable: Boolean = function.nullable
+
+  lazy val bound: Boolean = arguments.forall(_.resolved)
+
+  override def eval(input: InternalRow): Any = function.eval(input)
+
+  override def genCode(ctx: CodegenContext): ExprCode = {
+function.genCode(ctx)
+  }
+
+  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
+throw new IllegalStateException("LambdaFunction.doGenCode should not 
be called.")
+  }
+}
+
+/**
+ * A higher order function takes one or more (lambda) functions and 
applies these to some objects.
+ * The function produces a number of variables which can be consumed by 
some lambda function.
+ */
+trait HigherOrderFunction extends Expression {
+
+  override def children: Seq[Expression] = inputs ++ functions
+
+  /**
+   * Inputs to the higher ordered function.
+   */
+  def inputs: Seq[Expression]
+
+  /**
+   * All inputs have been resolved. This means that the types and 
nullability of (most of) the
+   * lambda function arguments are known, and that we can start binding the 
lambda functions.
+   */
+  lazy val inputResolved: Boolean = inputs.forall(_.resolved)
+
+  /**
+   

[GitHub] spark pull request #21952: [SPARK-24993] [SQL] [WIP] Make Avro Fast Again

2018-08-01 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21952#discussion_r207102304
  
--- Diff: 
external/avro/src/main/scala/org/apache/spark/sql/avro/AvroSerializer.scala ---
@@ -100,13 +100,14 @@ class AvroSerializer(rootCatalystType: DataType, 
rootAvroType: Schema, nullable:
   et, resolveNullableType(avroType.getElementType, containsNull))
 (getter, ordinal) => {
   val arrayData = getter.getArray(ordinal)
-  val result = new java.util.ArrayList[Any]
+  val len = arrayData.numElements()
+  val result = new Array[Any](len)
--- End diff --

one more improvement: if the element is a primitive type, we can call 
`arrayData.toBoolean/Int/...Array` directly.
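
For example, something along these lines (just a sketch with a hypothetical 
helper, not the actual patch; it relies on the bulk converters that 
`ArrayData` already exposes, such as `toIntArray`):

```
import org.apache.spark.sql.catalyst.util.ArrayData
import org.apache.spark.sql.types._

// Hypothetical helper: for arrays of non-nullable primitive elements,
// ArrayData's bulk converters avoid the per-element loop and its boxing.
def toJavaArray(data: ArrayData, elementType: DataType, containsNull: Boolean): AnyRef =
  elementType match {
    case BooleanType if !containsNull => data.toBooleanArray()
    case IntegerType if !containsNull => data.toIntArray()
    case LongType if !containsNull    => data.toLongArray()
    case DoubleType if !containsNull  => data.toDoubleArray()
    case _ =>
      // Generic fallback: copy element by element, preserving nulls.
      val len = data.numElements()
      val result = new Array[Any](len)
      var i = 0
      while (i < len) {
        if (!data.isNullAt(i)) result(i) = data.get(i, elementType)
        i += 1
      }
      result
  }
```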


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21954: [SPARK-23908][SQL] Add transform function.

2018-08-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21954
  
**[Test build #93950 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93950/testReport)**
 for PR 21954 at commit 
[`c3bf6a0`](https://github.com/apache/spark/commit/c3bf6a0059a151ba23cf32c842e31ced3b28726c).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21954: [SPARK-23908][SQL] Add transform function.

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21954
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21954: [SPARK-23908][SQL] Add transform function.

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21954
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1615/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21754: [SPARK-24705][SQL] Cannot reuse an exchange opera...

2018-08-01 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21754#discussion_r207101919
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/Exchange.scala 
---
@@ -52,6 +52,14 @@ case class ReusedExchangeExec(override val output: 
Seq[Attribute], child: Exchan
   // Ignore this wrapper for canonicalizing.
   override def doCanonicalize(): SparkPlan = child.canonicalized
 
+  override protected def doPrepare(): Unit = {
+child match {
+  case shuffleExchange @ ShuffleExchangeExec(_, _, Some(coordinator)) 
=>
+coordinator.registerExchange(shuffleExchange)
--- End diff --

ah i see, can we send a new PR for the bug fix?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21754: [SPARK-24705][SQL] Cannot reuse an exchange opera...

2018-08-01 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21754#discussion_r207101662
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/Exchange.scala 
---
@@ -52,6 +52,14 @@ case class ReusedExchangeExec(override val output: 
Seq[Attribute], child: Exchan
   // Ignore this wrapper for canonicalizing.
   override def doCanonicalize(): SparkPlan = child.canonicalized
 
+  override protected def doPrepare(): Unit = {
+child match {
+  case shuffleExchange @ ShuffleExchangeExec(_, _, Some(coordinator)) 
=>
+coordinator.registerExchange(shuffleExchange)
--- End diff --

Does the master have the same problem? I checked the query below on master:
```

Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.0-SNAPSHOT
      /_/

Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 
1.8.0_31)
Type in expressions to have them evaluated.
Type :help for more information.

scala> sql("SET spark.sql.adaptive.enabled=true")
scala> sql("SET spark.sql.autoBroadcastJoinThreshold=-1")
scala> val df = spark.range(1).selectExpr("id AS key", "id AS value")
scala> val resultDf = df.join(df, "key").join(df, "key")
scala> resultDf.show
...
  at 
org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$$anonfun$doExecute$1.apply(ShuffleExchangeExec.scala:119)
  at 
org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:52)
  ... 101 more
Caused by: java.lang.AssertionError: assertion failed
  at scala.Predef$.assert(Predef.scala:156)
  at 
org.apache.spark.sql.execution.exchange.ExchangeCoordinator.doEstimationIfNecessary(ExchangeCoordinator.scala:201)
  at 
org.apache.spark.sql.execution.exchange.ExchangeCoordinator.postShuffleRDD(ExchangeCoordinator.scala:259)
  at 
org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$$anonfun$doExecute$1.apply(ShuffleExchangeExec.scala:124)
  at 
org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$$anonfun$doExecute$1.apply(ShuffleExchangeExec.scala:119)
  at 
org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:52)
...
```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21957: [SPARK-24994][SQL] When the data type of the field is co...

2018-08-01 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21957
  
shall we normalize the filters before pushing them down? e.g. `Cast(a, Int) > 
100` should become `a > 100`, because 100 is within the byte range.
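
A minimal sketch of that idea for one operator/type pair (assuming the 2.x 
`Cast(child, dataType, timeZoneId)` shape; a real rule would have to cover 
all the comparison operators and numeric types):

```
import org.apache.spark.sql.catalyst.expressions._
import org.apache.spark.sql.types._

// If an int literal compared against an up-cast byte column fits in the
// byte range, drop the cast and narrow the literal instead, so the
// predicate references the raw column and becomes pushable.
def normalize(filter: Expression): Expression = filter.transform {
  case GreaterThan(Cast(child, IntegerType, _), Literal(v: Int, IntegerType))
      if child.dataType == ByteType && v >= Byte.MinValue && v <= Byte.MaxValue =>
    GreaterThan(child, Literal(v.toByte, ByteType))
}
```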


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21958: [minor] remove dead code in ExpressionEvalHelper

2018-08-01 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21958
  
LGTM


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21958: [minor] remove dead code in ExpressionEvalHelper

2018-08-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21958
  
**[Test build #93949 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93949/testReport)**
 for PR 21958 at commit 
[`19dd614`](https://github.com/apache/spark/commit/19dd61431326a5a01c6b8f63dc71a4663a9bb590).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21958: [minor] remove dead code in ExpressionEvalHelper

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21958
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21958: [minor] remove dead code in ExpressionEvalHelper

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21958
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1614/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21935: [SPARK-24773] Avro: support logical timestamp typ...

2018-08-01 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21935#discussion_r207100964
  
--- Diff: 
external/avro/src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala 
---
@@ -86,8 +87,16 @@ class AvroDeserializer(rootAvroType: Schema, 
rootCatalystType: DataType) {
   case (LONG, LongType) => (updater, ordinal, value) =>
 updater.setLong(ordinal, value.asInstanceOf[Long])
 
-  case (LONG, TimestampType) => (updater, ordinal, value) =>
-updater.setLong(ordinal, value.asInstanceOf[Long] * 1000)
+  case (LONG, TimestampType) => avroType.getLogicalType match {
+case _: TimestampMillis => (updater, ordinal, value) =>
+  updater.setLong(ordinal, value.asInstanceOf[Long] * 1000)
+case _: TimestampMicros => (updater, ordinal, value) =>
+  updater.setLong(ordinal, value.asInstanceOf[Long])
+case null => (updater, ordinal, value) =>
--- End diff --

ditto, add a default case.
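
For illustration, the default case could look something like this (a 
self-contained sketch, not the PR's code; the exception used here is an 
arbitrary choice, and the `null` branch keeps the pre-existing millis 
behavior):

```
import org.apache.avro.LogicalType
import org.apache.avro.LogicalTypes.{TimestampMicros, TimestampMillis}

// Map an Avro LONG with a timestamp-flavored logical type to Catalyst
// microseconds, and fail loudly on anything unexpected instead of
// hitting a MatchError at runtime.
def toCatalystMicros(logicalType: LogicalType, value: Long): Long = logicalType match {
  case _: TimestampMillis => value * 1000L  // millis -> micros
  case _: TimestampMicros => value          // already micros
  case null               => value * 1000L  // no logical type: assume millis
  case other => throw new UnsupportedOperationException(
    s"Unexpected Avro logical type for a long timestamp column: $other")
}
```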


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21957: [SPARK-24994][SQL] When the data type of the field is co...

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21957
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93946/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21957: [SPARK-24994][SQL] When the data type of the field is co...

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21957
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21957: [SPARK-24994][SQL] When the data type of the field is co...

2018-08-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21957
  
**[Test build #93946 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93946/testReport)**
 for PR 21957 at commit 
[`7368d0a`](https://github.com/apache/spark/commit/7368d0ad9d790c589dee82d1d144d90fd1d40a70).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21958: [minor] remove dead code in ExpressionEvalHelper

2018-08-01 Thread holdensmagicalunicorn
Github user holdensmagicalunicorn commented on the issue:

https://github.com/apache/spark/pull/21958
  
@cloud-fan, thanks! I am a bot who has found some folks who might be able 
to help with the review: @ueshin, @rxin and @JoshRosen


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21958: [minor] remove dead code in ExpressionEvalHelper

2018-08-01 Thread cloud-fan
GitHub user cloud-fan opened a pull request:

https://github.com/apache/spark/pull/21958

[minor] remove dead code in ExpressionEvalHelper

## What changes were proposed in this pull request?

This addresses https://github.com/apache/spark/pull/21236/files#r207078480

Both https://github.com/apache/spark/pull/21236 and 
https://github.com/apache/spark/pull/21838 add an InternalRow result check to 
ExpressionEvalHelper, so the check became duplicated.

## How was this patch tested?

N/A

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cloud-fan/spark minor

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21958.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21958


commit 19dd61431326a5a01c6b8f63dc71a4663a9bb590
Author: Wenchen Fan 
Date:   2018-08-02T04:43:31Z

remove dead code in ExpressionEvalHelper




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21958: [minor] remove dead code in ExpressionEvalHelper

2018-08-01 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21958
  
cc @srowen @gatorsmile 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21935: [SPARK-24773] Avro: support logical timestamp type with ...

2018-08-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21935
  
**[Test build #93947 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93947/testReport)**
 for PR 21935 at commit 
[`2b286cd`](https://github.com/apache/spark/commit/2b286cd6b7ed1f42a0611b704d7f31f5018b4c5a).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21957: [SPARK-24994][SQL] When the data type of the field is co...

2018-08-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21957
  
**[Test build #93946 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93946/testReport)**
 for PR 21957 at commit 
[`7368d0a`](https://github.com/apache/spark/commit/7368d0ad9d790c589dee82d1d144d90fd1d40a70).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21632: [SPARK-19591][ML][MLlib] Add sample weights to decision ...

2018-08-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21632
  
**[Test build #93948 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93948/testReport)**
 for PR 21632 at commit 
[`3189259`](https://github.com/apache/spark/commit/31892591b89da36a936009553a762b8453ff483e).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21632: [SPARK-19591][ML][MLlib] Add sample weights to decision ...

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21632
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1613/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21957: [SPARK-24994][SQL] When the data type of the field is co...

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21957
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1611/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21957: [SPARK-24994][SQL] When the data type of the field is co...

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21957
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21632: [SPARK-19591][ML][MLlib] Add sample weights to decision ...

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21632
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21935: [SPARK-24773] Avro: support logical timestamp type with ...

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21935
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21935: [SPARK-24773] Avro: support logical timestamp type with ...

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21935
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1612/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21957: [SPARK-24994][SQL] When the data type of the field is co...

2018-08-01 Thread holdensmagicalunicorn
Github user holdensmagicalunicorn commented on the issue:

https://github.com/apache/spark/pull/21957
  
@10110346, thanks! I am a bot who has found some folks who might be able to 
help with the review: @yhuai, @gatorsmile and @cloud-fan


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21957: [SPARK-24994][SQL] When the data type of the fiel...

2018-08-01 Thread 10110346
GitHub user 10110346 opened a pull request:

https://github.com/apache/spark/pull/21957

[SPARK-24994][SQL] When the data type of the field is converted to other 
types, it can also support pushdown to parquet

## What changes were proposed in this pull request?
For a statement such as `select * from table1 where a = 100;`, the data type 
of `a` is `smallint`. Because the default data type of the literal 100 is 
`int`, `a` is cast to `int`. In this case, the predicate cannot be pushed 
down to Parquet.

In our workloads we generally do not cast the literal 100 to `smallint` in 
SQL statements, so this PR adds support for pushing such predicates down to 
Parquet.

## How was this patch tested?
Added unit tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/10110346/spark otherpushdown

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21957.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21957


commit 7368d0ad9d790c589dee82d1d144d90fd1d40a70
Author: liuxian 
Date:   2018-08-02T01:51:49Z

fix




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


