[GitHub] spark issue #22450: [SPARK-25454][SQL] Avoid precision loss in division with...

2018-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22450
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3258/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19045: [WIP][SPARK-20628][CORE] Keep track of nodes (/ spot ins...

2018-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19045
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3259/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22460: DO NOT MERGE

2018-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22460
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96288/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22460: DO NOT MERGE

2018-09-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22460
  
**[Test build #96288 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96288/testReport)**
 for PR 22460 at commit 
[`7252653`](https://github.com/apache/spark/commit/7252653cb836780a12f26b82b5a27a0bc83ee171).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19045: [WIP][SPARK-20628][CORE] Keep track of nodes (/ spot ins...

2018-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19045
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22450: [SPARK-25454][SQL] Avoid precision loss in division with...

2018-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22450
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22460: DO NOT MERGE

2018-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22460
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22460: DO NOT MERGE

2018-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22460
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22460: DO NOT MERGE

2018-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22460
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3257/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22465: [SPARK-25457][SQL] IntegralDivide returns data type of t...

2018-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22465
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3256/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22465: [SPARK-25457][SQL] IntegralDivide returns data type of t...

2018-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22465
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19045: [WIP][SPARK-20628][CORE] Keep track of nodes (/ spot ins...

2018-09-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19045
  
**[Test build #96290 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96290/testReport)**
 for PR 19045 at commit 
[`4e6572f`](https://github.com/apache/spark/commit/4e6572f8a7798298fe4787fe5913ee94c2b97359).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22450: [SPARK-25454][SQL] Avoid precision loss in division with...

2018-09-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22450
  
**[Test build #96289 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96289/testReport)**
 for PR 22450 at commit 
[`27a9ea6`](https://github.com/apache/spark/commit/27a9ea656428d4705c6323deda64b14eb7ced7a4).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22469: [SPARK-24341][FOLLOWUP][DOCS] Add migration note for IN ...

2018-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22469
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3255/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22460: DO NOT MERGE

2018-09-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22460
  
**[Test build #96288 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96288/testReport)**
 for PR 22460 at commit 
[`7252653`](https://github.com/apache/spark/commit/7252653cb836780a12f26b82b5a27a0bc83ee171).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22469: [SPARK-24341][FOLLOWUP][DOCS] Add migration note for IN ...

2018-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22469
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22469: [SPARK-24341][FOLLOWUP][DOCS] Add migration note for IN ...

2018-09-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22469
  
**[Test build #96286 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96286/testReport)**
 for PR 22469 at commit 
[`d444073`](https://github.com/apache/spark/commit/d444073ec047f157e2ffb55edb0577963bd76f40).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22465: [SPARK-25457][SQL] IntegralDivide returns data type of t...

2018-09-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22465
  
**[Test build #96287 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96287/testReport)**
 for PR 22465 at commit 
[`3f045c0`](https://github.com/apache/spark/commit/3f045c0512efe968aaf40861b5054af5da254ce3).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22450: [SPARK-25454][SQL] Avoid precision loss in division with...

2018-09-19 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/22450
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22010: [SPARK-21436][CORE] Take advantage of known parti...

2018-09-19 Thread mridulm
Github user mridulm commented on a diff in the pull request:

https://github.com/apache/spark/pull/22010#discussion_r218946996
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/MapPartitionsRDD.scala 
---
@@ -35,16 +35,24 @@ import org.apache.spark.{Partition, TaskContext}
  * @param isOrderSensitive whether or not the function is order-sensitive. 
If it's order
  * sensitive, it may return totally different 
result when the input order
  * is changed. Mostly stateful functions are 
order-sensitive.
+ * @param knownPartitioner If the result has a known partitioner.
  */
 private[spark] class MapPartitionsRDD[U: ClassTag, T: ClassTag](
 var prev: RDD[T],
 f: (TaskContext, Int, Iterator[T]) => Iterator[U],  // (TaskContext, 
partition index, iterator)
 preservesPartitioning: Boolean = false,
 isFromBarrier: Boolean = false,
-isOrderSensitive: Boolean = false)
+isOrderSensitive: Boolean = false,
+knownPartitioner: Option[Partitioner] = None)
   extends RDD[U](prev) {
 
-  override val partitioner = if (preservesPartitioning) 
firstParent[T].partitioner else None
+  override val partitioner = {
+if (preservesPartitioning) {
+  firstParent[T].partitioner
+} else {
+  knownPartitioner
+}
+  }
--- End diff --

yes


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22469: [SPARK-24341][FOLLOWUP][DOCS] Add migration note for IN ...

2018-09-19 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/22469
  
thank you for pointing this out @cloud-fan


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22010: [SPARK-21436][CORE] Take advantage of known parti...

2018-09-19 Thread mridulm
Github user mridulm commented on a diff in the pull request:

https://github.com/apache/spark/pull/22010#discussion_r218946895
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -396,7 +396,26 @@ abstract class RDD[T: ClassTag](
* Return a new RDD containing the distinct elements in this RDD.
*/
   def distinct(numPartitions: Int)(implicit ord: Ordering[T] = null): 
RDD[T] = withScope {
-map(x => (x, null)).reduceByKey((x, y) => x, numPartitions).map(_._1)
+partitioner match {
+  case Some(p) if numPartitions == partitions.length =>
+def key(x: T): (T, Null) = (x, null)
+val cleanKey = sc.clean(key _)
+val keyed = new MapPartitionsRDD[(T, Null), T](
+  this,
+  (context, pid, iter) => iter.map(cleanKey),
+  knownPartitioner = Some(new WrappedPartitioner(p)))
+val duplicatesRemoved = keyed.reduceByKey((x, y) => x)
--- End diff --

Yes, would use right partitioner in this case


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22450: [SPARK-25454][SQL] Avoid precision loss in division with...

2018-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22450
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22450: [SPARK-25454][SQL] Avoid precision loss in division with...

2018-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22450
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96254/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22450: [SPARK-25454][SQL] Avoid precision loss in division with...

2018-09-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22450
  
**[Test build #96254 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96254/testReport)**
 for PR 22450 at commit 
[`27a9ea6`](https://github.com/apache/spark/commit/27a9ea656428d4705c6323deda64b14eb7ced7a4).
 * This patch **fails from timeout after a configured wait of `400m`**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22465: [SPARK-25457][SQL] IntegralDivide returns data type of t...

2018-09-19 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/22465
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22470: [SPARK-25454][SQL] should not generate negative scale as...

2018-09-19 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/22470
  
@cloud-fan I think the main problem about this (and it is the reason why I 
haven't proposed it) is that the range of operations supported would be 
smaller, so we may forbid operations which now can happen. For instance, the 
following code has been always working on Spark: `lit(BigDecimal(1e36)) * 
lit(BigDecimal(1))`. Indeed now this would become a `decimal(6, -36)`. With 
your change, this is going to be a `decimal(42, 0)` which is out of the range 
of supported values (ie. an overflow would occur).

I am not sure if any user has something like this, but it is possible and I 
think we cannot exclude it. We may, though, restrict again the condition when 
it happens, ie. in case we are just parsing from a literal we can avoid 
returning a negative scale. But the other fix would be needed anyway in this 
case, as we could still have to deal with negative scales, so IMO this would be 
quite useless.

I'd agree though about forbidding negative scale in 3.0.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22460: DO NOT MERGE

2018-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22460
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96270/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22460: DO NOT MERGE

2018-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22460
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22460: DO NOT MERGE

2018-09-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22460
  
**[Test build #96270 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96270/testReport)**
 for PR 22460 at commit 
[`89dcafe`](https://github.com/apache/spark/commit/89dcafe979a150f8d722fad72a575b67152ccb58).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22325: [SPARK-25318]. Add exception handling when wrappi...

2018-09-19 Thread rezasafi
Github user rezasafi commented on a diff in the pull request:

https://github.com/apache/spark/pull/22325#discussion_r218944816
  
--- Diff: 
core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala 
---
@@ -444,36 +444,34 @@ final class ShuffleBlockFetcherIterator(
   throwFetchFailedException(blockId, address, e)
   }
 
-  input = streamWrapper(blockId, in)
-  // Only copy the stream if it's wrapped by compression or 
encryption, also the size of
-  // block is small (the decompressed block is smaller than 
maxBytesInFlight)
-  if (detectCorrupt && !input.eq(in) && size < maxBytesInFlight / 
3) {
-val originalInput = input
-val out = new ChunkedByteBufferOutputStream(64 * 1024, 
ByteBuffer.allocate)
-try {
+  try {
+input = streamWrapper(blockId, in)
+// Only copy the stream if it's wrapped by compression or 
encryption, also the size of
+// block is small (the decompressed block is smaller than 
maxBytesInFlight)
+if (detectCorrupt && !input.eq(in) && size < maxBytesInFlight 
/ 3) {
+  val out = new ChunkedByteBufferOutputStream(64 * 1024, 
ByteBuffer.allocate)
   // Decompress the whole block at once to detect any 
corruption, which could increase
   // the memory usage tne potential increase the chance of OOM.
   // TODO: manage the memory used here, and spill it into disk 
in case of OOM.
   Utils.copyStream(input, out)
   out.close()
   input = out.toChunkedByteBuffer.toInputStream(dispose = true)
--- End diff --

I just added `originalInput` val and changed its scope to be able to close 
it in finally section


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22325: [SPARK-25318]. Add exception handling when wrapping the ...

2018-09-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22325
  
**[Test build #96285 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96285/testReport)**
 for PR 22325 at commit 
[`4983d69`](https://github.com/apache/spark/commit/4983d69abf48594e9876d7a40f3f531836333243).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22450: [SPARK-25454][SQL] Avoid precision loss in division with...

2018-09-19 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/22450
  
@cloud-fan I checked the other operations and they have no issue with 
negative scale. This is the reason why this fix is only for Divide: it is the 
only operation which wasn't dealing it properly.

I also thought about doing that but I chose not to do in order not to 
introduce regressions. Anyway I'll argument more in your PR. Thanks.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22472: [SPARK-23173][SQL] Reverting of spark.sql.fromJsonForceN...

2018-09-19 Thread MaxGekk
Github user MaxGekk commented on the issue:

https://github.com/apache/spark/pull/22472
  
Probably revert itself can be done directly in the master branch or it has 
to be done in a PR like this?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22173: [SPARK-24335] Spark external shuffle server impro...

2018-09-19 Thread redsanket
Github user redsanket commented on a diff in the pull request:

https://github.com/apache/spark/pull/22173#discussion_r218942035
  
--- Diff: 
common/network-common/src/main/java/org/apache/spark/network/TransportContext.java
 ---
@@ -77,17 +82,54 @@
   private static final MessageEncoder ENCODER = MessageEncoder.INSTANCE;
   private static final MessageDecoder DECODER = MessageDecoder.INSTANCE;
 
+  // Separate thread pool for handling ChunkFetchRequest. This helps to 
enable throttling
+  // max number of TransportServer worker threads that are blocked on 
writing response
+  // of ChunkFetchRequest message back to the client via the underlying 
channel.
+  private static EventLoopGroup chunkFetchWorkers;
+
   public TransportContext(TransportConf conf, RpcHandler rpcHandler) {
-this(conf, rpcHandler, false);
+this(conf, rpcHandler, false, false);
   }
 
   public TransportContext(
   TransportConf conf,
   RpcHandler rpcHandler,
   boolean closeIdleConnections) {
+this(conf, rpcHandler, closeIdleConnections, false);
+  }
+
+/**
+ *
+ * @param conf TransportConf
+ * @param rpcHandler RpcHandler responsible for handling requests and 
responses.
+ * @param closeIdleConnections Close idle connections if it is set to 
true.
+ * @param isClientOnly This config is more important when external 
shuffle is enabled.
+ * It stops creating extra event loop and 
subsequent thread pool
+ * for shuffle clients to handle chunked fetch 
requests.
+ * In the case when external shuffle is disabled, 
the executors are both
+ * client and server so both share the same event 
loop which is trivial.
--- End diff --

I hope we follow a similar indentation for all other comments


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22473: [SPARK-25449][CORE] Heartbeat shouldn't include accumula...

2018-09-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22473
  
**[Test build #96284 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96284/testReport)**
 for PR 22473 at commit 
[`3cf88a4`](https://github.com/apache/spark/commit/3cf88a4ab34064074d42f5daa3a448e8f9def649).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22473: [SPARK-25449][CORE] Heartbeat shouldn't include a...

2018-09-19 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/22473#discussion_r218941326
  
--- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
@@ -799,7 +799,8 @@ private[spark] class Executor(
   if (taskRunner.task != null) {
 taskRunner.task.metrics.mergeShuffleReadMetrics()
 taskRunner.task.metrics.setJvmGCTime(curGCTime - 
taskRunner.startGCTime)
-accumUpdates += ((taskRunner.taskId, 
taskRunner.task.metrics.accumulators()))
+accumUpdates +=
+  ((taskRunner.taskId, 
taskRunner.task.metrics.accumulators().filterNot(_.isZero)))
--- End diff --

Could you add a flag for this behavior change?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22413: [SPARK-25425][SQL] Extra options should override session...

2018-09-19 Thread MaxGekk
Github user MaxGekk commented on the issue:

https://github.com/apache/spark/pull/22413
  
@dongjoon-hyun Here is the PR for 2.4: 
https://github.com/apache/spark/pull/22474 . I will check does it have 
conflicts in 2.3 and if so, I will backport it to 2.3


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22173: [SPARK-24335] Spark external shuffle server improvement ...

2018-09-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22173
  
**[Test build #96283 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96283/testReport)**
 for PR 22173 at commit 
[`574ba81`](https://github.com/apache/spark/commit/574ba81abf3e1d71bac84a83b80e4b67056f7442).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22473: [SPARK-25449][CORE] Heartbeat shouldn't include accumula...

2018-09-19 Thread zsxwing
Github user zsxwing commented on the issue:

https://github.com/apache/spark/pull/22473
  
add to whitelist


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22474: [SPARK-25425][SQL][BACKPORT-2.4] Extra options should ov...

2018-09-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22474
  
**[Test build #96282 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96282/testReport)**
 for PR 22474 at commit 
[`bf3a52c`](https://github.com/apache/spark/commit/bf3a52c1240a14f89949f8b9dcb554090f803250).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22474: [SPARK-25425][SQL][BACKPORT-2.4] Extra options should ov...

2018-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22474
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22474: [SPARK-25425][SQL][BACKPORT-2.4] Extra options should ov...

2018-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22474
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22474: [SPARK-25425][SQL][BACKPORT-2.4] Extra options sh...

2018-09-19 Thread MaxGekk
GitHub user MaxGekk opened a pull request:

https://github.com/apache/spark/pull/22474

[SPARK-25425][SQL][BACKPORT-2.4] Extra options should override session 
options in DataSource V2

## What changes were proposed in this pull request?

In the PR, I propose overriding session options by extra options in 
DataSource V2. Extra options are more specific and set via `.option()`, and 
should overwrite more generic session options. 

## How was this patch tested?

Added tests for read and write paths.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/MaxGekk/spark-1 session-options-2.4

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22474.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22474


commit ac704568591d2c505b396b5c93138ea8132e3cc4
Author: Maxim Gekk 
Date:   2018-09-16T00:24:11Z

[SPARK-25425][SQL] Extra options should override session options in 
DataSource V2

In the PR, I propose overriding session options by extra options in 
DataSource V2. Extra options are more specific and set via `.option()`, and 
should overwrite more generic session options. Entries from seconds map 
overwrites entries with the same key from the first map, for example:
```Scala
scala> Map("option" -> false) ++ Map("option" -> true)
res0: scala.collection.immutable.Map[String,Boolean] = Map(option -> true)
```

Added a test for checking which option is propagated to a data source in 
`load()`.

Closes #22413 from MaxGekk/session-options.

Lead-authored-by: Maxim Gekk 
Co-authored-by: Dongjoon Hyun 
Co-authored-by: Maxim Gekk 
Signed-off-by: Dongjoon Hyun 

commit bf3a52c1240a14f89949f8b9dcb554090f803250
Author: Maxim Gekk 
Date:   2018-09-19T19:40:25Z

Adding missing import after merge to 2.4




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22173: [SPARK-24335] Spark external shuffle server impro...

2018-09-19 Thread redsanket
Github user redsanket commented on a diff in the pull request:

https://github.com/apache/spark/pull/22173#discussion_r218939333
  
--- Diff: 
common/network-common/src/main/java/org/apache/spark/network/TransportContext.java
 ---
@@ -98,21 +98,32 @@ public TransportContext(
 this(conf, rpcHandler, closeIdleConnections, false);
   }
 
+/**
+ *
+ * @param conf TransportConf
+ * @param rpcHandler RpcHandler responsible for handling requests and 
responses.
+ * @param closeIdleConnections Close idle connections if is set to 
true.
+ * @param isClientOnly This config is more important when external 
shuffle is enabled.
--- End diff --

I think for comments we follow the same spacing convention as observed here 
so sticking with it...


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22472: [SPARK-23173][SQL] Reverting of spark.sql.fromJsonForceN...

2018-09-19 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/22472
  
The reverting PR title is usually `Revert "[SPARK-23173][SQL] rename 
spark.sql.fromJsonForceNullableSchema"`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22470: [SPARK-25454][SQL] should not generate negative scale as...

2018-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22470
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96271/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22470: [SPARK-25454][SQL] should not generate negative scale as...

2018-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22470
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22470: [SPARK-25454][SQL] should not generate negative scale as...

2018-09-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22470
  
**[Test build #96271 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96271/testReport)**
 for PR 22470 at commit 
[`3c05636`](https://github.com/apache/spark/commit/3c05636b879b678cfcf72dfb515ea691317e470d).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22472: [SPARK-23173][SQL] Reverting of spark.sql.fromJsonForceN...

2018-09-19 Thread MaxGekk
Github user MaxGekk commented on the issue:

https://github.com/apache/spark/pull/22472
  
> Let's not mix the reverting and adding a deprecation note 

ok. I will revert the last commit from the PR.

> Also, SPARK-25384 is about Removing 
spark.sql.fromJsonForceNullableSchema, not Deprecating at all.

I removed the ticket from PR's title.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22448: [SPARK-25417][SQL] Improve findTightestCommonType to coe...

2018-09-19 Thread MaxGekk
Github user MaxGekk commented on the issue:

https://github.com/apache/spark/pull/22448
  
> Do you know the reason for it ?

It is better to ask @HyukjinKwon 

> Seems like we may need to refactor to see if we can avoid duplicating 
findTightestCommonType here ? Can we take this in a follow-up ?

Definitely it can be done in a separate PR. Please, do that if you have 
time (if not I can do it).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21688: [SPARK-21809] : Change Stage Page to use datatabl...

2018-09-19 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/21688#discussion_r218931652
  
--- Diff: core/src/main/resources/org/apache/spark/ui/static/stagepage.js 
---
@@ -0,0 +1,926 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+$(document).ajaxStop($.unblockUI);
+$(document).ajaxStart(function () {
+$.blockUI({message: 'Loading Stage Page...'});
+});
+
+$.extend( $.fn.dataTable.ext.type.order, {
+"file-size-pre": ConvertDurationString,
+
+"file-size-asc": function ( a, b ) {
+a = ConvertDurationString( a );
+b = ConvertDurationString( b );
+return ((a < b) ? -1 : ((a > b) ? 1 : 0));
+},
+
+"file-size-desc": function ( a, b ) {
+a = ConvertDurationString( a );
+b = ConvertDurationString( b );
+return ((a < b) ? 1 : ((a > b) ? -1 : 0));
+}
+} );
+
+// This function will only parse the URL under certain format
+// e.g. 
https://domain:50509/history/application_1502220952225_59143/stages/stage/?id=0=0
+function stageEndPoint(appId) {
+var words = document.baseURI.split('/');
+var words2 = document.baseURI.split('?');
+var ind = words.indexOf("proxy");
+if (ind > 0) {
+var appId = words[ind + 1];
+var stageIdLen = words2[1].indexOf('&');
+var stageId = words2[1].substr(3, stageIdLen - 3);
+var newBaseURI = words.slice(0, ind + 2).join('/');
+return newBaseURI + "/api/v1/applications/" + appId + "/stages/" + 
stageId;
+}
+ind = words.indexOf("history");
+if (ind > 0) {
+var appId = words[ind + 1];
+var attemptId = words[ind + 2];
+var stageIdLen = words2[1].indexOf('&');
+var stageId = words2[1].substr(3, stageIdLen - 3);
+var newBaseURI = words.slice(0, ind).join('/');
+if (isNaN(attemptId) || attemptId == "0") {
+return newBaseURI + "/api/v1/applications/" + appId + 
"/stages/" + stageId;
+} else {
+return newBaseURI + "/api/v1/applications/" + appId + "/" + 
attemptId + "/stages/" + stageId;
+}
+}
+var stageIdLen = words2[1].indexOf('&');
+var stageId = words2[1].substr(3, stageIdLen - 3);
+return location.origin + "/api/v1/applications/" + appId + "/stages/" 
+ stageId;
+}
+
+function getColumnNameForTaskMetricSummary(columnKey) {
+switch(columnKey) {
+case "executorRunTime":
+return "Duration";
+break;
+
+case "jvmGcTime":
+return "GC Time";
+break;
+
+case "gettingResultTime":
+return "Getting Result Time";
+break;
+
+case "inputMetrics":
+return "Input Size / Records";
+break;
+
+case "outputMetrics":
+return "Output Size / Records";
+break;
+
+case "peakExecutionMemory":
+return "Peak Execution Memory";
+break;
+
+case "resultSerializationTime":
+return "Result Serialization Time";
+break;
+
+case "schedulerDelay":
+return "Scheduler Delay";
+break;
+
+case "diskBytesSpilled":
+return "Shuffle spill (disk)";
+break;
+
+case "memoryBytesSpilled":
+return "Shuffle spill (memory)";
+break;
+
+case "shuffleReadMetrics":
+return "Shuffle Read Size / Records";
+break;
+
+case "shuffleWriteMetrics":
+return "Shuffle Write Size / Records";
+break;
+
+case "executorDeserializeTime":
+return "Task Deserialization Time";
+break;
+
+default:
+   

[GitHub] spark issue #22305: [WIP][SPARK-24561][SQL][Python] User-defined window aggr...

2018-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22305
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22305: [WIP][SPARK-24561][SQL][Python] User-defined window aggr...

2018-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22305
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96260/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21688: [SPARK-21809] : Change Stage Page to use datatabl...

2018-09-19 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/21688#discussion_r218931328
  
--- Diff: core/src/main/resources/org/apache/spark/ui/static/stagepage.js 
---
@@ -0,0 +1,926 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+$(document).ajaxStop($.unblockUI);
+$(document).ajaxStart(function () {
+$.blockUI({message: 'Loading Stage Page...'});
+});
+
+$.extend( $.fn.dataTable.ext.type.order, {
+"file-size-pre": ConvertDurationString,
+
+"file-size-asc": function ( a, b ) {
--- End diff --

file-size name here doesn't match converting duration.  if we can change 
the name to be duration- instead of file-size might make more sense


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22305: [WIP][SPARK-24561][SQL][Python] User-defined window aggr...

2018-09-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22305
  
**[Test build #96260 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96260/testReport)**
 for PR 22305 at commit 
[`bb05ee0`](https://github.com/apache/spark/commit/bb05ee036e63c629fb8fb6225e53cd9340019397).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22325: [SPARK-25318]. Add exception handling when wrapping the ...

2018-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22325
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96265/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22325: [SPARK-25318]. Add exception handling when wrapping the ...

2018-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22325
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22325: [SPARK-25318]. Add exception handling when wrapping the ...

2018-09-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22325
  
**[Test build #96265 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96265/testReport)**
 for PR 22325 at commit 
[`e64374d`](https://github.com/apache/spark/commit/e64374db205f16955dc66b87f05639ce5f555849).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22448: [SPARK-25417][SQL] Improve findTightestCommonType to coe...

2018-09-19 Thread dilipbiswal
Github user dilipbiswal commented on the issue:

https://github.com/apache/spark/pull/22448
  
@MaxGekk I was looking at CSVInferSchema. It seems like there is a copy of 
`findTightestCommonType` in this file ? Do you know the reason for it ? Seems 
like we may need to refactor to see if we can avoid duplicating 
`findTightestCommonType` here ? Can we take this in a follow-up ? If you think 
we should only focus on refactoring `findWiderDecimalType` and not worry about 
`findTightestCommonType` then please let me know and i can do it in this PR.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22447: [SPARK-25450][SQL] PushProjectThroughUnion rule uses the...

2018-09-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22447
  
**[Test build #96280 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96280/testReport)**
 for PR 22447 at commit 
[`c7756ed`](https://github.com/apache/spark/commit/c7756edc46d77ce8fc9b846e91b37a9fcef97600).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22173: [SPARK-24335] Spark external shuffle server impro...

2018-09-19 Thread redsanket
Github user redsanket commented on a diff in the pull request:

https://github.com/apache/spark/pull/22173#discussion_r218930135
  
--- Diff: 
common/network-common/src/main/java/org/apache/spark/network/TransportContext.java
 ---
@@ -77,17 +82,43 @@
   private static final MessageEncoder ENCODER = MessageEncoder.INSTANCE;
   private static final MessageDecoder DECODER = MessageDecoder.INSTANCE;
 
+  // Separate thread pool for handling ChunkFetchRequest. This helps to 
enable throttling
+  // max number of TransportServer worker threads that are blocked on 
writing response
+  // of ChunkFetchRequest message back to the client via the underlying 
channel.
+  private static EventLoopGroup chunkFetchWorkers;
+
   public TransportContext(TransportConf conf, RpcHandler rpcHandler) {
-this(conf, rpcHandler, false);
+this(conf, rpcHandler, false, false);
   }
 
   public TransportContext(
   TransportConf conf,
   RpcHandler rpcHandler,
   boolean closeIdleConnections) {
+this(conf, rpcHandler, closeIdleConnections, false);
+  }
+
+  public TransportContext(
+  TransportConf conf,
+  RpcHandler rpcHandler,
+  boolean closeIdleConnections,
+  boolean isClient) {
--- End diff --

sure... anything to make is more clear


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22173: [SPARK-24335] Spark external shuffle server improvement ...

2018-09-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22173
  
**[Test build #96281 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96281/testReport)**
 for PR 22173 at commit 
[`5d4b477`](https://github.com/apache/spark/commit/5d4b477595d3c2df5722ae5cb042a16c2f53fc9b).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22447: [SPARK-25450][SQL] PushProjectThroughUnion rule uses the...

2018-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22447
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22447: [SPARK-25450][SQL] PushProjectThroughUnion rule uses the...

2018-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22447
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3254/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22325: [SPARK-25318]. Add exception handling when wrapping the ...

2018-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22325
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22447: [SPARK-25450][SQL] PushProjectThroughUnion rule uses the...

2018-09-19 Thread maryannxue
Github user maryannxue commented on the issue:

https://github.com/apache/spark/pull/22447
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22325: [SPARK-25318]. Add exception handling when wrapping the ...

2018-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22325
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96269/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22325: [SPARK-25318]. Add exception handling when wrapping the ...

2018-09-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22325
  
**[Test build #96269 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96269/testReport)**
 for PR 22325 at commit 
[`7ec6de2`](https://github.com/apache/spark/commit/7ec6de29c4813dea56031c398e034d49d474e976).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22447: [SPARK-25450][SQL] PushProjectThroughUnion rule uses the...

2018-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22447
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22447: [SPARK-25450][SQL] PushProjectThroughUnion rule uses the...

2018-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22447
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96272/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22413: [SPARK-25425][SQL] Extra options should override session...

2018-09-19 Thread rdblue
Github user rdblue commented on the issue:

https://github.com/apache/spark/pull/22413
  
Thanks @MaxGekk, sorry for the original omission!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22447: [SPARK-25450][SQL] PushProjectThroughUnion rule uses the...

2018-09-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22447
  
**[Test build #96272 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96272/testReport)**
 for PR 22447 at commit 
[`c7756ed`](https://github.com/apache/spark/commit/c7756edc46d77ce8fc9b846e91b37a9fcef97600).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22413: [SPARK-25425][SQL] Extra options should override session...

2018-09-19 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/22413
  
Thank you so much, @MaxGekk . It would be great if we can have that Spark 
2.4.0 RC2.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22472: [SPARK-23173][SPARK-25384][SQL] Reverting of spark.sql.f...

2018-09-19 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/22472
  
cc @gatorsmile , too.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22173: [SPARK-24335] Spark external shuffle server impro...

2018-09-19 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/22173#discussion_r218926387
  
--- Diff: 
common/network-common/src/main/java/org/apache/spark/network/TransportContext.java
 ---
@@ -77,17 +82,43 @@
   private static final MessageEncoder ENCODER = MessageEncoder.INSTANCE;
   private static final MessageDecoder DECODER = MessageDecoder.INSTANCE;
 
+  // Separate thread pool for handling ChunkFetchRequest. This helps to 
enable throttling
+  // max number of TransportServer worker threads that are blocked on 
writing response
+  // of ChunkFetchRequest message back to the client via the underlying 
channel.
+  private static EventLoopGroup chunkFetchWorkers;
+
   public TransportContext(TransportConf conf, RpcHandler rpcHandler) {
-this(conf, rpcHandler, false);
+this(conf, rpcHandler, false, false);
   }
 
   public TransportContext(
   TransportConf conf,
   RpcHandler rpcHandler,
   boolean closeIdleConnections) {
+this(conf, rpcHandler, closeIdleConnections, false);
+  }
+
+  public TransportContext(
+  TransportConf conf,
+  RpcHandler rpcHandler,
+  boolean closeIdleConnections,
+  boolean isClient) {
--- End diff --

we should make this more clear.  Because with external shuffle off, we want 
this on client mode as well since its really both a client and server.  Perhaps 
we could change name to be isClientOnly, we should put some java docs on the 
function to describe the parameter as well.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22472: [SPARK-23173][SPARK-25384][SQL] Reverting of spark.sql.f...

2018-09-19 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/22472
  
Let's not mix the reverting and adding a deprecation note, @MaxGekk .

Also, `SPARK-25384` is about `Removing 
spark.sql.fromJsonForceNullableSchema`, not `Deprecating` at all.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22173: [SPARK-24335] Spark external shuffle server impro...

2018-09-19 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/22173#discussion_r218925291
  
--- Diff: 
common/network-common/src/test/java/org/apache/spark/network/ChunkFetchRequestHandlerSuite.java
 ---
@@ -0,0 +1,102 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.network;
+
+import io.netty.channel.ChannelHandlerContext;
+import java.util.ArrayList;
+import java.util.List;
+
+import io.netty.channel.Channel;
+import org.apache.spark.network.server.ChunkFetchRequestHandler;
+import org.junit.Test;
+
+import static org.mockito.Mockito.*;
+
+import org.apache.commons.lang3.tuple.ImmutablePair;
+import org.apache.commons.lang3.tuple.Pair;
+import org.apache.spark.network.buffer.ManagedBuffer;
+import org.apache.spark.network.client.TransportClient;
+import org.apache.spark.network.protocol.*;
+import org.apache.spark.network.server.NoOpRpcHandler;
+import org.apache.spark.network.server.OneForOneStreamManager;
+import org.apache.spark.network.server.RpcHandler;
+
+public class ChunkFetchRequestHandlerSuite {
+
+  @Test
+  public void handleChunkFetchRequest() throws Exception {
+RpcHandler rpcHandler = new NoOpRpcHandler();
+OneForOneStreamManager streamManager = (OneForOneStreamManager) 
(rpcHandler.getStreamManager());
+Channel channel = mock(Channel.class);
+ChannelHandlerContext context = mock(ChannelHandlerContext.class);
+when(context.channel())
+.thenAnswer(invocationOnMock0 -> {
--- End diff --

change spacing to 2 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22473: [SPARK-25449][CORE] Heartbeat shouldn't include accumula...

2018-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22473
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22470: [SPARK-25454][SQL] should not generate negative scale as...

2018-09-19 Thread dilipbiswal
Github user dilipbiswal commented on the issue:

https://github.com/apache/spark/pull/22470
  
@cloud-fan Could you please check CSVinferSchema::tryParseDecimal() ? There 
is a condition to check negative scale.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22173: [SPARK-24335] Spark external shuffle server impro...

2018-09-19 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/22173#discussion_r218924768
  
--- Diff: 
common/network-common/src/main/java/org/apache/spark/network/server/ChunkFetchRequestHandler.java
 ---
@@ -0,0 +1,136 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.network.server;
+
+import java.net.SocketAddress;
+
+import com.google.common.base.Throwables;
+import io.netty.channel.Channel;
+import io.netty.channel.ChannelFuture;
+import io.netty.channel.ChannelFutureListener;
+import io.netty.channel.ChannelHandlerContext;
+import io.netty.channel.SimpleChannelInboundHandler;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.spark.network.buffer.ManagedBuffer;
+import org.apache.spark.network.client.TransportClient;
+import org.apache.spark.network.protocol.ChunkFetchFailure;
+import org.apache.spark.network.protocol.ChunkFetchRequest;
+import org.apache.spark.network.protocol.ChunkFetchSuccess;
+import org.apache.spark.network.protocol.Encodable;
+
+import static org.apache.spark.network.util.NettyUtils.*;
+
+
--- End diff --

remove extra empty line


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22173: [SPARK-24335] Spark external shuffle server impro...

2018-09-19 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/22173#discussion_r218924588
  
--- Diff: 
common/network-common/src/main/java/org/apache/spark/network/TransportContext.java
 ---
@@ -173,5 +213,14 @@ private TransportChannelHandler 
createChannelHandler(Channel channel, RpcHandler
   conf.connectionTimeoutMs(), closeIdleConnections);
   }
 
+  /**
+   * Creates the dedicated ChannelHandler for ChunkFetchRequest messages.
+   */
+  private ChunkFetchRequestHandler 
createChunkFetchHandler(TransportChannelHandler channelHandler,
+  RpcHandler rpcHandler) {
+return new ChunkFetchRequestHandler(channelHandler.getClient(),
+rpcHandler.getStreamManager(), conf.maxChunksBeingTransferred());
--- End diff --

identation 2 spaces inside return


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22473: [SPARK-25449][CORE] Heartbeat shouldn't include accumula...

2018-09-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22473
  
**[Test build #96279 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96279/testReport)**
 for PR 22473 at commit 
[`3cf88a4`](https://github.com/apache/spark/commit/3cf88a4ab34064074d42f5daa3a448e8f9def649).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22473: [SPARK-25449][CORE] Heartbeat shouldn't include accumula...

2018-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22473
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22473: [SPARK-25449][CORE] Heartbeat shouldn't include accumula...

2018-09-19 Thread mukulmurthy
Github user mukulmurthy commented on the issue:

https://github.com/apache/spark/pull/22473
  
@zsxwing for review


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22473: [SPARK-25449][CORE] Heartbeat shouldn't include a...

2018-09-19 Thread mukulmurthy
GitHub user mukulmurthy opened a pull request:

https://github.com/apache/spark/pull/22473

[SPARK-25449][CORE] Heartbeat shouldn't include accumulators for zero 
metrics

## What changes were proposed in this pull request?

Heartbeat shouldn't include accumulators for zero metrics. 

Heartbeats sent from executors to the driver every 10 seconds contain 
metrics and are generally on the order of a few KBs. However, for large jobs 
with lots of tasks, heartbeats can be on the order of tens of MBs, causing 
tasks to die with heartbeat failures. We can mitigate this by not sending zero 
metrics to the driver.

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mukulmurthy/oss-spark 25449-heartbeat

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22473.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22473


commit 3e0d9536512300d27201e1d5cc4d9b5755a47871
Author: Mukul Murthy 
Date:   2018-09-17T21:55:21Z

Don't send zero accumulators for metrics in heartbeat

commit 3cf88a4ab34064074d42f5daa3a448e8f9def649
Author: Mukul Murthy 
Date:   2018-09-19T18:40:47Z

add tests




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22413: [SPARK-25425][SQL] Extra options should override session...

2018-09-19 Thread MaxGekk
Github user MaxGekk commented on the issue:

https://github.com/apache/spark/pull/22413
  
> There is some conflict. Please make backporting PRs to 2.4 and 2.3.
> BTW, @MaxGekk . Could you send a backport PR to branch-2.4?

Sure, I will do that.



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22462: [SPARK-25460][SS] DataSourceV2: SS sources do not respec...

2018-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22462
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22462: [SPARK-25460][SS] DataSourceV2: SS sources do not respec...

2018-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22462
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96266/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22468: [SPARK-25374][SQL] SafeProjection supports fallback to a...

2018-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22468
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96262/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22468: [SPARK-25374][SQL] SafeProjection supports fallback to a...

2018-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22468
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22462: [SPARK-25460][SS] DataSourceV2: SS sources do not respec...

2018-09-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22462
  
**[Test build #96266 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96266/testReport)**
 for PR 22462 at commit 
[`95727c6`](https://github.com/apache/spark/commit/95727c6518e20d4bb7e18f95adb869edecf0e6a4).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22472: [SPARK-23173][SPARK-25384][SQL] Reverting of spark.sql.f...

2018-09-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22472
  
**[Test build #96278 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96278/testReport)**
 for PR 22472 at commit 
[`17275d6`](https://github.com/apache/spark/commit/17275d64653f0ae8d94e8004eed6bedbbf74d3fc).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22468: [SPARK-25374][SQL] SafeProjection supports fallback to a...

2018-09-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22468
  
**[Test build #96262 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96262/testReport)**
 for PR 22468 at commit 
[`bc5f144`](https://github.com/apache/spark/commit/bc5f1445ff6ee45456020d6f0afdd91d14c844af).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class InterpretedSafeProjection(expressions: Seq[Expression]) extends 
Projection `


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22472: [SPARK-23173][SPARK-25384][SQL] Reverting of spark.sql.f...

2018-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22472
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22472: [SPARK-23173][SPARK-25384][SQL] Reverting of spark.sql.f...

2018-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22472
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22472: [SPARK-23173][SPARK-25384][SQL] Reverting of spar...

2018-09-19 Thread MaxGekk
GitHub user MaxGekk opened a pull request:

https://github.com/apache/spark/pull/22472

[SPARK-23173][SPARK-25384][SQL] Reverting of 
spark.sql.fromJsonForceNullableSchema

## What changes were proposed in this pull request?

The flag has been released already in 2.3.x. Renaming it can potentially 
break user applications. I propose to revert it back and leave a note that the 
configuration option will be removed in Spark 3.0.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/MaxGekk/spark-1 revert-forcing-nullable-schema

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22472.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22472


commit 675beb4a704491f9168dda4b182b3672781725bd
Author: Maxim Gekk 
Date:   2018-09-19T18:28:10Z

Revert "[SPARK-23173][SQL] rename spark.sql.fromJsonForceNullableSchema"

This reverts commit 6c7db7fd1ced1d143b1389d09990a620fc16be46.

commit 17275d64653f0ae8d94e8004eed6bedbbf74d3fc
Author: Maxim Gekk 
Date:   2018-09-19T18:34:13Z

Making a note




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22010: [SPARK-21436][CORE] Take advantage of known parti...

2018-09-19 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/22010#discussion_r218918701
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -396,7 +396,26 @@ abstract class RDD[T: ClassTag](
* Return a new RDD containing the distinct elements in this RDD.
*/
   def distinct(numPartitions: Int)(implicit ord: Ordering[T] = null): 
RDD[T] = withScope {
-map(x => (x, null)).reduceByKey((x, y) => x, numPartitions).map(_._1)
+partitioner match {
+  case Some(p) if numPartitions == partitions.length =>
+def key(x: T): (T, Null) = (x, null)
+val cleanKey = sc.clean(key _)
+val keyed = new MapPartitionsRDD[(T, Null), T](
+  this,
+  (context, pid, iter) => iter.map(cleanKey),
+  knownPartitioner = Some(new WrappedPartitioner(p)))
+val duplicatesRemoved = keyed.reduceByKey((x, y) => x)
--- End diff --

So I _think_ it is partitioner of input RDD if known partitioner otherwise 
hash partitioner of the default parallelism. Yes?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22010: [SPARK-21436][CORE] Take advantage of known parti...

2018-09-19 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/22010#discussion_r218917483
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/MapPartitionsRDD.scala 
---
@@ -35,16 +35,24 @@ import org.apache.spark.{Partition, TaskContext}
  * @param isOrderSensitive whether or not the function is order-sensitive. 
If it's order
  * sensitive, it may return totally different 
result when the input order
  * is changed. Mostly stateful functions are 
order-sensitive.
+ * @param knownPartitioner If the result has a known partitioner.
  */
 private[spark] class MapPartitionsRDD[U: ClassTag, T: ClassTag](
 var prev: RDD[T],
 f: (TaskContext, Int, Iterator[T]) => Iterator[U],  // (TaskContext, 
partition index, iterator)
 preservesPartitioning: Boolean = false,
 isFromBarrier: Boolean = false,
-isOrderSensitive: Boolean = false)
+isOrderSensitive: Boolean = false,
+knownPartitioner: Option[Partitioner] = None)
   extends RDD[U](prev) {
 
-  override val partitioner = if (preservesPartitioning) 
firstParent[T].partitioner else None
+  override val partitioner = {
+if (preservesPartitioning) {
+  firstParent[T].partitioner
+} else {
+  knownPartitioner
+}
+  }
--- End diff --

I mean yes we can sub-class just as easily -- is that what you mean?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20858: [SPARK-23736][SQL] Extending the concat function ...

2018-09-19 Thread mn-mikke
Github user mn-mikke commented on a diff in the pull request:

https://github.com/apache/spark/pull/20858#discussion_r218917303
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -665,3 +667,219 @@ case class ElementAt(left: Expression, right: 
Expression) extends GetMapValueUti
 
   override def prettyName: String = "element_at"
 }
+
+/**
+ * Concatenates multiple input columns together into a single column.
+ * The function works with strings, binary and compatible array columns.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(col1, col2, ..., colN) - Returns the concatenation of 
col1, col2, ..., colN.",
+  examples = """
+Examples:
+  > SELECT _FUNC_('Spark', 'SQL');
+   SparkSQL
+  > SELECT _FUNC_(array(1, 2, 3), array(4, 5), array(6));
+ | [1,2,3,4,5,6]
+  """)
+case class Concat(children: Seq[Expression]) extends Expression {
+
+  private val MAX_ARRAY_LENGTH: Int = 
ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH
+
+  val allowedTypes = Seq(StringType, BinaryType, ArrayType)
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+if (children.isEmpty) {
+  TypeCheckResult.TypeCheckSuccess
+} else {
+  val childTypes = children.map(_.dataType)
+  if (childTypes.exists(tpe => 
!allowedTypes.exists(_.acceptsType(tpe {
+return TypeCheckResult.TypeCheckFailure(
+  s"input to function $prettyName should have been StringType, 
BinaryType or ArrayType," +
+s" but it's " + childTypes.map(_.simpleString).mkString("[", 
", ", "]"))
+  }
+  TypeUtils.checkForSameTypeInputExpr(childTypes, s"function 
$prettyName")
+}
+  }
+
+  override def dataType: DataType = 
children.map(_.dataType).headOption.getOrElse(StringType)
+
+  lazy val javaType: String = CodeGenerator.javaType(dataType)
+
+  override def nullable: Boolean = children.exists(_.nullable)
+
+  override def foldable: Boolean = children.forall(_.foldable)
+
+  override def eval(input: InternalRow): Any = dataType match {
--- End diff --

Thanks! I've created #22471 to call the pattern matching only once.

WDYT about 
[Reverse](https://github.com/apache/spark/pull/21034/files#diff-9853dcf5ce3d2ac1e94d473197ff5768R240)?
 It looks like a similar problem.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



<    1   2   3   4   5   6   7   8   9   10   >