[GitHub] spark issue #22450: [SPARK-25454][SQL] Avoid precision loss in division with...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22450 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3258/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19045: [WIP][SPARK-20628][CORE] Keep track of nodes (/ spot ins...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19045 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3259/ Test PASSed.
[GitHub] spark issue #22460: DO NOT MERGE
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22460 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96288/ Test FAILed.
[GitHub] spark issue #22460: DO NOT MERGE
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22460

**[Test build #96288 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96288/testReport)** for PR 22460 at commit [`7252653`](https://github.com/apache/spark/commit/7252653cb836780a12f26b82b5a27a0bc83ee171).

* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #19045: [WIP][SPARK-20628][CORE] Keep track of nodes (/ spot ins...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19045 Merged build finished. Test PASSed.
[GitHub] spark issue #22450: [SPARK-25454][SQL] Avoid precision loss in division with...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22450 Merged build finished. Test PASSed.
[GitHub] spark issue #22460: DO NOT MERGE
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22460 Merged build finished. Test FAILed.
[GitHub] spark issue #22460: DO NOT MERGE
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22460 Merged build finished. Test PASSed.
[GitHub] spark issue #22460: DO NOT MERGE
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22460 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3257/ Test PASSed.
[GitHub] spark issue #22465: [SPARK-25457][SQL] IntegralDivide returns data type of t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22465 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3256/ Test PASSed.
[GitHub] spark issue #22465: [SPARK-25457][SQL] IntegralDivide returns data type of t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22465 Merged build finished. Test PASSed.
[GitHub] spark issue #19045: [WIP][SPARK-20628][CORE] Keep track of nodes (/ spot ins...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19045 **[Test build #96290 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96290/testReport)** for PR 19045 at commit [`4e6572f`](https://github.com/apache/spark/commit/4e6572f8a7798298fe4787fe5913ee94c2b97359).
[GitHub] spark issue #22450: [SPARK-25454][SQL] Avoid precision loss in division with...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22450 **[Test build #96289 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96289/testReport)** for PR 22450 at commit [`27a9ea6`](https://github.com/apache/spark/commit/27a9ea656428d4705c6323deda64b14eb7ced7a4).
[GitHub] spark issue #22469: [SPARK-24341][FOLLOWUP][DOCS] Add migration note for IN ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22469 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3255/ Test PASSed.
[GitHub] spark issue #22460: DO NOT MERGE
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22460 **[Test build #96288 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96288/testReport)** for PR 22460 at commit [`7252653`](https://github.com/apache/spark/commit/7252653cb836780a12f26b82b5a27a0bc83ee171).
[GitHub] spark issue #22469: [SPARK-24341][FOLLOWUP][DOCS] Add migration note for IN ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22469 Merged build finished. Test PASSed.
[GitHub] spark issue #22469: [SPARK-24341][FOLLOWUP][DOCS] Add migration note for IN ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22469 **[Test build #96286 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96286/testReport)** for PR 22469 at commit [`d444073`](https://github.com/apache/spark/commit/d444073ec047f157e2ffb55edb0577963bd76f40).
[GitHub] spark issue #22465: [SPARK-25457][SQL] IntegralDivide returns data type of t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22465 **[Test build #96287 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96287/testReport)** for PR 22465 at commit [`3f045c0`](https://github.com/apache/spark/commit/3f045c0512efe968aaf40861b5054af5da254ce3).
[GitHub] spark issue #22450: [SPARK-25454][SQL] Avoid precision loss in division with...
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/22450 retest this please
[GitHub] spark pull request #22010: [SPARK-21436][CORE] Take advantage of known parti...
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/22010#discussion_r218946996

--- Diff: core/src/main/scala/org/apache/spark/rdd/MapPartitionsRDD.scala ---
@@ -35,16 +35,24 @@ import org.apache.spark.{Partition, TaskContext}
  * @param isOrderSensitive whether or not the function is order-sensitive. If it's order
  *                         sensitive, it may return totally different result when the input order
  *                         is changed. Mostly stateful functions are order-sensitive.
+ * @param knownPartitioner If the result has a known partitioner.
  */
 private[spark] class MapPartitionsRDD[U: ClassTag, T: ClassTag](
     var prev: RDD[T],
     f: (TaskContext, Int, Iterator[T]) => Iterator[U],  // (TaskContext, partition index, iterator)
     preservesPartitioning: Boolean = false,
     isFromBarrier: Boolean = false,
-    isOrderSensitive: Boolean = false)
+    isOrderSensitive: Boolean = false,
+    knownPartitioner: Option[Partitioner] = None)
   extends RDD[U](prev) {

-  override val partitioner = if (preservesPartitioning) firstParent[T].partitioner else None
+  override val partitioner = {
+    if (preservesPartitioning) {
+      firstParent[T].partitioner
+    } else {
+      knownPartitioner
+    }
+  }
--- End diff --

yes
[GitHub] spark issue #22469: [SPARK-24341][FOLLOWUP][DOCS] Add migration note for IN ...
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/22469 thank you for pointing this out @cloud-fan
[GitHub] spark pull request #22010: [SPARK-21436][CORE] Take advantage of known parti...
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/22010#discussion_r218946895

--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -396,7 +396,26 @@ abstract class RDD[T: ClassTag](
    * Return a new RDD containing the distinct elements in this RDD.
    */
   def distinct(numPartitions: Int)(implicit ord: Ordering[T] = null): RDD[T] = withScope {
-    map(x => (x, null)).reduceByKey((x, y) => x, numPartitions).map(_._1)
+    partitioner match {
+      case Some(p) if numPartitions == partitions.length =>
+        def key(x: T): (T, Null) = (x, null)
+        val cleanKey = sc.clean(key _)
+        val keyed = new MapPartitionsRDD[(T, Null), T](
+          this,
+          (context, pid, iter) => iter.map(cleanKey),
+          knownPartitioner = Some(new WrappedPartitioner(p)))
+        val duplicatesRemoved = keyed.reduceByKey((x, y) => x)
--- End diff --

Yes, would use right partitioner in this case
[GitHub] spark issue #22450: [SPARK-25454][SQL] Avoid precision loss in division with...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22450 Merged build finished. Test FAILed.
[GitHub] spark issue #22450: [SPARK-25454][SQL] Avoid precision loss in division with...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22450 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96254/ Test FAILed.
[GitHub] spark issue #22450: [SPARK-25454][SQL] Avoid precision loss in division with...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22450

**[Test build #96254 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96254/testReport)** for PR 22450 at commit [`27a9ea6`](https://github.com/apache/spark/commit/27a9ea656428d4705c6323deda64b14eb7ced7a4).

* This patch **fails from timeout after a configured wait of `400m`**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #22465: [SPARK-25457][SQL] IntegralDivide returns data type of t...
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/22465 retest this please
[GitHub] spark issue #22470: [SPARK-25454][SQL] should not generate negative scale as...
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/22470

@cloud-fan I think the main problem with this (and it is the reason why I haven't proposed it) is that the range of supported operations would become smaller, so we may forbid operations which currently work. For instance, the following code has always worked on Spark: `lit(BigDecimal(1e36)) * lit(BigDecimal(1))`. Today this becomes a `decimal(6, -36)`. With your change, it would become a `decimal(42, 0)`, which is out of the range of supported values (i.e. an overflow would occur). I am not sure whether any user relies on something like this, but it is possible and I think we cannot exclude it.

We may, though, restrict the condition under which this happens, i.e. when we are just parsing a literal we can avoid returning a negative scale. But the other fix would be needed anyway in that case, since we could still have to deal with negative scales, so IMO this would be quite useless. I'd agree, though, about forbidding negative scales in 3.0.
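The discussion above hinges on how decimal precision and scale propagate through arithmetic. A minimal sketch (Python, not Spark source; function names are illustrative) of the Hive-style result-type formulas that Spark's `DecimalPrecision` rule documents for multiplication and division, showing how a negative input scale flows through:

```python
# Hedged sketch of Spark's decimal result-type rules (from the
# DecimalPrecision documentation); not Spark code, names are made up.

def decimal_multiply(p1, s1, p2, s2):
    # d1(p1, s1) * d2(p2, s2) -> decimal(p1 + p2 + 1, s1 + s2)
    return p1 + p2 + 1, s1 + s2

def decimal_divide(p1, s1, p2, s2):
    # d1(p1, s1) / d2(p2, s2):
    #   scale     = max(6, s1 + p2 + 1)
    #   precision = p1 - s1 + s2 + scale
    scale = max(6, s1 + p2 + 1)
    return p1 - s1 + s2 + scale, scale

# A literal with a negative scale, e.g. decimal(2, -36), keeps that
# negative scale when multiplied by decimal(1, 0):
print(decimal_multiply(2, -36, 1, 0))  # -> (4, -36)
```

If negative scales were instead normalized to 0, the same value would need all of its integer digits counted in the precision, which is how the comment's `decimal(42, 0)` exceeds the maximum precision of 38.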
[GitHub] spark issue #22460: DO NOT MERGE
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22460 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96270/ Test FAILed.
[GitHub] spark issue #22460: DO NOT MERGE
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22460 Merged build finished. Test FAILed.
[GitHub] spark issue #22460: DO NOT MERGE
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22460

**[Test build #96270 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96270/testReport)** for PR 22460 at commit [`89dcafe`](https://github.com/apache/spark/commit/89dcafe979a150f8d722fad72a575b67152ccb58).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #22325: [SPARK-25318]. Add exception handling when wrappi...
Github user rezasafi commented on a diff in the pull request: https://github.com/apache/spark/pull/22325#discussion_r218944816

--- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala ---
@@ -444,36 +444,34 @@ final class ShuffleBlockFetcherIterator(
         throwFetchFailedException(blockId, address, e)
       }

-      input = streamWrapper(blockId, in)
-      // Only copy the stream if it's wrapped by compression or encryption, also the size of
-      // block is small (the decompressed block is smaller than maxBytesInFlight)
-      if (detectCorrupt && !input.eq(in) && size < maxBytesInFlight / 3) {
-        val originalInput = input
-        val out = new ChunkedByteBufferOutputStream(64 * 1024, ByteBuffer.allocate)
-        try {
+      try {
+        input = streamWrapper(blockId, in)
+        // Only copy the stream if it's wrapped by compression or encryption, also the size of
+        // block is small (the decompressed block is smaller than maxBytesInFlight)
+        if (detectCorrupt && !input.eq(in) && size < maxBytesInFlight / 3) {
+          val out = new ChunkedByteBufferOutputStream(64 * 1024, ByteBuffer.allocate)
           // Decompress the whole block at once to detect any corruption, which could increase
           // the memory usage and potentially increase the chance of OOM.
           // TODO: manage the memory used here, and spill it into disk in case of OOM.
           Utils.copyStream(input, out)
           out.close()
           input = out.toChunkedByteBuffer.toInputStream(dispose = true)
--- End diff --

I just added an `originalInput` val and changed its scope to be able to close it in the finally section.
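The pattern under review is: eagerly decompress the whole fetched block so corruption surfaces at fetch time rather than later in task code, and close the original stream in a `finally` block either way. A minimal sketch of that pattern (Python for brevity, not the PR's Scala; all names are illustrative):

```python
# Hedged sketch of the eager-decompress-and-close pattern discussed in
# the review comment above; names are made up, not Spark APIs.
import io
import zlib

def read_block_detecting_corruption(raw: bytes) -> io.BytesIO:
    original_input = io.BytesIO(raw)
    try:
        # Decompress the whole block at once: a corrupt block fails here,
        # where the fetch can be retried, instead of deep inside task code.
        data = zlib.decompress(original_input.read())
        return io.BytesIO(data)
    finally:
        # Close the original stream whether or not decompression succeeded.
        original_input.close()

block = zlib.compress(b"payload")
assert read_block_detecting_corruption(block).read() == b"payload"
```

The `finally` is the point of the PR: before the change, a failure while wrapping or copying the stream could leak the underlying stream.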
[GitHub] spark issue #22325: [SPARK-25318]. Add exception handling when wrapping the ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22325 **[Test build #96285 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96285/testReport)** for PR 22325 at commit [`4983d69`](https://github.com/apache/spark/commit/4983d69abf48594e9876d7a40f3f531836333243).
[GitHub] spark issue #22450: [SPARK-25454][SQL] Avoid precision loss in division with...
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/22450 @cloud-fan I checked the other operations and they have no issue with negative scales. This is the reason why this fix is only for Divide: it is the only operation which wasn't handling them properly. I also thought about doing that, but I chose not to in order to avoid introducing regressions. Anyway, I'll argue this in more detail on your PR. Thanks.
[GitHub] spark issue #22472: [SPARK-23173][SQL] Reverting of spark.sql.fromJsonForceN...
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22472 Can the revert itself be done directly in the master branch, or does it have to go through a PR like this one?
[GitHub] spark pull request #22173: [SPARK-24335] Spark external shuffle server impro...
Github user redsanket commented on a diff in the pull request: https://github.com/apache/spark/pull/22173#discussion_r218942035

--- Diff: common/network-common/src/main/java/org/apache/spark/network/TransportContext.java ---
@@ -77,17 +82,54 @@
   private static final MessageEncoder ENCODER = MessageEncoder.INSTANCE;
   private static final MessageDecoder DECODER = MessageDecoder.INSTANCE;

+  // Separate thread pool for handling ChunkFetchRequest. This helps to enable throttling
+  // max number of TransportServer worker threads that are blocked on writing response
+  // of ChunkFetchRequest message back to the client via the underlying channel.
+  private static EventLoopGroup chunkFetchWorkers;
+
   public TransportContext(TransportConf conf, RpcHandler rpcHandler) {
-    this(conf, rpcHandler, false);
+    this(conf, rpcHandler, false, false);
   }

   public TransportContext(
       TransportConf conf,
       RpcHandler rpcHandler,
       boolean closeIdleConnections) {
+    this(conf, rpcHandler, closeIdleConnections, false);
+  }
+
+  /**
+   * @param conf TransportConf
+   * @param rpcHandler RpcHandler responsible for handling requests and responses.
+   * @param closeIdleConnections Close idle connections if it is set to true.
+   * @param isClientOnly This config is more important when external shuffle is enabled.
+   *                     It stops creating extra event loop and subsequent thread pool
+   *                     for shuffle clients to handle chunked fetch requests.
+   *                     In the case when external shuffle is disabled, the executors are both
+   *                     client and server so both share the same event loop which is trivial.
--- End diff --

I hope we follow a similar indentation for all other comments
[GitHub] spark issue #22473: [SPARK-25449][CORE] Heartbeat shouldn't include accumula...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22473 **[Test build #96284 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96284/testReport)** for PR 22473 at commit [`3cf88a4`](https://github.com/apache/spark/commit/3cf88a4ab34064074d42f5daa3a448e8f9def649).
[GitHub] spark pull request #22473: [SPARK-25449][CORE] Heartbeat shouldn't include a...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/22473#discussion_r218941326

--- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
@@ -799,7 +799,8 @@ private[spark] class Executor(
       if (taskRunner.task != null) {
         taskRunner.task.metrics.mergeShuffleReadMetrics()
         taskRunner.task.metrics.setJvmGCTime(curGCTime - taskRunner.startGCTime)
-        accumUpdates += ((taskRunner.taskId, taskRunner.task.metrics.accumulators()))
+        accumUpdates +=
+          ((taskRunner.taskId, taskRunner.task.metrics.accumulators().filterNot(_.isZero)))
--- End diff --

Could you add a flag for this behavior change?
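The `filterNot(_.isZero)` change above drops zero-valued accumulator updates from the heartbeat, shrinking the message when most metrics are untouched. A minimal sketch of the idea (Python, not Spark's Scala; names are illustrative):

```python
# Hedged sketch of filtering zero accumulator updates out of a heartbeat;
# the data shapes and names here are made up for illustration.

def heartbeat_updates(tasks):
    """tasks: {task_id: [(accum_name, value), ...]} -> per-task updates
    with zero-valued accumulators dropped."""
    return [
        (task_id, [(name, v) for name, v in accums if v != 0])
        for task_id, accums in tasks.items()
    ]

updates = heartbeat_updates({7: [("records.read", 120), ("bytes.spilled", 0)]})
assert updates == [(7, [("records.read", 120)])]
```

zsxwing's ask for a flag reflects that this is a visible behavior change: consumers that expected every registered accumulator in each heartbeat would now see a sparse list.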
[GitHub] spark issue #22413: [SPARK-25425][SQL] Extra options should override session...
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22413 @dongjoon-hyun Here is the PR for 2.4: https://github.com/apache/spark/pull/22474 . I will check whether it has conflicts in 2.3 and, if so, backport it to 2.3 as well.
[GitHub] spark issue #22173: [SPARK-24335] Spark external shuffle server improvement ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22173 **[Test build #96283 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96283/testReport)** for PR 22173 at commit [`574ba81`](https://github.com/apache/spark/commit/574ba81abf3e1d71bac84a83b80e4b67056f7442).
[GitHub] spark issue #22473: [SPARK-25449][CORE] Heartbeat shouldn't include accumula...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/22473 add to whitelist
[GitHub] spark issue #22474: [SPARK-25425][SQL][BACKPORT-2.4] Extra options should ov...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22474 **[Test build #96282 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96282/testReport)** for PR 22474 at commit [`bf3a52c`](https://github.com/apache/spark/commit/bf3a52c1240a14f89949f8b9dcb554090f803250).
[GitHub] spark issue #22474: [SPARK-25425][SQL][BACKPORT-2.4] Extra options should ov...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22474 Can one of the admins verify this patch?
[GitHub] spark pull request #22474: [SPARK-25425][SQL][BACKPORT-2.4] Extra options sh...
GitHub user MaxGekk opened a pull request: https://github.com/apache/spark/pull/22474

[SPARK-25425][SQL][BACKPORT-2.4] Extra options should override session options in DataSource V2

## What changes were proposed in this pull request?

In the PR, I propose overriding session options with extra options in DataSource V2. Extra options are more specific and set via `.option()`, and should overwrite the more generic session options.

## How was this patch tested?

Added tests for the read and write paths.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/MaxGekk/spark-1 session-options-2.4

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22474.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #22474

commit ac704568591d2c505b396b5c93138ea8132e3cc4
Author: Maxim Gekk
Date: 2018-09-16T00:24:11Z

    [SPARK-25425][SQL] Extra options should override session options in DataSource V2

    In the PR, I propose overriding session options with extra options in DataSource V2. Extra options are more specific and set via `.option()`, and should overwrite the more generic session options. Entries from the second map overwrite entries with the same key from the first map, for example:

    ```Scala
    scala> Map("option" -> false) ++ Map("option" -> true)
    res0: scala.collection.immutable.Map[String,Boolean] = Map(option -> true)
    ```

    Added a test checking which option is propagated to a data source in `load()`.

    Closes #22413 from MaxGekk/session-options.

    Lead-authored-by: Maxim Gekk
    Co-authored-by: Dongjoon Hyun
    Co-authored-by: Maxim Gekk
    Signed-off-by: Dongjoon Hyun

commit bf3a52c1240a14f89949f8b9dcb554090f803250
Author: Maxim Gekk
Date: 2018-09-19T19:40:25Z

    Adding missing import after merge to 2.4
[GitHub] spark pull request #22173: [SPARK-24335] Spark external shuffle server impro...
Github user redsanket commented on a diff in the pull request: https://github.com/apache/spark/pull/22173#discussion_r218939333

--- Diff: common/network-common/src/main/java/org/apache/spark/network/TransportContext.java ---
@@ -98,21 +98,32 @@ public TransportContext(
     this(conf, rpcHandler, closeIdleConnections, false);
   }

+  /**
+   * @param conf TransportConf
+   * @param rpcHandler RpcHandler responsible for handling requests and responses.
+   * @param closeIdleConnections Close idle connections if it is set to true.
+   * @param isClientOnly This config is more important when external shuffle is enabled.
--- End diff --

I think for comments we follow the same spacing convention as observed here, so sticking with it...
[GitHub] spark issue #22472: [SPARK-23173][SQL] Reverting of spark.sql.fromJsonForceN...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22472 The reverting PR title is usually `Revert "[SPARK-23173][SQL] rename spark.sql.fromJsonForceNullableSchema"`
[GitHub] spark issue #22470: [SPARK-25454][SQL] should not generate negative scale as...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22470 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96271/ Test FAILed.
[GitHub] spark issue #22470: [SPARK-25454][SQL] should not generate negative scale as...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22470 Merged build finished. Test FAILed.
[GitHub] spark issue #22470: [SPARK-25454][SQL] should not generate negative scale as...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22470

**[Test build #96271 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96271/testReport)** for PR 22470 at commit [`3c05636`](https://github.com/apache/spark/commit/3c05636b879b678cfcf72dfb515ea691317e470d).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #22472: [SPARK-23173][SQL] Reverting of spark.sql.fromJsonForceN...
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22472

> Let's not mix the reverting and adding a deprecation note

ok. I will revert the last commit from the PR.

> Also, SPARK-25384 is about Removing spark.sql.fromJsonForceNullableSchema, not Deprecating at all.

I removed the ticket from the PR's title.
[GitHub] spark issue #22448: [SPARK-25417][SQL] Improve findTightestCommonType to coe...
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22448

> Do you know the reason for it?

It is better to ask @HyukjinKwon.

> Seems like we may need to refactor to see if we can avoid duplicating findTightestCommonType here? Can we take this in a follow-up?

Definitely, it can be done in a separate PR. Please do that if you have time (if not, I can do it).
[GitHub] spark pull request #21688: [SPARK-21809] : Change Stage Page to use datatabl...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/21688#discussion_r218931652

--- Diff: core/src/main/resources/org/apache/spark/ui/static/stagepage.js ---
@@ -0,0 +1,926 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+$(document).ajaxStop($.unblockUI);
+$(document).ajaxStart(function () {
+    $.blockUI({message: 'Loading Stage Page...'});
+});
+
+$.extend( $.fn.dataTable.ext.type.order, {
+    "file-size-pre": ConvertDurationString,
+
+    "file-size-asc": function ( a, b ) {
+        a = ConvertDurationString( a );
+        b = ConvertDurationString( b );
+        return ((a < b) ? -1 : ((a > b) ? 1 : 0));
+    },
+
+    "file-size-desc": function ( a, b ) {
+        a = ConvertDurationString( a );
+        b = ConvertDurationString( b );
+        return ((a < b) ? 1 : ((a > b) ? -1 : 0));
+    }
+} );
+
+// This function will only parse the URL under certain format
+// e.g. https://domain:50509/history/application_1502220952225_59143/stages/stage/?id=0&attempt=0
+function stageEndPoint(appId) {
+    var words = document.baseURI.split('/');
+    var words2 = document.baseURI.split('?');
+    var ind = words.indexOf("proxy");
+    if (ind > 0) {
+        var appId = words[ind + 1];
+        var stageIdLen = words2[1].indexOf('&');
+        var stageId = words2[1].substr(3, stageIdLen - 3);
+        var newBaseURI = words.slice(0, ind + 2).join('/');
+        return newBaseURI + "/api/v1/applications/" + appId + "/stages/" + stageId;
+    }
+    ind = words.indexOf("history");
+    if (ind > 0) {
+        var appId = words[ind + 1];
+        var attemptId = words[ind + 2];
+        var stageIdLen = words2[1].indexOf('&');
+        var stageId = words2[1].substr(3, stageIdLen - 3);
+        var newBaseURI = words.slice(0, ind).join('/');
+        if (isNaN(attemptId) || attemptId == "0") {
+            return newBaseURI + "/api/v1/applications/" + appId + "/stages/" + stageId;
+        } else {
+            return newBaseURI + "/api/v1/applications/" + appId + "/" + attemptId + "/stages/" + stageId;
+        }
+    }
+    var stageIdLen = words2[1].indexOf('&');
+    var stageId = words2[1].substr(3, stageIdLen - 3);
+    return location.origin + "/api/v1/applications/" + appId + "/stages/" + stageId;
+}
+
+function getColumnNameForTaskMetricSummary(columnKey) {
+    switch(columnKey) {
+        case "executorRunTime":
+            return "Duration";
+            break;
+
+        case "jvmGcTime":
+            return "GC Time";
+            break;
+
+        case "gettingResultTime":
+            return "Getting Result Time";
+            break;
+
+        case "inputMetrics":
+            return "Input Size / Records";
+            break;
+
+        case "outputMetrics":
+            return "Output Size / Records";
+            break;
+
+        case "peakExecutionMemory":
+            return "Peak Execution Memory";
+            break;
+
+        case "resultSerializationTime":
+            return "Result Serialization Time";
+            break;
+
+        case "schedulerDelay":
+            return "Scheduler Delay";
+            break;
+
+        case "diskBytesSpilled":
+            return "Shuffle spill (disk)";
+            break;
+
+        case "memoryBytesSpilled":
+            return "Shuffle spill (memory)";
+            break;
+
+        case "shuffleReadMetrics":
+            return "Shuffle Read Size / Records";
+            break;
+
+        case "shuffleWriteMetrics":
+            return "Shuffle Write Size / Records";
+            break;
+
+        case "executorDeserializeTime":
+            return "Task Deserialization Time";
+            break;
+
+        default:
[GitHub] spark issue #22305: [WIP][SPARK-24561][SQL][Python] User-defined window aggr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22305 Merged build finished. Test PASSed.
[GitHub] spark issue #22305: [WIP][SPARK-24561][SQL][Python] User-defined window aggr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22305 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96260/ Test PASSed.
[GitHub] spark pull request #21688: [SPARK-21809] : Change Stage Page to use datatabl...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/21688#discussion_r218931328

--- Diff: core/src/main/resources/org/apache/spark/ui/static/stagepage.js ---
@@ -0,0 +1,926 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+$(document).ajaxStop($.unblockUI);
+$(document).ajaxStart(function () {
+    $.blockUI({message: 'Loading Stage Page...'});
+});
+
+$.extend( $.fn.dataTable.ext.type.order, {
+    "file-size-pre": ConvertDurationString,
+
+    "file-size-asc": function ( a, b ) {
--- End diff --

The "file-size" name here doesn't match converting a duration. If we can change the name to be "duration-" instead of "file-size", it might make more sense.
[GitHub] spark issue #22305: [WIP][SPARK-24561][SQL][Python] User-defined window aggr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22305 **[Test build #96260 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96260/testReport)** for PR 22305 at commit [`bb05ee0`](https://github.com/apache/spark/commit/bb05ee036e63c629fb8fb6225e53cd9340019397). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22325: [SPARK-25318]. Add exception handling when wrapping the ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22325 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96265/ Test FAILed.
[GitHub] spark issue #22325: [SPARK-25318]. Add exception handling when wrapping the ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22325 Merged build finished. Test FAILed.
[GitHub] spark issue #22325: [SPARK-25318]. Add exception handling when wrapping the ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22325 **[Test build #96265 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96265/testReport)** for PR 22325 at commit [`e64374d`](https://github.com/apache/spark/commit/e64374db205f16955dc66b87f05639ce5f555849). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22448: [SPARK-25417][SQL] Improve findTightestCommonType to coe...
Github user dilipbiswal commented on the issue: https://github.com/apache/spark/pull/22448

@MaxGekk I was looking at CSVInferSchema. It seems like there is a copy of `findTightestCommonType` in this file? Do you know the reason for it? Seems like we may need to refactor to see if we can avoid duplicating `findTightestCommonType` here? Can we take this in a follow-up? If you think we should only focus on refactoring `findWiderDecimalType` and not worry about `findTightestCommonType`, then please let me know and I can do it in this PR.
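For context on the helper being discussed: a tightest-common-type lookup promotes two types to the narrowest type that can represent both, which is why having two diverging copies of it is risky. A minimal standalone sketch of the idea (the class name and the precedence list are illustrative assumptions, not Spark's actual implementation):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class TypeCoercion {
    // Hypothetical numeric precedence, narrowest to widest.
    private static final List<String> PRECEDENCE =
        Arrays.asList("byte", "short", "int", "long", "float", "double");

    /** Returns the narrowest type that can represent both inputs, if any. */
    public static Optional<String> findTightestCommonType(String t1, String t2) {
        int i1 = PRECEDENCE.indexOf(t1);
        int i2 = PRECEDENCE.indexOf(t2);
        if (i1 < 0 || i2 < 0) {
            return Optional.empty(); // not a type we know how to widen
        }
        // The wider of the two positions is the tightest common type.
        return Optional.of(PRECEDENCE.get(Math.max(i1, i2)));
    }

    public static void main(String[] args) {
        System.out.println(findTightestCommonType("int", "long"));
    }
}
```

Keeping a single copy of such a lookup is exactly the refactor the thread proposes: two copies can silently disagree on the precedence order.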
[GitHub] spark issue #22447: [SPARK-25450][SQL] PushProjectThroughUnion rule uses the...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22447 **[Test build #96280 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96280/testReport)** for PR 22447 at commit [`c7756ed`](https://github.com/apache/spark/commit/c7756edc46d77ce8fc9b846e91b37a9fcef97600).
[GitHub] spark pull request #22173: [SPARK-24335] Spark external shuffle server impro...
Github user redsanket commented on a diff in the pull request: https://github.com/apache/spark/pull/22173#discussion_r218930135

--- Diff: common/network-common/src/main/java/org/apache/spark/network/TransportContext.java ---
@@ -77,17 +82,43 @@
   private static final MessageEncoder ENCODER = MessageEncoder.INSTANCE;
   private static final MessageDecoder DECODER = MessageDecoder.INSTANCE;

+  // Separate thread pool for handling ChunkFetchRequest. This helps to enable throttling
+  // max number of TransportServer worker threads that are blocked on writing response
+  // of ChunkFetchRequest message back to the client via the underlying channel.
+  private static EventLoopGroup chunkFetchWorkers;
+
   public TransportContext(TransportConf conf, RpcHandler rpcHandler) {
-    this(conf, rpcHandler, false);
+    this(conf, rpcHandler, false, false);
   }

   public TransportContext(
       TransportConf conf,
       RpcHandler rpcHandler,
       boolean closeIdleConnections) {
+    this(conf, rpcHandler, closeIdleConnections, false);
+  }
+
+  public TransportContext(
+      TransportConf conf,
+      RpcHandler rpcHandler,
+      boolean closeIdleConnections,
+      boolean isClient) {
--- End diff --

Sure... anything to make it more clear.
[GitHub] spark issue #22173: [SPARK-24335] Spark external shuffle server improvement ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22173 **[Test build #96281 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96281/testReport)** for PR 22173 at commit [`5d4b477`](https://github.com/apache/spark/commit/5d4b477595d3c2df5722ae5cb042a16c2f53fc9b).
[GitHub] spark issue #22447: [SPARK-25450][SQL] PushProjectThroughUnion rule uses the...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22447 Merged build finished. Test PASSed.
[GitHub] spark issue #22447: [SPARK-25450][SQL] PushProjectThroughUnion rule uses the...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22447 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3254/ Test PASSed.
[GitHub] spark issue #22325: [SPARK-25318]. Add exception handling when wrapping the ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22325 Merged build finished. Test FAILed.
[GitHub] spark issue #22447: [SPARK-25450][SQL] PushProjectThroughUnion rule uses the...
Github user maryannxue commented on the issue: https://github.com/apache/spark/pull/22447 retest this please
[GitHub] spark issue #22325: [SPARK-25318]. Add exception handling when wrapping the ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22325 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96269/ Test FAILed.
[GitHub] spark issue #22325: [SPARK-25318]. Add exception handling when wrapping the ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22325 **[Test build #96269 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96269/testReport)** for PR 22325 at commit [`7ec6de2`](https://github.com/apache/spark/commit/7ec6de29c4813dea56031c398e034d49d474e976). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22447: [SPARK-25450][SQL] PushProjectThroughUnion rule uses the...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22447 Merged build finished. Test FAILed.
[GitHub] spark issue #22447: [SPARK-25450][SQL] PushProjectThroughUnion rule uses the...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22447 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96272/ Test FAILed.
[GitHub] spark issue #22413: [SPARK-25425][SQL] Extra options should override session...
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/22413 Thanks @MaxGekk, sorry for the original omission!
[GitHub] spark issue #22447: [SPARK-25450][SQL] PushProjectThroughUnion rule uses the...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22447 **[Test build #96272 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96272/testReport)** for PR 22447 at commit [`c7756ed`](https://github.com/apache/spark/commit/c7756edc46d77ce8fc9b846e91b37a9fcef97600). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22413: [SPARK-25425][SQL] Extra options should override session...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22413 Thank you so much, @MaxGekk. It would be great if we can have that in Spark 2.4.0 RC2.
[GitHub] spark issue #22472: [SPARK-23173][SPARK-25384][SQL] Reverting of spark.sql.f...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22472 cc @gatorsmile, too.
[GitHub] spark pull request #22173: [SPARK-24335] Spark external shuffle server impro...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/22173#discussion_r218926387

--- Diff: common/network-common/src/main/java/org/apache/spark/network/TransportContext.java ---
@@ -77,17 +82,43 @@
   private static final MessageEncoder ENCODER = MessageEncoder.INSTANCE;
   private static final MessageDecoder DECODER = MessageDecoder.INSTANCE;

+  // Separate thread pool for handling ChunkFetchRequest. This helps to enable throttling
+  // max number of TransportServer worker threads that are blocked on writing response
+  // of ChunkFetchRequest message back to the client via the underlying channel.
+  private static EventLoopGroup chunkFetchWorkers;
+
   public TransportContext(TransportConf conf, RpcHandler rpcHandler) {
-    this(conf, rpcHandler, false);
+    this(conf, rpcHandler, false, false);
   }

   public TransportContext(
       TransportConf conf,
       RpcHandler rpcHandler,
       boolean closeIdleConnections) {
+    this(conf, rpcHandler, closeIdleConnections, false);
+  }
+
+  public TransportContext(
+      TransportConf conf,
+      RpcHandler rpcHandler,
+      boolean closeIdleConnections,
+      boolean isClient) {
--- End diff --

We should make this more clear, because with external shuffle off we want this in client mode as well, since it is really both a client and a server. Perhaps we could change the name to `isClientOnly`; we should also put some javadoc on the function to describe the parameter.
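The naming concern above is easier to see in a reduced form. Below is a hypothetical sketch (class and method names are illustrative, not the actual Spark API) of telescoping constructors with the suggested `isClientOnly` name and the javadoc the reviewer asks for:

```java
public class Context {
    private final boolean closeIdleConnections;
    private final boolean isClientOnly;

    public Context() {
        this(false, false);
    }

    public Context(boolean closeIdleConnections) {
        // Older call sites default to the client-and-server behavior.
        this(closeIdleConnections, false);
    }

    /**
     * @param isClientOnly true only when this context will never serve chunk
     *                     fetch requests; with external shuffle disabled, an
     *                     executor acts as both client and server, so such
     *                     callers must pass false here.
     */
    public Context(boolean closeIdleConnections, boolean isClientOnly) {
        this.closeIdleConnections = closeIdleConnections;
        this.isClientOnly = isClientOnly;
    }

    public boolean isClientOnly() { return isClientOnly; }
}
```

A name like `isClient` would read as "is this a client?", which is true even for a process that also serves; `isClientOnly` makes the contract unambiguous at every call site.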
[GitHub] spark issue #22472: [SPARK-23173][SPARK-25384][SQL] Reverting of spark.sql.f...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22472 Let's not mix the reverting and adding a deprecation note, @MaxGekk. Also, `SPARK-25384` is about `Removing spark.sql.fromJsonForceNullableSchema`, not `Deprecating` at all.
[GitHub] spark pull request #22173: [SPARK-24335] Spark external shuffle server impro...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/22173#discussion_r218925291

--- Diff: common/network-common/src/test/java/org/apache/spark/network/ChunkFetchRequestHandlerSuite.java ---
@@ -0,0 +1,102 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.network;
+
+import io.netty.channel.ChannelHandlerContext;
+import java.util.ArrayList;
+import java.util.List;
+
+import io.netty.channel.Channel;
+import org.apache.spark.network.server.ChunkFetchRequestHandler;
+import org.junit.Test;
+
+import static org.mockito.Mockito.*;
+
+import org.apache.commons.lang3.tuple.ImmutablePair;
+import org.apache.commons.lang3.tuple.Pair;
+import org.apache.spark.network.buffer.ManagedBuffer;
+import org.apache.spark.network.client.TransportClient;
+import org.apache.spark.network.protocol.*;
+import org.apache.spark.network.server.NoOpRpcHandler;
+import org.apache.spark.network.server.OneForOneStreamManager;
+import org.apache.spark.network.server.RpcHandler;
+
+public class ChunkFetchRequestHandlerSuite {
+
+  @Test
+  public void handleChunkFetchRequest() throws Exception {
+    RpcHandler rpcHandler = new NoOpRpcHandler();
+    OneForOneStreamManager streamManager = (OneForOneStreamManager) (rpcHandler.getStreamManager());
+    Channel channel = mock(Channel.class);
+    ChannelHandlerContext context = mock(ChannelHandlerContext.class);
+    when(context.channel())
+        .thenAnswer(invocationOnMock0 -> {
--- End diff --

Change spacing to 2.
[GitHub] spark issue #22473: [SPARK-25449][CORE] Heartbeat shouldn't include accumula...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22473 Can one of the admins verify this patch?
[GitHub] spark issue #22470: [SPARK-25454][SQL] should not generate negative scale as...
Github user dilipbiswal commented on the issue: https://github.com/apache/spark/pull/22470 @cloud-fan Could you please check CSVInferSchema::tryParseDecimal()? There is a condition to check negative scale.
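For readers unfamiliar with the case being guarded: `java.math.BigDecimal` can carry a negative scale when a value is parsed from scientific notation, which is what a CSV inference check for negative scale would catch. A quick self-contained illustration (the class name is just for the demo):

```java
import java.math.BigDecimal;

public class NegativeScaleDemo {
    /** Scale of the decimal literal as BigDecimal parses it. */
    public static int scaleOf(String literal) {
        return new BigDecimal(literal).scale();
    }

    public static void main(String[] args) {
        // "1E+10" is stored as unscaled value 1 with scale -10.
        System.out.println(scaleOf("1E+10"));  // prints -10
        System.out.println(scaleOf("123.45")); // prints 2
    }
}
```

A negative scale does not fit the usual `DecimalType(precision, scale >= 0)` model, hence the explicit check.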
[GitHub] spark pull request #22173: [SPARK-24335] Spark external shuffle server impro...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/22173#discussion_r218924768

--- Diff: common/network-common/src/main/java/org/apache/spark/network/server/ChunkFetchRequestHandler.java ---
@@ -0,0 +1,136 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.network.server;
+
+import java.net.SocketAddress;
+
+import com.google.common.base.Throwables;
+import io.netty.channel.Channel;
+import io.netty.channel.ChannelFuture;
+import io.netty.channel.ChannelFutureListener;
+import io.netty.channel.ChannelHandlerContext;
+import io.netty.channel.SimpleChannelInboundHandler;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.spark.network.buffer.ManagedBuffer;
+import org.apache.spark.network.client.TransportClient;
+import org.apache.spark.network.protocol.ChunkFetchFailure;
+import org.apache.spark.network.protocol.ChunkFetchRequest;
+import org.apache.spark.network.protocol.ChunkFetchSuccess;
+import org.apache.spark.network.protocol.Encodable;
+
+import static org.apache.spark.network.util.NettyUtils.*;
+
+
--- End diff --

Remove the extra empty line.
[GitHub] spark pull request #22173: [SPARK-24335] Spark external shuffle server impro...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/22173#discussion_r218924588

--- Diff: common/network-common/src/main/java/org/apache/spark/network/TransportContext.java ---
@@ -173,5 +213,14 @@ private TransportChannelHandler createChannelHandler(Channel channel, RpcHandler
         conf.connectionTimeoutMs(), closeIdleConnections);
   }

+  /**
+   * Creates the dedicated ChannelHandler for ChunkFetchRequest messages.
+   */
+  private ChunkFetchRequestHandler createChunkFetchHandler(TransportChannelHandler channelHandler,
+      RpcHandler rpcHandler) {
+    return new ChunkFetchRequestHandler(channelHandler.getClient(),
+        rpcHandler.getStreamManager(), conf.maxChunksBeingTransferred());
--- End diff --

Indentation: 2 spaces inside the return.
[GitHub] spark issue #22473: [SPARK-25449][CORE] Heartbeat shouldn't include accumula...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22473 **[Test build #96279 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96279/testReport)** for PR 22473 at commit [`3cf88a4`](https://github.com/apache/spark/commit/3cf88a4ab34064074d42f5daa3a448e8f9def649).
[GitHub] spark issue #22473: [SPARK-25449][CORE] Heartbeat shouldn't include accumula...
Github user mukulmurthy commented on the issue: https://github.com/apache/spark/pull/22473 @zsxwing for review
[GitHub] spark pull request #22473: [SPARK-25449][CORE] Heartbeat shouldn't include a...
GitHub user mukulmurthy opened a pull request: https://github.com/apache/spark/pull/22473

[SPARK-25449][CORE] Heartbeat shouldn't include accumulators for zero metrics

## What changes were proposed in this pull request?

Heartbeat shouldn't include accumulators for zero metrics. Heartbeats sent from executors to the driver every 10 seconds contain metrics and are generally on the order of a few KBs. However, for large jobs with lots of tasks, heartbeats can be on the order of tens of MBs, causing tasks to die with heartbeat failures. We can mitigate this by not sending zero metrics to the driver.

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests) (If this patch involves UI changes, please attach a screenshot; otherwise, remove this) Please review http://spark.apache.org/contributing.html before opening a pull request.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mukulmurthy/oss-spark 25449-heartbeat

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22473.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #22473

commit 3e0d9536512300d27201e1d5cc4d9b5755a47871
Author: Mukul Murthy
Date: 2018-09-17T21:55:21Z

    Don't send zero accumulators for metrics in heartbeat

commit 3cf88a4ab34064074d42f5daa3a448e8f9def649
Author: Mukul Murthy
Date: 2018-09-19T18:40:47Z

    add tests
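The mitigation this PR describes, dropping zero-valued metric accumulators before they go into the heartbeat, can be sketched in a simplified standalone form (the real change operates on Spark's accumulator updates in Scala; the map-of-longs model here is an illustrative assumption):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class HeartbeatFilter {
    /**
     * Returns only the metrics worth reporting: a zero value carries no
     * information for the driver, so omitting it shrinks the heartbeat
     * payload without changing what the driver learns.
     */
    public static Map<String, Long> dropZeroMetrics(Map<String, Long> metrics) {
        Map<String, Long> kept = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : metrics.entrySet()) {
            if (e.getValue() != 0L) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }
}
```

For a task that touched no shuffle or spill paths, most of its metric entries are zero, which is why this filtering turns tens of MBs of heartbeat into a few KBs for large jobs.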
[GitHub] spark issue #22413: [SPARK-25425][SQL] Extra options should override session...
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22413

> There is some conflict. Please make backporting PRs to 2.4 and 2.3.
> BTW, @MaxGekk. Could you send a backport PR to branch-2.4?

Sure, I will do that.
[GitHub] spark issue #22462: [SPARK-25460][SS] DataSourceV2: SS sources do not respec...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22462 Merged build finished. Test PASSed.
[GitHub] spark issue #22462: [SPARK-25460][SS] DataSourceV2: SS sources do not respec...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22462 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96266/ Test PASSed.
[GitHub] spark issue #22468: [SPARK-25374][SQL] SafeProjection supports fallback to a...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22468 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96262/ Test PASSed.
[GitHub] spark issue #22468: [SPARK-25374][SQL] SafeProjection supports fallback to a...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22468 Merged build finished. Test PASSed.
[GitHub] spark issue #22462: [SPARK-25460][SS] DataSourceV2: SS sources do not respec...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22462 **[Test build #96266 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96266/testReport)** for PR 22462 at commit [`95727c6`](https://github.com/apache/spark/commit/95727c6518e20d4bb7e18f95adb869edecf0e6a4). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22472: [SPARK-23173][SPARK-25384][SQL] Reverting of spark.sql.f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22472 **[Test build #96278 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96278/testReport)** for PR 22472 at commit [`17275d6`](https://github.com/apache/spark/commit/17275d64653f0ae8d94e8004eed6bedbbf74d3fc).
[GitHub] spark issue #22468: [SPARK-25374][SQL] SafeProjection supports fallback to a...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22468 **[Test build #96262 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96262/testReport)** for PR 22468 at commit [`bc5f144`](https://github.com/apache/spark/commit/bc5f1445ff6ee45456020d6f0afdd91d14c844af). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class InterpretedSafeProjection(expressions: Seq[Expression]) extends Projection `
[GitHub] spark issue #22472: [SPARK-23173][SPARK-25384][SQL] Reverting of spark.sql.f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22472 Can one of the admins verify this patch?
[GitHub] spark pull request #22472: [SPARK-23173][SPARK-25384][SQL] Reverting of spar...
GitHub user MaxGekk opened a pull request: https://github.com/apache/spark/pull/22472

[SPARK-23173][SPARK-25384][SQL] Reverting of spark.sql.fromJsonForceNullableSchema

## What changes were proposed in this pull request?

The flag has been released already in 2.3.x. Renaming it can potentially break user applications. I propose to revert it back and leave a note that the configuration option will be removed in Spark 3.0.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/MaxGekk/spark-1 revert-forcing-nullable-schema

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22472.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #22472

commit 675beb4a704491f9168dda4b182b3672781725bd
Author: Maxim Gekk
Date: 2018-09-19T18:28:10Z

    Revert "[SPARK-23173][SQL] rename spark.sql.fromJsonForceNullableSchema"

    This reverts commit 6c7db7fd1ced1d143b1389d09990a620fc16be46.

commit 17275d64653f0ae8d94e8004eed6bedbbf74d3fc
Author: Maxim Gekk
Date: 2018-09-19T18:34:13Z

    Making a note
[GitHub] spark pull request #22010: [SPARK-21436][CORE] Take advantage of known parti...
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22010#discussion_r218918701

    --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
    @@ -396,7 +396,26 @@ abstract class RDD[T: ClassTag](
        * Return a new RDD containing the distinct elements in this RDD.
        */
       def distinct(numPartitions: Int)(implicit ord: Ordering[T] = null): RDD[T] = withScope {
    -    map(x => (x, null)).reduceByKey((x, y) => x, numPartitions).map(_._1)
    +    partitioner match {
    +      case Some(p) if numPartitions == partitions.length =>
    +        def key(x: T): (T, Null) = (x, null)
    +        val cleanKey = sc.clean(key _)
    +        val keyed = new MapPartitionsRDD[(T, Null), T](
    +          this,
    +          (context, pid, iter) => iter.map(cleanKey),
    +          knownPartitioner = Some(new WrappedPartitioner(p)))
    +        val duplicatesRemoved = keyed.reduceByKey((x, y) => x)
    --- End diff --

    So I _think_ it is the partitioner of the input RDD when a partitioner is known, and otherwise a hash partitioner over the default parallelism. Yes?
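The optimization under review rests on one invariant: `distinct` maps each element `x` to the pair `(x, null)`, whose key is `x` itself, so if the input RDD was already partitioned by key hash, no record needs to move for the `reduceByKey`. A minimal stand-alone sketch of that invariant, outside Spark (`hashPartition` here is a hypothetical stand-in for Spark's `HashPartitioner.getPartition`, not the real class):

```scala
// Stand-in for Spark's HashPartitioner.getPartition: non-negative mod of hashCode.
def hashPartition(key: Any, numPartitions: Int): Int = {
  val h = key.hashCode % numPartitions
  if (h < 0) h + numPartitions else h // Java-style % can be negative
}

val numPartitions = 4

// Data already laid out by key hash, as after an earlier partitionBy.
val partitions: Seq[Seq[Int]] =
  (0 until numPartitions).map(p =>
    (0 until 100).filter(k => hashPartition(k, numPartitions) == p))

// distinct keys each record by the element itself, so every record is
// still on the partition its key hashes to: the reduce can run entirely
// map-side and the shuffle becomes unnecessary.
val stillLocal = partitions.zipWithIndex.forall { case (part, idx) =>
  part.forall(k => hashPartition(k, numPartitions) == idx)
}
println(stillLocal) // true
```

This is exactly the `case Some(p) if numPartitions == partitions.length` guard in the diff: only when the new partition count matches the existing layout does the invariant hold.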
[GitHub] spark pull request #22010: [SPARK-21436][CORE] Take advantage of known parti...
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22010#discussion_r218917483

    --- Diff: core/src/main/scala/org/apache/spark/rdd/MapPartitionsRDD.scala ---
    @@ -35,16 +35,24 @@ import org.apache.spark.{Partition, TaskContext}
        * @param isOrderSensitive whether or not the function is order-sensitive. If it's order
        *                         sensitive, it may return totally different result when the input order
        *                         is changed. Mostly stateful functions are order-sensitive.
    +   * @param knownPartitioner If the result has a known partitioner.
        */
     private[spark] class MapPartitionsRDD[U: ClassTag, T: ClassTag](
         var prev: RDD[T],
         f: (TaskContext, Int, Iterator[T]) => Iterator[U],  // (TaskContext, partition index, iterator)
         preservesPartitioning: Boolean = false,
         isFromBarrier: Boolean = false,
    -    isOrderSensitive: Boolean = false)
    +    isOrderSensitive: Boolean = false,
    +    knownPartitioner: Option[Partitioner] = None)
       extends RDD[U](prev) {

    -  override val partitioner = if (preservesPartitioning) firstParent[T].partitioner else None
    +  override val partitioner = {
    +    if (preservesPartitioning) {
    +      firstParent[T].partitioner
    +    } else {
    +      knownPartitioner
    +    }
    +  }
    --- End diff --

    I mean yes we can sub-class just as easily -- is that what you mean?
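The `partitioner` override in this diff reduces to a simple two-way preference. A minimal sketch of that rule in plain Scala (the function name and the `Option[P]` shape are illustrative, not Spark API):

```scala
// Resolution rule discussed above: preservesPartitioning keeps the
// parent's partitioner; otherwise fall back to an explicitly supplied
// knownPartitioner (which may itself be None).
def resolvePartitioner[P](
    preservesPartitioning: Boolean,
    parentPartitioner: Option[P],
    knownPartitioner: Option[P]): Option[P] =
  if (preservesPartitioning) parentPartitioner else knownPartitioner

println(resolvePartitioner(true, Some("parentHash"), None))           // Some(parentHash)
println(resolvePartitioner(false, Some("parentHash"), Some("range"))) // Some(range)
```

The subclassing alternative raised in the comment would encode the same preference in an `override val partitioner` of a `MapPartitionsRDD` subclass rather than in a constructor parameter; the observable behavior is the same either way.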
[GitHub] spark pull request #20858: [SPARK-23736][SQL] Extending the concat function ...
Github user mn-mikke commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20858#discussion_r218917303

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
    @@ -665,3 +667,219 @@ case class ElementAt(left: Expression, right: Expression) extends GetMapValueUti
     override def prettyName: String = "element_at"
     }
    +
    +/**
    + * Concatenates multiple input columns together into a single column.
    + * The function works with strings, binary and compatible array columns.
    + */
    +@ExpressionDescription(
    +  usage = "_FUNC_(col1, col2, ..., colN) - Returns the concatenation of col1, col2, ..., colN.",
    +  examples = """
    +    Examples:
    +      > SELECT _FUNC_('Spark', 'SQL');
    +       SparkSQL
    +      > SELECT _FUNC_(array(1, 2, 3), array(4, 5), array(6));
    +       [1,2,3,4,5,6]
    +  """)
    +case class Concat(children: Seq[Expression]) extends Expression {
    +
    +  private val MAX_ARRAY_LENGTH: Int = ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH
    +
    +  val allowedTypes = Seq(StringType, BinaryType, ArrayType)
    +
    +  override def checkInputDataTypes(): TypeCheckResult = {
    +    if (children.isEmpty) {
    +      TypeCheckResult.TypeCheckSuccess
    +    } else {
    +      val childTypes = children.map(_.dataType)
    +      if (childTypes.exists(tpe => !allowedTypes.exists(_.acceptsType(tpe)))) {
    +        return TypeCheckResult.TypeCheckFailure(
    +          s"input to function $prettyName should have been StringType, BinaryType or ArrayType," +
    +            s" but it's " + childTypes.map(_.simpleString).mkString("[", ", ", "]"))
    +      }
    +      TypeUtils.checkForSameTypeInputExpr(childTypes, s"function $prettyName")
    +    }
    +  }
    +
    +  override def dataType: DataType = children.map(_.dataType).headOption.getOrElse(StringType)
    +
    +  lazy val javaType: String = CodeGenerator.javaType(dataType)
    +
    +  override def nullable: Boolean = children.exists(_.nullable)
    +
    +  override def foldable: Boolean = children.forall(_.foldable)
    +
    +  override def eval(input: InternalRow): Any = dataType match {
    --- End diff --

    Thanks!
    I've created #22471 to call the pattern matching only once. WDYT about [Reverse](https://github.com/apache/spark/pull/21034/files#diff-9853dcf5ce3d2ac1e94d473197ff5768R240)? It looks like a similar problem.
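The array branch of the `Concat` expression above amounts to: concatenate same-typed input arrays into one, guarding the result against exceeding the maximum array length. A minimal plain-Scala sketch of that contract (the constant value and function name are stand-ins, not the actual `ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH` implementation):

```scala
import scala.reflect.ClassTag

// Stand-in for ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH.
val MaxRoundedArrayLength: Int = Int.MaxValue - 15

// Concatenate arrays of one element type, refusing results whose total
// length would overflow the maximum allowed array size.
def concatArrays[T: ClassTag](inputs: Seq[Array[T]]): Array[T] = {
  val total = inputs.iterator.map(_.length.toLong).sum
  require(total <= MaxRoundedArrayLength,
    s"unsupported result size: $total elements")
  Array.concat(inputs: _*)
}

println(concatArrays(Seq(Array(1, 2, 3), Array(4, 5), Array(6)))
  .mkString("[", ",", "]")) // [1,2,3,4,6] is wrong; prints [1,2,3,4,5,6]
```

The length is summed as a `Long` before the check precisely because the individual `Int` lengths could overflow when added; the real expression's interpreted and codegen paths both need this guard, which is why #22471 factors the per-type dispatch out of the per-row `eval`.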