[GitHub] spark pull request #22192: [SPARK-24918][Core] Executor Plugin API

2018-08-27 Thread squito
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/22192#discussion_r213003599 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -130,6 +130,16 @@ private[spark] class Executor( private val

[GitHub] spark pull request #22192: [SPARK-24918][Core] Executor Plugin API

2018-08-27 Thread squito
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/22192#discussion_r213003400 --- Diff: core/src/test/java/org/apache/spark/ExecutorPluginSuite.java --- @@ -0,0 +1,104 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark issue #21923: [SPARK-24918][Core] Executor Plugin api

2018-08-22 Thread squito
Github user squito commented on the issue: https://github.com/apache/spark/pull/21923 this is being continued in https://github.com/apache/spark/pull/22192 --- - To unsubscribe, e-mail: reviews-unsubscr

[GitHub] spark pull request #21923: [SPARK-24918][Core] Executor Plugin api

2018-08-22 Thread squito
Github user squito closed the pull request at: https://github.com/apache/spark/pull/21923 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22195: [CORE] Fix typo in spark.network.crypto.keyFactor...

2018-08-22 Thread squito
GitHub user squito opened a pull request: https://github.com/apache/spark/pull/22195 [CORE] Fix typo in spark.network.crypto.keyFactoryIterations You can merge this pull request into a Git repository by running: $ git pull https://github.com/squito/spark SPARK-25205

[GitHub] spark issue #21451: [SPARK-24296][CORE] Replicate large blocks as a stream.

2018-08-21 Thread squito
Github user squito commented on the issue: https://github.com/apache/spark/pull/21451 @tgravescs @vanzin any more comments? I think I've addressed everything

[GitHub] spark issue #21950: [SPARK-24914][SQL][WIP] Add configuration to avoid OOM d...

2018-08-16 Thread squito
Github user squito commented on the issue: https://github.com/apache/spark/pull/21950 retest this please

[GitHub] spark issue #22101: [SPARK-25114][Core] Fix RecordBinaryComparator when subt...

2018-08-16 Thread squito
Github user squito commented on the issue: https://github.com/apache/spark/pull/22101 the added tests are good. This is pretty nit-picky, but looking at the whole test suite, are there any tests that check for anything other than the first byte (or array length)? Seems the longer

[GitHub] spark issue #21950: [SPARK-24914][SQL][WIP] Add configuration to avoid OOM d...

2018-08-16 Thread squito
Github user squito commented on the issue: https://github.com/apache/spark/pull/21950 retest this please

[GitHub] spark issue #21899: [SPARK-24912][SQL] Don't obscure source of OOM during br...

2018-08-16 Thread squito
Github user squito commented on the issue: https://github.com/apache/spark/pull/21899 lgtm, just a small reword in one of the msgs suggested by maxgekk. @hvanhovell @gatorsmile would you like to review this as well

[GitHub] spark pull request #21899: [SPARK-24912][SQL] Don't obscure source of OOM du...

2018-08-16 Thread squito
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/21899#discussion_r210647464 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala --- @@ -118,12 +119,19 @@ case class

[GitHub] spark pull request #21899: [SPARK-24912][SQL] Don't obscure source of OOM du...

2018-08-16 Thread squito
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/21899#discussion_r210645167 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala --- @@ -118,12 +119,19 @@ case class

[GitHub] spark issue #22109: [SPARK-25120][CORE][HistoryServer]Fix the problem of Eve...

2018-08-16 Thread squito
Github user squito commented on the issue: https://github.com/apache/spark/pull/22109 can you please close this pr @deshanxiao ?

[GitHub] spark issue #22109: [SPARK-25120][CORE][HistoryServer]Fix the problem of Eve...

2018-08-15 Thread squito
Github user squito commented on the issue: https://github.com/apache/spark/pull/22109 ah, right, I thought the listenerbus was doing that, but couldn't find it, I was looking in the wrong place. so @deshanxiao , given the discussion above, any chance you can share more info

[GitHub] spark issue #22109: [SPARK-25120][CORE][HistoryServer]Fix the problem of Eve...

2018-08-15 Thread squito
Github user squito commented on the issue: https://github.com/apache/spark/pull/22109 this looks reasonable, but now I'm wondering whether this will only affect the driver. Couldn't it also affect the executors? Executors might get created as soon as there is a [`schedulerBackend

[GitHub] spark pull request #21451: [SPARK-24296][CORE] Replicate large blocks as a s...

2018-08-14 Thread squito
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/21451#discussion_r210153952 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala --- @@ -406,6 +407,61 @@ private[spark] class BlockManager( putBytes

[GitHub] spark pull request #21593: [SPARK-24578][Core] Cap sub-region's size of retu...

2018-08-14 Thread squito
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/21593#discussion_r210094158 --- Diff: common/network-common/src/main/java/org/apache/spark/network/protocol/MessageWithHeader.java --- @@ -137,30 +137,15 @@ protected void deallocate

[GitHub] spark issue #22071: [SPARK-25088][CORE][MESOS][DOCS] Update Rest Server docs...

2018-08-14 Thread squito
Github user squito commented on the issue: https://github.com/apache/spark/pull/22071 any objections about putting this in prior branches as well?

[GitHub] spark issue #21977: SPARK-25004: Add spark.executor.pyspark.memory limit.

2018-08-14 Thread squito
Github user squito commented on the issue: https://github.com/apache/spark/pull/21977 it fails consistently for me locally too, with your branch, but with this failure: ``` [info] File "/Users/irashid/github/pub/spark/target/tmp/spark-7c0a388c-1413-4215

[GitHub] spark issue #21977: SPARK-25004: Add spark.executor.pyspark.memory limit.

2018-08-14 Thread squito
Github user squito commented on the issue: https://github.com/apache/spark/pull/21977 @rdblue looking ...

[GitHub] spark issue #22101: [SPARK-23207][Core][FOLLOWUP] Fix RecordBinaryComparator...

2018-08-14 Thread squito
Github user squito commented on the issue: https://github.com/apache/spark/pull/22101 nit-picky detail -- I think this should get its own jira, since the original fix already went into a release. Using the same jira again makes it hard to tell where this was fixed

[GitHub] spark pull request #22079: [SPARK-23207][SPARK-22905][SQL][BACKPORT-2.2] Shu...

2018-08-13 Thread squito
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/22079#discussion_r209815627 --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/execution/RecordBinaryComparator.java --- @@ -0,0 +1,70 @@ +/* + * Licensed to the Apache

[GitHub] spark issue #22079: [SPARK-23207][SPARK-22905][SQL][BACKPORT-2.2] Shuffle+Re...

2018-08-13 Thread squito
Github user squito commented on the issue: https://github.com/apache/spark/pull/22079 overall I'm in favor of backporting this, and it looks like the only changes to the original were very small, so I'm in favor

[GitHub] spark issue #21698: [SPARK-23243][Core] Fix RDD.repartition() data correctne...

2018-08-13 Thread squito
Github user squito commented on the issue: https://github.com/apache/spark/pull/21698 I also think @tgravescs solution of using the HashPartitioner is an acceptable one, though as you've noted it doesn't deal w/ skew (which may be a lot of the existing use of `repartition()`). I

[GitHub] spark pull request #22071: [SPARK-25088][CORE][MESOS][DOCS] Update Rest Serv...

2018-08-13 Thread squito
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/22071#discussion_r209747074 --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/deploy/mesos/MesosClusterDispatcher.scala --- @@ -51,6 +51,13 @@ private[mesos] class

[GitHub] spark pull request #21758: [SPARK-24795][CORE] Implement barrier execution m...

2018-08-13 Thread squito
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/21758#discussion_r209652176 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -1349,6 +1339,29 @@ class DAGScheduler( s"l

[GitHub] spark pull request #22071: [SPARK-25088][CORE][MESOS][DOCS] Update Rest Serv...

2018-08-13 Thread squito
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/22071#discussion_r209633119 --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/deploy/mesos/MesosClusterDispatcher.scala --- @@ -51,6 +51,13 @@ private[mesos] class

[GitHub] spark issue #21698: [SPARK-23243][Core] Fix RDD.repartition() data correctne...

2018-08-10 Thread squito
Github user squito commented on the issue: https://github.com/apache/spark/pull/21698 yeah, you'd have to sort the entire record. I think originally it didn't seem like that would work, because you don't know that `T` is sortable for `RDD[T]`. But after a sort, you've got bytes, so

[GitHub] spark issue #21933: [SPARK-24917][CORE] make chunk size configurable

2018-08-10 Thread squito
Github user squito commented on the issue: https://github.com/apache/spark/pull/21933 Thanks for detailed analysis @vincent-grosbois . I agree with everything, but as you noted you won't hit this particular issue anymore with `ChunkedByteBufferFileRegion`. Is there another use-case

[GitHub] spark pull request #22071: [SPARK-25088][CORE][MESOS][DOCS] Update Rest Serv...

2018-08-10 Thread squito
GitHub user squito opened a pull request: https://github.com/apache/spark/pull/22071 [SPARK-25088][CORE][MESOS][DOCS] Update Rest Server docs & defaults. ## What changes were proposed in this pull request? (a) disabled rest submission server by default in standalone

[GitHub] spark issue #21698: [SPARK-23243][Core] Fix RDD.repartition() data correctne...

2018-08-10 Thread squito
Github user squito commented on the issue: https://github.com/apache/spark/pull/21698 > What if the user doesn't provide a distributed file system path? E.g., you can read from Kafka and write them back to Kafka and such workloads don't need a distributed file system in standal

[GitHub] spark issue #21698: [SPARK-23243][Core] Fix RDD.repartition() data correctne...

2018-08-09 Thread squito
Github user squito commented on the issue: https://github.com/apache/spark/pull/21698 > The above fix proposal requires more code refactoring of DAGScheduler, and it would consume some memory to store additional information (assume you have M active/finished stages, and N sta

[GitHub] spark pull request #21451: [SPARK-24296][CORE] Replicate large blocks as a s...

2018-08-09 Thread squito
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/21451#discussion_r209000299 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala --- @@ -404,6 +405,47 @@ private[spark] class BlockManager( putBytes

[GitHub] spark issue #21977: SPARK-25004: Add spark.executor.pyspark.memory limit.

2018-08-08 Thread squito
Github user squito commented on the issue: https://github.com/apache/spark/pull/21977 > We've found that python requires a lot less memory than it actually uses because it doesn't know when to GC yes, totally agree, sorry I wasn't clear in my initial comment -- overal

[GitHub] spark issue #21698: [SPARK-23243][Core] Fix RDD.repartition() data correctne...

2018-08-08 Thread squito
Github user squito commented on the issue: https://github.com/apache/spark/pull/21698 > statistically fine considering most Spark jobs are short-running and don't hit FetchFailure quite often (The major advantage of this approach is that you don't pay for any penalty if you don't

[GitHub] spark pull request #21518: [SPARK-24502][SQL] flaky test: UnsafeRowSerialize...

2018-08-08 Thread squito
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/21518#discussion_r208613943 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/UnsafeRowSerializerSuite.scala --- @@ -45,6 +44,14 @@ class ClosableByteArrayInputStream

[GitHub] spark issue #21698: [SPARK-23243][Core] Fix RDD.repartition() data correctne...

2018-08-08 Thread squito
Github user squito commented on the issue: https://github.com/apache/spark/pull/21698 @tgravescs it's not guaranteed to reproduce with that. IIUC, you need to do a repartition in the same stage that also does a shuffle-read, then have a fetch failure, and on recompute that stage

[GitHub] spark pull request #21451: [SPARK-24296][CORE] Replicate large blocks as a s...

2018-08-06 Thread squito
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/21451#discussion_r207968882 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala --- @@ -404,6 +405,47 @@ private[spark] class BlockManager( putBytes

[GitHub] spark issue #21976: [SPARK-24909] Spark scheduler can hang when fetch failur...

2018-08-03 Thread squito
Github user squito commented on the issue: https://github.com/apache/spark/pull/21976 there is a lot of context here I need to page back in, sorry won't get to this for a few days at least. But at least on testing, have you looked at `SchedulerIntegrationSuite`? I was hoping we

[GitHub] spark issue #21979: [SPARK-25009][CORE]Standalone Cluster mode application s...

2018-08-02 Thread squito
Github user squito commented on the issue: https://github.com/apache/spark/pull/21979 thanks for catching this. lgtm

[GitHub] spark pull request #21923: [SPARK-24918][Core] Executor Plugin api

2018-08-02 Thread squito
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/21923#discussion_r207341747 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -130,6 +130,12 @@ private[spark] class Executor( private val

[GitHub] spark pull request #21923: [SPARK-24918][Core] Executor Plugin api

2018-08-02 Thread squito
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/21923#discussion_r207300158 --- Diff: core/src/main/java/org/apache/spark/AbstractExecutorPlugin.java --- @@ -0,0 +1,39 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark issue #21923: [SPARK-24918][Core] Executor Plugin api

2018-08-02 Thread squito
Github user squito commented on the issue: https://github.com/apache/spark/pull/21923 retest this please

[GitHub] spark pull request #21927: [SPARK-24820][SPARK-24821][Core] Fail fast when s...

2018-08-02 Thread squito
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/21927#discussion_r207249848 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -340,6 +340,22 @@ class DAGScheduler

[GitHub] spark pull request #21927: [SPARK-24820][SPARK-24821][Core] Fail fast when s...

2018-08-02 Thread squito
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/21927#discussion_r207249569 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -340,6 +340,22 @@ class DAGScheduler

[GitHub] spark pull request #21923: [SPARK-24918][Core] Executor Plugin api

2018-08-01 Thread squito
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/21923#discussion_r207077147 --- Diff: core/src/main/java/org/apache/spark/AbstractExecutorPlugin.java --- @@ -0,0 +1,39 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #21758: [SPARK-24795][CORE] Implement barrier execution m...

2018-08-01 Thread squito
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/21758#discussion_r207076922 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -1349,6 +1339,29 @@ class DAGScheduler( s"l

[GitHub] spark pull request #21221: [SPARK-23429][CORE] Add executor memory metrics t...

2018-08-01 Thread squito
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/21221#discussion_r207037118 --- Diff: core/src/main/scala/org/apache/spark/status/AppStatusListener.scala --- @@ -669,6 +686,34 @@ private[spark] class AppStatusListener

[GitHub] spark pull request #21923: [SPARK-24918][Core] Executor Plugin api

2018-08-01 Thread squito
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/21923#discussion_r207036143 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -130,6 +130,12 @@ private[spark] class Executor( private val

[GitHub] spark pull request #21221: [SPARK-23429][CORE] Add executor memory metrics t...

2018-08-01 Thread squito
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/21221#discussion_r207004345 --- Diff: core/src/main/scala/org/apache/spark/status/api/v1/api.scala --- @@ -98,14 +103,50 @@ class ExecutorSummary private[spark]( val

[GitHub] spark pull request #21221: [SPARK-23429][CORE] Add executor memory metrics t...

2018-08-01 Thread squito
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/21221#discussion_r207004013 --- Diff: core/src/main/scala/org/apache/spark/status/AppStatusListener.scala --- @@ -669,6 +686,34 @@ private[spark] class AppStatusListener

[GitHub] spark pull request #21221: [SPARK-23429][CORE] Add executor memory metrics t...

2018-08-01 Thread squito
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/21221#discussion_r207004094 --- Diff: core/src/main/scala/org/apache/spark/status/api/v1/api.scala --- @@ -98,14 +103,50 @@ class ExecutorSummary private[spark]( val

[GitHub] spark pull request #21221: [SPARK-23429][CORE] Add executor memory metrics t...

2018-08-01 Thread squito
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/21221#discussion_r20799 --- Diff: core/src/test/scala/org/apache/spark/scheduler/EventLoggingListenerSuite.scala --- @@ -251,6 +261,214 @@ class EventLoggingListenerSuite extends

[GitHub] spark pull request #21221: [SPARK-23429][CORE] Add executor memory metrics t...

2018-08-01 Thread squito
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/21221#discussion_r207006413 --- Diff: core/src/main/scala/org/apache/spark/metrics/ExecutorMetricType.scala --- @@ -0,0 +1,104 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #21221: [SPARK-23429][CORE] Add executor memory metrics t...

2018-08-01 Thread squito
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/21221#discussion_r206998585 --- Diff: core/src/main/scala/org/apache/spark/status/AppStatusListener.scala --- @@ -669,6 +686,34 @@ private[spark] class AppStatusListener

[GitHub] spark pull request #21221: [SPARK-23429][CORE] Add executor memory metrics t...

2018-08-01 Thread squito
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/21221#discussion_r206995945 --- Diff: core/src/main/scala/org/apache/spark/executor/ExecutorMetrics.scala --- @@ -0,0 +1,81 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #21927: [SPARK-24820][SPARK-24821][Core] Fail fast when s...

2018-08-01 Thread squito
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/21927#discussion_r206985923 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -340,6 +340,22 @@ class DAGScheduler

[GitHub] spark issue #21923: [SPARK-24918][Core] Executor Plugin api

2018-08-01 Thread squito
Github user squito commented on the issue: https://github.com/apache/spark/pull/21923 > Are there more specific use cases? I always feel it'd be impossible to design APIs without seeing couple different use cases. With this basic api, you could just do things that

[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...

2018-07-31 Thread squito
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/20761#discussion_r206696830 --- Diff: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/ClientSuite.scala --- @@ -35,18 +36,22 @@ import

[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...

2018-07-31 Thread squito
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/20761#discussion_r206696107 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala --- @@ -275,9 +287,13 @@ private[yarn] class YarnAllocator

[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...

2018-07-31 Thread squito
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/20761#discussion_r206695469 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala --- @@ -156,6 +160,14 @@ private[yarn] class YarnAllocator

[GitHub] spark issue #21927: [SPARK-24820][Core] Fail fast when submitted job contain...

2018-07-31 Thread squito
Github user squito commented on the issue: https://github.com/apache/spark/pull/21927 > Second thought: PartitionPruningRDD is just an implementation of RDD. Every user / developer can implement a similar one. Also this doesn't handle the case mentioned by @felixcheung : a.unio

[GitHub] spark issue #21915: [SPARK-24954][Core] Fail fast on job submit if run a bar...

2018-07-31 Thread squito
Github user squito commented on the issue: https://github.com/apache/spark/pull/21915 thanks @jiangxb1987, lgtm aside from @mengxr's comments

[GitHub] spark pull request #21758: [SPARK-24795][CORE] Implement barrier execution m...

2018-07-31 Thread squito
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/21758#discussion_r206682882 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -1349,6 +1339,29 @@ class DAGScheduler( s"l

[GitHub] spark pull request #21899: [SPARK-24912][SQL] Don't obscure source of OOM du...

2018-07-30 Thread squito
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/21899#discussion_r206327279 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala --- @@ -118,12 +119,19 @@ case class

[GitHub] spark pull request #21923: [SPARK-24918][Core] Executor Plugin api

2018-07-30 Thread squito
GitHub user squito opened a pull request: https://github.com/apache/spark/pull/21923 [SPARK-24918][Core] Executor Plugin api This provides a very simple api for users to specify arbitrary code to run within an executor, e.g. for debugging or added instrumentation. The initial

[GitHub] spark issue #21908: [MINOR][CORE][TEST] Fix afterEach() in TastSetManagerSui...

2018-07-30 Thread squito
Github user squito commented on the issue: https://github.com/apache/spark/pull/21908 late lgtm from me too

[GitHub] spark issue #21899: [SPARK-24912][SQL] Don't obscure source of OOM during br...

2018-07-27 Thread squito
Github user squito commented on the issue: https://github.com/apache/spark/pull/21899 this looks good -- wondering about a couple of potential improvements. Is it possible to include the actual size of the in-memory table so far in the msg as well? Also, does catching

[GitHub] spark pull request #21758: [SPARK-24795][CORE] Implement barrier execution m...

2018-07-27 Thread squito
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/21758#discussion_r205876806 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala --- @@ -359,20 +366,55 @@ private[spark] class TaskSchedulerImpl

[GitHub] spark issue #21811: [SPARK-24801][CORE] Avoid memory waste by empty byte[] a...

2018-07-26 Thread squito
Github user squito commented on the issue: https://github.com/apache/spark/pull/21811 merged to master, thanks!

[GitHub] spark pull request #21758: [SPARK-24795][CORE] Implement barrier execution m...

2018-07-26 Thread squito
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/21758#discussion_r205652334 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -1647,6 +1647,14 @@ abstract class RDD[T: ClassTag

[GitHub] spark pull request #21758: [SPARK-24795][CORE] Implement barrier execution m...

2018-07-26 Thread squito
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/21758#discussion_r205652317 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala --- @@ -359,20 +366,55 @@ private[spark] class TaskSchedulerImpl

[GitHub] spark issue #21885: [SPARK-24926] Ensure numCores is used consistently in al...

2018-07-26 Thread squito
Github user squito commented on the issue: https://github.com/apache/spark/pull/21885 OK to test

[GitHub] spark issue #21653: [SPARK-13343] speculative tasks that didn't commit shoul...

2018-07-26 Thread squito
Github user squito commented on the issue: https://github.com/apache/spark/pull/21653 I kicked off the test manually at https://spark-prs.appspot.com/users/hthuynh2. I dunno why the test triggering via comments stops working on some prs

[GitHub] spark issue #21474: [SPARK-24297][CORE] Fetch-to-disk by default for > 2gb

2018-07-26 Thread squito
Github user squito commented on the issue: https://github.com/apache/spark/pull/21474 https://issues.apache.org/jira/browse/SPARK-24936

[GitHub] spark issue #21474: [SPARK-24297][CORE] Fetch-to-disk by default for > 2gb

2018-07-26 Thread squito
Github user squito commented on the issue: https://github.com/apache/spark/pull/21474 oh no, good point Tom. it'll fail with ``` 18/07/26 07:15:02 WARN scheduler.TaskSetManager: Lost task 1.0 in stage 1.3 (TID 15, irashid-large-2.gce.cloudera.com, executor 2

[GitHub] spark pull request #21880: [SPARK-24929][INFRA] Make merge script don't swal...

2018-07-26 Thread squito
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/21880#discussion_r205470335 --- Diff: dev/merge_spark_pr.py --- @@ -331,6 +331,9 @@ def choose_jira_assignee(issue, asf_jira): assignee = asf_jira.user

[GitHub] spark issue #21451: [SPARK-24296][CORE][WIP] Replicate large blocks as a str...

2018-07-25 Thread squito
Github user squito commented on the issue: https://github.com/apache/spark/pull/21451 fyi, I did finally run my scale tests again on a cluster, and shuffles, remote reads, and replication worked for blocks over 2gb (sorry got sidetracked with a few other things in the meantime

[GitHub] spark issue #21698: [SPARK-23243][Core] Fix RDD.repartition() data correctne...

2018-07-25 Thread squito
Github user squito commented on the issue: https://github.com/apache/spark/pull/21698 sorry I got bogged down in some other things, thanks for the responses: >> on a fetch-failure in repartition, fail the entire job > Currently I can't figure o

[GitHub] spark pull request #21867: [SPARK-24307][CORE] Add conf to revert to old cod...

2018-07-25 Thread squito
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/21867#discussion_r205150784 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala --- @@ -731,7 +733,14 @@ private[spark] class BlockManager

[GitHub] spark pull request #21758: [SPARK-24795][CORE] Implement barrier execution m...

2018-07-25 Thread squito
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/21758#discussion_r205126592 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala --- @@ -359,20 +366,55 @@ private[spark] class TaskSchedulerImpl

[GitHub] spark pull request #21867: [SPARK-24307][CORE] Add conf to revert to old cod...

2018-07-25 Thread squito
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/21867#discussion_r205124896 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala --- @@ -731,7 +731,14 @@ private[spark] class BlockManager

[GitHub] spark pull request #21758: [SPARK-24795][CORE] Implement barrier execution m...

2018-07-24 Thread squito
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/21758#discussion_r204917880 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala --- @@ -359,20 +366,55 @@ private[spark] class TaskSchedulerImpl

[GitHub] spark pull request #21758: [SPARK-24795][CORE] Implement barrier execution m...

2018-07-24 Thread squito
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/21758#discussion_r204914384 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDDBarrier.scala --- @@ -0,0 +1,52 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request #21758: [SPARK-24795][CORE] Implement barrier execution m...

2018-07-24 Thread squito
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/21758#discussion_r204917245 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -1647,6 +1647,14 @@ abstract class RDD[T: ClassTag

[GitHub] spark pull request #21758: [SPARK-24795][CORE] Implement barrier execution m...

2018-07-24 Thread squito
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/21758#discussion_r204912925 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala --- @@ -359,20 +368,56 @@ private[spark] class TaskSchedulerImpl

[GitHub] spark pull request #21867: [SPARK-24307][CORE] Add conf to revert to old cod...

2018-07-24 Thread squito
GitHub user squito opened a pull request: https://github.com/apache/spark/pull/21867 [SPARK-24307][CORE] Add conf to revert to old code. In case there are any issues in converting FileSegmentManagedBuffer to ChunkedByteBuffer, add a conf to go back to old code path

[GitHub] spark issue #21811: [SPARK-24801][CORE] Avoid memory waste by empty byte[] a...

2018-07-24 Thread squito
Github user squito commented on the issue: https://github.com/apache/spark/pull/21811 lgtm, will wait a bit for any more comments before merging

[GitHub] spark issue #21474: [SPARK-24297][CORE] Fetch-to-disk by default for > 2gb

2018-07-24 Thread squito
Github user squito commented on the issue: https://github.com/apache/spark/pull/21474 good point @jerryshao, I've updated the docs now, please take a look, thanks

[GitHub] spark issue #21440: [SPARK-24307][CORE] Support reading remote cached partit...

2018-07-24 Thread squito
Github user squito commented on the issue: https://github.com/apache/spark/pull/21440 @gatorsmile sure, that's pretty easy. I'll submit a follow-up PR.

[GitHub] spark issue #21811: [SPARK-24801][CORE] Avoid memory waste by empty byte[] a...

2018-07-23 Thread squito
Github user squito commented on the issue: https://github.com/apache/spark/pull/21811 > Does it make sense to release byteChannel at deallocate()? you could, just to let GC kick in a *bit* earlier, but I don't think it's going to make a big difference. (Netty's ByteBufs m

[GitHub] spark issue #21811: [SPARK-24801][CORE] Avoid memory waste by empty byte[] a...

2018-07-23 Thread squito
Github user squito commented on the issue: https://github.com/apache/spark/pull/21811 > I like it, but this will still create the byte channel right? is there a way to reuse it? we could create a pool, though management becomes a bit more complex. would you ever shr

[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...

2018-07-20 Thread squito
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/20761#discussion_r204171230 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ResourceTypeHelper.scala --- @@ -0,0 +1,119 @@ +/* + * Licensed

[GitHub] spark issue #21811: [SPARK-24801][CORE] Avoid memory waste by empty byte[] a...

2018-07-19 Thread squito
Github user squito commented on the issue: https://github.com/apache/spark/pull/21811 @zsxwing @jerryshao @Victsm you might be interested in this

[GitHub] spark issue #21811: [SPARK-24801][CORE] Avoid memory waste by empty byte[] a...

2018-07-19 Thread squito
Github user squito commented on the issue: https://github.com/apache/spark/pull/21811 can you add something in the PR description about how this is important because sometimes many of these messages queue up in netty's ChannelOutboundBuffer before transferTo() is called? its

[GitHub] spark issue #21811: [SPARK-24801][CORE] Avoid memory waste by empty byte[] a...

2018-07-19 Thread squito
Github user squito commented on the issue: https://github.com/apache/spark/pull/21811 Ok to test

[GitHub] spark pull request #21440: [SPARK-24307][CORE] Support reading remote cached...

2018-07-19 Thread squito
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/21440#discussion_r203773818 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala --- @@ -659,6 +659,11 @@ private[spark] class BlockManager( * Get block

[GitHub] spark issue #21451: [SPARK-24296][CORE][WIP] Replicate large blocks as a str...

2018-07-18 Thread squito
Github user squito commented on the issue: https://github.com/apache/spark/pull/21451 retest this please

[GitHub] spark issue #21451: [SPARK-24296][CORE][WIP] Replicate large blocks as a str...

2018-07-18 Thread squito
Github user squito commented on the issue: https://github.com/apache/spark/pull/21451 @mridulm @jerryshao @felixcheung last one in the 2GB block limit series. just rebased to include the updates to https://github.com/apache/spark/pull/21440. I will also run my tests on a cluster

[GitHub] spark pull request #21221: [SPARK-23429][CORE] Add executor memory metrics t...

2018-07-18 Thread squito
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/21221#discussion_r203520320 --- Diff: core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala --- @@ -160,11 +160,29 @@ case class SparkListenerBlockUpdated
