[GitHub] spark issue #23241: [SPARK-26283][CORE] Enable reading from open frames of z...
Github user shahidki31 commented on the issue: https://github.com/apache/spark/pull/23241 Thanks a lot @srowen.
[GitHub] spark issue #23160: [SPARK-26196][SPARK-26281][WebUI] Total tasks title in t...
Github user shahidki31 commented on the issue: https://github.com/apache/spark/pull/23160 Thanks @srowen. This issue happens only in the master branch. Thank you all for the review and comments.
[GitHub] spark issue #23241: [SPARK-26283][CORE] Enable reading from open frames of z...
Github user shahidki31 commented on the issue: https://github.com/apache/spark/pull/23241 Thanks @srowen.
[GitHub] spark pull request #23241: [SPARK-26283][CORE] Enable reading from open fram...
Github user shahidki31 commented on a diff in the pull request: https://github.com/apache/spark/pull/23241#discussion_r239618167

--- Diff: core/src/main/scala/org/apache/spark/io/CompressionCodec.scala ---
@@ -197,4 +201,8 @@ class ZStdCompressionCodec(conf: SparkConf) extends CompressionCodec {
     // avoid overhead excessive of JNI call while trying to uncompress small amount of data.
     new BufferedInputStream(new ZstdInputStream(s), bufferSize)
   }
+
+  override def zstdEventLogCompressedInputStream(s: InputStream): InputStream = {
+    new BufferedInputStream(new ZstdInputStream(s).setContinuous(true), bufferSize)
--- End diff --

Updated the code
[GitHub] spark pull request #23241: [SPARK-26283][CORE] Enable reading from open fram...
Github user shahidki31 commented on a diff in the pull request: https://github.com/apache/spark/pull/23241#discussion_r239614561

--- Diff: core/src/main/scala/org/apache/spark/io/CompressionCodec.scala ---
@@ -197,4 +201,8 @@ class ZStdCompressionCodec(conf: SparkConf) extends CompressionCodec {
     // avoid overhead excessive of JNI call while trying to uncompress small amount of data.
     new BufferedInputStream(new ZstdInputStream(s), bufferSize)
   }
+
+  override def zstdEventLogCompressedInputStream(s: InputStream): InputStream = {
+    new BufferedInputStream(new ZstdInputStream(s).setContinuous(true), bufferSize)
--- End diff --

Thanks @vanzin for the review. It seems we need both methods in the ZstdCompressionCodec class, as in the previous change.
[GitHub] spark pull request #23241: [SPARK-26283][CORE] Enable reading from open fram...
Github user shahidki31 commented on a diff in the pull request: https://github.com/apache/spark/pull/23241#discussion_r239534469

--- Diff: core/src/main/scala/org/apache/spark/scheduler/ReplayListenerBus.scala ---
@@ -118,10 +118,12 @@ private[spark] class ReplayListenerBus extends SparkListenerBus with Logging {
       case e: HaltReplayException => // Just stop replay.
       case _: EOFException if maybeTruncated =>
-      case _: IOException if maybeTruncated =>
-        logWarning(s"Failed to read Spark event log: $sourceName")
       case ioe: IOException =>
-        throw ioe
+        if (maybeTruncated) {
--- End diff --

Thanks, updated.
[GitHub] spark pull request #23241: [SPARK-26283][CORE] Enable reading from open fram...
Github user shahidki31 commented on a diff in the pull request: https://github.com/apache/spark/pull/23241#discussion_r239530668

--- Diff: core/src/main/scala/org/apache/spark/scheduler/ReplayListenerBus.scala ---
@@ -118,10 +118,12 @@ private[spark] class ReplayListenerBus extends SparkListenerBus with Logging {
       case e: HaltReplayException => // Just stop replay.
       case _: EOFException if maybeTruncated =>
-      case _: IOException if maybeTruncated =>
-        logWarning(s"Failed to read Spark event log: $sourceName")
       case ioe: IOException =>
-        throw ioe
+        if (maybeTruncated) {
--- End diff --

Yes, I think I simplified it into one block.
[GitHub] spark pull request #23241: [SPARK-26283][CORE] Enable reading from open fram...
Github user shahidki31 commented on a diff in the pull request: https://github.com/apache/spark/pull/23241#discussion_r239521593

--- Diff: core/src/main/scala/org/apache/spark/io/CompressionCodec.scala ---
@@ -197,4 +201,8 @@ class ZStdCompressionCodec(conf: SparkConf) extends CompressionCodec {
     // avoid overhead excessive of JNI call while trying to uncompress small amount of data.
     new BufferedInputStream(new ZstdInputStream(s), bufferSize)
   }
+
+  override def zstdEventLogCompressedInputStream(s: InputStream): InputStream = {
+    new BufferedInputStream(new ZstdInputStream(s).setContinuous(true), bufferSize)
--- End diff --

I have updated the code.
[GitHub] spark pull request #23241: [SPARK-26283][CORE] Enable reading from open fram...
Github user shahidki31 commented on a diff in the pull request: https://github.com/apache/spark/pull/23241#discussion_r239516496

--- Diff: core/src/main/scala/org/apache/spark/io/CompressionCodec.scala ---
@@ -197,4 +201,8 @@ class ZStdCompressionCodec(conf: SparkConf) extends CompressionCodec {
     // avoid overhead excessive of JNI call while trying to uncompress small amount of data.
     new BufferedInputStream(new ZstdInputStream(s), bufferSize)
   }
+
+  override def zstdEventLogCompressedInputStream(s: InputStream): InputStream = {
+    new BufferedInputStream(new ZstdInputStream(s).setContinuous(true), bufferSize)
--- End diff --

Thanks @srowen.

> Is it actually desirable to not fail on a partial frame? I'm not sure. We shouldn't encounter it elsewhere.

Yes, ideally it shouldn't fail. Even for EventLoggingListener, the frame is closed once the application finishes (that is why this applies only to running applications). After analyzing the zstd code again, the impact comes down to "either throw an exception or read the frame", and the latter seems better. I can update the code.
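For readers following the thread, here is a minimal sketch of the behavior under discussion, assuming zstd-jni's `ZstdInputStream`; the file path and buffer size below are illustrative, not Spark's actual values:

```scala
import java.io.{BufferedInputStream, FileInputStream, InputStream}
import com.github.luben.zstd.ZstdInputStream

// With the default (isContinuous = false), reading a .inprogress event log
// whose final zstd frame is still open fails with a truncated-source style
// IOException; with setContinuous(true) the bytes decoded so far are
// returned instead, which is what replaying a running application needs.
def openInProgressEventLog(path: String, bufferSize: Int = 32 * 1024): InputStream = {
  val zstd = new ZstdInputStream(new FileInputStream(path))
  zstd.setContinuous(true) // tolerate the partially written (open) last frame
  new BufferedInputStream(zstd, bufferSize)
}
```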
[GitHub] spark issue #23160: [SPARK-26196][SPARK-26281][WebUI] Total tasks title in t...
Github user shahidki31 commented on the issue: https://github.com/apache/spark/pull/23160 cc @tgravescs @srowen
[GitHub] spark pull request #23241: [SPARK-26283][CORE] Enable reading from open fram...
Github user shahidki31 commented on a diff in the pull request: https://github.com/apache/spark/pull/23241#discussion_r239496266

--- Diff: core/src/main/scala/org/apache/spark/scheduler/ReplayListenerBus.scala ---
@@ -118,8 +118,6 @@ private[spark] class ReplayListenerBus extends SparkListenerBus with Logging {
       case e: HaltReplayException => // Just stop replay.
       case _: EOFException if maybeTruncated =>
-      case _: IOException if maybeTruncated =>
--- End diff --

This was added for the zstd incomplete-frame reading issue, but after this change that issue no longer happens. Yes, we can keep it as it is.
[GitHub] spark pull request #23241: [SPARK-26283][CORE] Enable reading from open fram...
Github user shahidki31 commented on a diff in the pull request: https://github.com/apache/spark/pull/23241#discussion_r239494478

--- Diff: core/src/main/scala/org/apache/spark/io/CompressionCodec.scala ---
@@ -197,4 +201,8 @@ class ZStdCompressionCodec(conf: SparkConf) extends CompressionCodec {
     // avoid overhead excessive of JNI call while trying to uncompress small amount of data.
     new BufferedInputStream(new ZstdInputStream(s), bufferSize)
   }
+
+  override def zstdEventLogCompressedInputStream(s: InputStream): InputStream = {
+    new BufferedInputStream(new ZstdInputStream(s).setContinuous(true), bufferSize)
--- End diff --

Hi @srowen, thanks for the comment. Yes, this parameter (false by default) is intended for continuous stream reading, so the question is whether classes that don't need to read the data continuously should still set `isContinuous` to true. This method is called by other classes, like 'UnsafeShuffleWriter', which are performance sensitive. If they tried to read from open frames, this issue (a read-error exception) would surface in those classes as well, but it has only been reported from 'EventLoggingListener'. That is why I tried to limit the change to the EventLoggingListener call. Yes, if we set 'continuous' to true for all callers, this code will be much simpler.
[GitHub] spark issue #23241: [SPARK-26283][CORE] Enable reading from open frames of z...
Github user shahidki31 commented on the issue: https://github.com/apache/spark/pull/23241 Jenkins, retest this please
[GitHub] spark issue #23241: [SPARK-26283][CORE] Enable reading from open frames of z...
Github user shahidki31 commented on the issue: https://github.com/apache/spark/pull/23241 Jenkins, retest this please
[GitHub] spark issue #23241: [SPARK-26283][CORE] Enable reading from open frames of z...
Github user shahidki31 commented on the issue: https://github.com/apache/spark/pull/23241 retest this please
[GitHub] spark issue #23241: [SPARK-26283][CORE] Enable reading open frames of zstd, ...
Github user shahidki31 commented on the issue: https://github.com/apache/spark/pull/23241 Thanks @vanzin, I updated the title.
[GitHub] spark pull request #23241: [SPARK-26283][CORE]When zstd compression enabled,...
Github user shahidki31 commented on a diff in the pull request: https://github.com/apache/spark/pull/23241#discussion_r239232597

--- Diff: core/src/main/scala/org/apache/spark/io/CompressionCodec.scala ---
@@ -197,4 +201,8 @@ class ZStdCompressionCodec(conf: SparkConf) extends CompressionCodec {
     // avoid overhead excessive of JNI call while trying to uncompress small amount of data.
     new BufferedInputStream(new ZstdInputStream(s), bufferSize)
   }
+
+  override def zstdEventLogCompressedInputStream(s: InputStream): InputStream = {
+    new BufferedInputStream(new ZstdInputStream(s).setContinuous(true), bufferSize)
--- End diff --

@srowen I updated the PR.
[GitHub] spark issue #23240: [SPARK-26281][WebUI] Duration column of task table shoul...
Github user shahidki31 commented on the issue: https://github.com/apache/spark/pull/23240 Thanks. I will update the PR title.
[GitHub] spark issue #23241: [SPARK-26283][CORE]When zstd compression enabled, Inprog...
Github user shahidki31 commented on the issue: https://github.com/apache/spark/pull/23241 Jenkins, retest this please
[GitHub] spark pull request #23241: [SPARK-26283][CORE]When zstd compression enabled,...
Github user shahidki31 commented on a diff in the pull request: https://github.com/apache/spark/pull/23241#discussion_r239219205

--- Diff: core/src/main/scala/org/apache/spark/io/CompressionCodec.scala ---
@@ -197,4 +201,8 @@ class ZStdCompressionCodec(conf: SparkConf) extends CompressionCodec {
     // avoid overhead excessive of JNI call while trying to uncompress small amount of data.
     new BufferedInputStream(new ZstdInputStream(s), bufferSize)
   }
+
+  override def zstdEventLogCompressedInputStream(s: InputStream): InputStream = {
+    new BufferedInputStream(new ZstdInputStream(s).setContinuous(true), bufferSize)
--- End diff --

Thanks. I will update the PR.
[GitHub] spark pull request #23241: [SPARK-26283][CORE]When zstd compression enabled,...
Github user shahidki31 commented on a diff in the pull request: https://github.com/apache/spark/pull/23241#discussion_r239216282

--- Diff: core/src/main/scala/org/apache/spark/io/CompressionCodec.scala ---
@@ -197,4 +201,8 @@ class ZStdCompressionCodec(conf: SparkConf) extends CompressionCodec {
     // avoid overhead excessive of JNI call while trying to uncompress small amount of data.
     new BufferedInputStream(new ZstdInputStream(s), bufferSize)
   }
+
+  override def zstdEventLogCompressedInputStream(s: InputStream): InputStream = {
+    new BufferedInputStream(new ZstdInputStream(s).setContinuous(true), bufferSize)
--- End diff --

Thanks @srowen for the review. Since the CompressionCodec class is used by many classes, we need to check each use case to decide whether it should read open frames in the zstd case. As far as the EventLoggingListener class is concerned, it needs the open-frame data too, so I tried to keep the change as minimal as possible without impacting the other calls.

> I think that if we introduce a new method we might try to make it a little more general, like: compressedInputStreamForPartialFile or something. It would be good to avoid the isInstanceOf below.

Yeah, that is a cleaner solution. Thanks.
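A hedged sketch of the generalization suggested in the quoted review comment. The method name `compressedInputStreamForPartialFile` comes from the comment itself; the trait shape, constructor parameter, and default implementation here are illustrative rather than Spark's actual code:

```scala
import java.io.{BufferedInputStream, InputStream}
import com.github.luben.zstd.ZstdInputStream

trait CompressionCodec {
  def compressedInputStream(s: InputStream): InputStream

  // Codecs that can tolerate a partially written file override this; all
  // other codecs fall back to the normal stream, so callers no longer need
  // an isInstanceOf[ZStdCompressionCodec] check.
  def compressedInputStreamForPartialFile(s: InputStream): InputStream =
    compressedInputStream(s)
}

class ZStdCompressionCodec(bufferSize: Int) extends CompressionCodec {
  override def compressedInputStream(s: InputStream): InputStream =
    new BufferedInputStream(new ZstdInputStream(s), bufferSize)

  override def compressedInputStreamForPartialFile(s: InputStream): InputStream =
    new BufferedInputStream(new ZstdInputStream(s).setContinuous(true), bufferSize)
}
```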
[GitHub] spark issue #23241: [SPARK-26283][CORE]When zstd compression enabled, Inprog...
Github user shahidki31 commented on the issue: https://github.com/apache/spark/pull/23241 cc @vanzin @srowen Kindly review
[GitHub] spark pull request #23241: [SPARK-26283][CORE]When zstd compression enabled,...
GitHub user shahidki31 opened a pull request: https://github.com/apache/spark/pull/23241

[SPARK-26283][CORE] When zstd compression is enabled, an in-progress application in the history server app UI shows finished jobs as running.

## What changes were proposed in this pull request?

Root cause: prior to Spark 2.4, when zstd was enabled for event log compression, opening the application UI from the history server always threw an exception. Since 2.4, the UI is rendered from the completed frames of the zstd-compressed event log, but open frames of in-progress applications are still not read. In this PR, we add 'setContinuous(true)' to the input stream used for reading the event log, so that it can read from open frames as well. (By default 'isContinuous=false' for the zstd input stream, and trying to read an open frame throws a truncated error.)

## How was this patch tested?

Test steps:
1) Add the configurations to spark-defaults.conf: (i) spark.eventLog.compress true (ii) spark.io.compression.codec zstd
2) Restart the history server
3) bin/spark-shell
4) sc.parallelize(1 to 1000, 1000).count
5) Open the app UI from the history server UI

**Before fix**
![screenshot from 2018-12-06 00-01-38](https://user-images.githubusercontent.com/23054875/49537340-bfe28b00-f8ee-11e8-9fca-6d42fdc89e1a.png)

**After fix:**
![screenshot from 2018-12-06 00-34-39](https://user-images.githubusercontent.com/23054875/49537353-ca9d2000-f8ee-11e8-803d-645897b9153b.png)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/shahidki31/spark zstdEventLog

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/23241.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #23241

commit 7d06f353dedc641956a6fd6ead5174d6733c1360
Author: Shahid
Date: 2018-12-05T17:37:13Z

    zstd eventLog
[GitHub] spark issue #23240: [SPARK-26281][WebUI] Duration column of task table shoul...
Github user shahidki31 commented on the issue: https://github.com/apache/spark/pull/23240 Hi @gengliangwang, it seems this was already handled in the PR https://github.com/apache/spark/pull/23160
[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...
Github user shahidki31 commented on the issue: https://github.com/apache/spark/pull/23088 Thanks @vanzin @srowen
[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...
Github user shahidki31 commented on the issue: https://github.com/apache/spark/pull/23088 retest this please
[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...
Github user shahidki31 commented on the issue: https://github.com/apache/spark/pull/23088 Retest this please
[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...
Github user shahidki31 commented on the issue: https://github.com/apache/spark/pull/23088 Hi @vanzin, I have opened a JIRA for the disk store case: https://issues.apache.org/jira/browse/SPARK-26260. I will try to work on it.
[GitHub] spark pull request #23088: [SPARK-26119][CORE][WEBUI]Task summary table shou...
Github user shahidki31 commented on a diff in the pull request: https://github.com/apache/spark/pull/23088#discussion_r238492653

--- Diff: core/src/test/scala/org/apache/spark/status/AppStatusStoreSuite.scala ---
@@ -77,6 +77,34 @@ class AppStatusStoreSuite extends SparkFunSuite {
     assert(store.count(classOf[CachedQuantile]) === 2)
   }
 
+  test("only successfull task have taskSummary") {
+    val store = new InMemoryStore()
+    (0 until 5).foreach { i => store.write(newTaskData(i, "FAILED")) }
--- End diff --

Done.
[GitHub] spark pull request #23088: [SPARK-26119][CORE][WEBUI]Task summary table shou...
Github user shahidki31 commented on a diff in the pull request: https://github.com/apache/spark/pull/23088#discussion_r238492582

--- Diff: core/src/main/scala/org/apache/spark/status/AppStatusStore.scala ---
@@ -148,11 +148,20 @@ private[spark] class AppStatusStore(
     // cheaper for disk stores (avoids deserialization).
     val count = {
       Utils.tryWithResource(
-        store.view(classOf[TaskDataWrapper])
-          .parent(stageKey)
-          .index(TaskIndexNames.EXEC_RUN_TIME)
-          .first(0L)
-          .closeableIterator()
+        if (store.isInstanceOf[LevelDB]) {
--- End diff --

Done. Thanks
[GitHub] spark pull request #23191: [SPARK-26219][CORE][branch-2.4] Executor summary ...
Github user shahidki31 closed the pull request at: https://github.com/apache/spark/pull/23191
[GitHub] spark issue #23191: [SPARK-26219][CORE][branch-2.4] Executor summary should ...
Github user shahidki31 commented on the issue: https://github.com/apache/spark/pull/23191 Thanks a lot @vanzin @srowen
[GitHub] spark issue #23205: [SPARK-26253][WebUI] : Task Summary Metrics Table on Sta...
Github user shahidki31 commented on the issue: https://github.com/apache/spark/pull/23205 Thanks @pgandhi999 :+1:
[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...
Github user shahidki31 commented on the issue: https://github.com/apache/spark/pull/23088 Thanks @pgandhi999
[GitHub] spark pull request #23088: [SPARK-26119][CORE][WEBUI]Task summary table shou...
Github user shahidki31 commented on a diff in the pull request: https://github.com/apache/spark/pull/23088#discussion_r238135277

--- Diff: core/src/main/scala/org/apache/spark/status/AppStatusStore.scala ---
@@ -148,11 +148,20 @@ private[spark] class AppStatusStore(
     // cheaper for disk stores (avoids deserialization).
     val count = {
       Utils.tryWithResource(
-        store.view(classOf[TaskDataWrapper])
-          .parent(stageKey)
-          .index(TaskIndexNames.EXEC_RUN_TIME)
-          .first(0L)
-          .closeableIterator()
+        if (store.isInstanceOf[LevelDB]) {
--- End diff --

Yes. Now, for the disk store case it finds the total task count, and for the in-memory case only the successful task count. This 'count' is used to find the quantile indices for all the task metrics:
https://github.com/apache/spark/blob/676bbb2446af1f281b8f76a5428b7ba75b7588b3/core/src/main/scala/org/apache/spark/status/AppStatusStore.scala#L222
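The index computation referenced here is the line `val indices = quantiles.map { q => math.min((q * count).toLong, count - 1) }` quoted in the AppStatusStore diffs elsewhere in this thread. A small worked example (the count and quantile values are illustrative):

```scala
// Each requested quantile q maps to position floor(q * count) in the
// index-sorted task list, clamped so q = 1.0 stays at the last element.
val count = 100L
val quantiles = Seq(0.0, 0.25, 0.5, 0.75, 1.0)
val indices = quantiles.map { q => math.min((q * count).toLong, count - 1) }
// indices == Seq(0L, 25L, 50L, 75L, 99L)
```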
[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...
Github user shahidki31 commented on the issue: https://github.com/apache/spark/pull/23088 Jenkins, retest this please
[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...
Github user shahidki31 commented on the issue: https://github.com/apache/spark/pull/23088 Hi @pgandhi999, it seems that after your check-ins, when there are no summary metrics an empty table is displayed rather than the message shown in the PR title. Could you please help me fix that? Thanks.
[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...
Github user shahidki31 commented on the issue: https://github.com/apache/spark/pull/23088

Hi @srowen. Yes, currently the disk store case needs more optimized code.

> While it makes some sense I have two concerns: different answers based on disk vs memory store which shouldn't really affect things. But would a user ever have both and see both side by side and be confused?

This configurable store is only for the history server, so a user can configure only one of them at a time there. But the live UI (opened from the YARN UI) goes through the same code flow, and it uses 'ElementTrackingStore', which is also in-memory. So if a user configures the disk store for the history server and opens both the live UI and the in-progress history UI, the summary metrics will be different.

> changing the way the indexing works, so that you can index by specific metrics for successful and failed tasks differently, would be tricky, and also would require changing the disk store version (to invalidate old stores).

I think @vanzin's suggestion should work, but I need time to try it out and test it. Maybe we can add a "TODO" for the disk store case or open a separate JIRA for it.

> Second is, that seems like it should still entail pushing down all the quantile logic into the KVStore, to be clean, right? and that's a bigger change.

Thanks @srowen for the suggestion; probably @vanzin can answer this better. I have modified the code for the in-memory case; the disk store still uses the old code.
[GitHub] spark issue #23160: [SPARK-26196][WebUI] Total tasks title in the stage page...
Github user shahidki31 commented on the issue: https://github.com/apache/spark/pull/23160

> So as far as I can think, I expect that if you sort the Duration column, it will perform sorting on row.duration instead of row.taskMetrics.executorRunTime, thus, not getting the desired results.

@pgandhi999 Actually, we mapped "Duration" to "executorRunTime" for sorting in the PR I mentioned above, so after this PR the "Duration" column sorts correctly.
[GitHub] spark pull request #23160: [SPARK-26196][WebUI] Total tasks title in the sta...
Github user shahidki31 commented on a diff in the pull request: https://github.com/apache/spark/pull/23160#discussion_r238060278

--- Diff: core/src/main/resources/org/apache/spark/ui/static/stagepage.js ---
@@ -661,8 +662,8 @@ $(document).ready(function () {
         {data : "launchTime", name: "Launch Time", render: formatDate},
         {
           data : function (row, type) {
-            if (row.duration) {
-              return type === 'display' ? formatDuration(row.duration) : row.duration;
+            if (row.taskMetrics && row.taskMetrics.executorRunTime) {
+              return type === 'display' ? formatDuration(row.taskMetrics.executorRunTime) : row.taskMetrics.executorRunTime;
--- End diff --

Thanks, removed.
[GitHub] spark issue #23191: [SPARK-26219][CORE][branch-2.4] Executor summary should ...
Github user shahidki31 commented on the issue: https://github.com/apache/spark/pull/23191 cc @vanzin. Kindly review
[GitHub] spark issue #23160: [SPARK-26196][WebUI] Total tasks title in the stage page...
Github user shahidki31 commented on the issue: https://github.com/apache/spark/pull/23160 Hi @pgandhi999, there is a small nit here. Recently we fixed a bug related to the "duration" metric in the tasks table (see https://github.com/apache/spark/pull/23081), but that wasn't reflected here. I have updated the code.
[GitHub] spark issue #23191: [SPARK-26219][CORE][branch-2.4] Executor summary should ...
Github user shahidki31 commented on the issue: https://github.com/apache/spark/pull/23191 Jenkins, retest this please
[GitHub] spark issue #23191: [SPARK-26219][CORE][branch-2.4] Executor summary should ...
Github user shahidki31 commented on the issue: https://github.com/apache/spark/pull/23191 Jenkins, test this please
[GitHub] spark pull request #23191: [SPARK-26219][CORE][branch-2.4] Executor summary ...
GitHub user shahidki31 opened a pull request: https://github.com/apache/spark/pull/23191

[SPARK-26219][CORE][branch-2.4] Executor summary should get updated for failure jobs in the history server UI

## What changes were proposed in this pull request?

Back-port the commit https://github.com/apache/spark/pull/23181 to the Spark 2.4 branch.

## How was this patch tested?

Added UT

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/shahidki31/spark branch-2.4

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/23191.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #23191

commit 590d580bd7e6a1a9b1f7d5645d60671c0d93decc
Author: Shahid
Date: 2018-12-01T04:05:39Z

    [SPARK-26219][CORE][branch-2.4] Executor summary should get updated for failure jobs in the history server UI
[GitHub] spark issue #23181: [SPARK-26219][CORE] Executor summary should get updated ...
Github user shahidki31 commented on the issue: https://github.com/apache/spark/pull/23181 Thanks @vanzin. I will open a PR against the 2.4 branch.
[GitHub] spark issue #23181: [SPARK-26219][CORE] Executor summary should get updated ...
Github user shahidki31 commented on the issue: https://github.com/apache/spark/pull/23181 Jenkins, retest this please
[GitHub] spark pull request #23181: [SPARK-26219][CORE] Executor summary should get u...
Github user shahidki31 commented on a diff in the pull request: https://github.com/apache/spark/pull/23181#discussion_r237959492

--- Diff: core/src/test/scala/org/apache/spark/status/AppStatusListenerSuite.scala ---
@@ -1274,47 +1274,69 @@ class AppStatusListenerSuite extends SparkFunSuite with BeforeAndAfter {
   }
 
   test("SPARK-25451: total tasks in the executor summary should match total stage tasks") {
-    val testConf = conf.clone.set(LIVE_ENTITY_UPDATE_PERIOD, Long.MaxValue)
-    val listener = new AppStatusListener(store, testConf, true)
+    val isLiveSeq = Seq(true, false)
 
-    val stage = new StageInfo(1, 0, "stage", 4, Nil, Nil, "details")
-    listener.onJobStart(SparkListenerJobStart(1, time, Seq(stage), null))
-    listener.onStageSubmitted(SparkListenerStageSubmitted(stage, new Properties()))
+    isLiveSeq.foreach { live: Boolean =>
+      val testConf = if (live) {
+        conf.clone.set(LIVE_ENTITY_UPDATE_PERIOD, Long.MaxValue)
--- End diff --

Done
[GitHub] spark pull request #23181: [SPARK-26219][CORE] Executor summary should get u...
Github user shahidki31 commented on a diff in the pull request: https://github.com/apache/spark/pull/23181#discussion_r237959457

--- Diff: core/src/test/scala/org/apache/spark/status/AppStatusListenerSuite.scala ---
@@ -1274,47 +1274,69 @@ class AppStatusListenerSuite extends SparkFunSuite with BeforeAndAfter {
   }
 
   test("SPARK-25451: total tasks in the executor summary should match total stage tasks") {
-    val testConf = conf.clone.set(LIVE_ENTITY_UPDATE_PERIOD, Long.MaxValue)
-    val listener = new AppStatusListener(store, testConf, true)
+    val isLiveSeq = Seq(true, false)
 
-    val stage = new StageInfo(1, 0, "stage", 4, Nil, Nil, "details")
-    listener.onJobStart(SparkListenerJobStart(1, time, Seq(stage), null))
-    listener.onStageSubmitted(SparkListenerStageSubmitted(stage, new Properties()))
+    isLiveSeq.foreach { live: Boolean =>
--- End diff --

Thanks. Updated.
[GitHub] spark pull request #23160: [SPARK-26196][WebUI] Total tasks title in the sta...
Github user shahidki31 commented on a diff in the pull request: https://github.com/apache/spark/pull/23160#discussion_r237907702

--- Diff: core/src/main/resources/org/apache/spark/ui/static/stagepage.js ---
@@ -610,7 +610,8 @@ $(document).ready(function () {
     $("#accumulator-table").DataTable(accumulatorConf);
 
     // building tasks table that uses server side functionality
-    var totalTasksToShow = responseBody.numCompleteTasks + responseBody.numActiveTasks;
+    var totalTasksToShow = responseBody.numCompleteTasks + responseBody.numActiveTasks +
--- End diff --

Yes @pgandhi999, all the failed tasks are displayed in the tasks table. Thanks for your great work.
[GitHub] spark issue #23160: [SPARK-26196][WebUI] Total tasks title in the stage page...
Github user shahidki31 commented on the issue: https://github.com/apache/spark/pull/23160 @tgravescs Yes, the task table is showing all the tasks. Sorry, I didn't include the whole screenshot.
[GitHub] spark issue #23181: [SPARK-26219][CORE] Executor summary should get updated ...
Github user shahidki31 commented on the issue: https://github.com/apache/spark/pull/23181 Jenkins, retest this please
[GitHub] spark issue #23181: [SPARK-26219][CORE] Executor summary should get updated ...
Github user shahidki31 commented on the issue: https://github.com/apache/spark/pull/23181 Jenkins, retest this please
[GitHub] spark pull request #23168: [SPARK-26207][doc]add PowerIterationClustering (P...
Github user shahidki31 commented on a diff in the pull request: https://github.com/apache/spark/pull/23168#discussion_r237637501

--- Diff: docs/ml-clustering.md ---
@@ -265,3 +265,38 @@ Refer to the [R API docs](api/R/spark.gaussianMixture.html) for more details.
 
+
+## Power Iteration Clustering (PIC)
+
+Power Iteration Clustering (PIC) is a scalable graph clustering algorithm
+developed by <a href="http://www.icml2010.org/papers/387.pdf">Lin and Cohen</a>.
--- End diff --

Hi @huaxingao, it seems the link is not accessible now. This link (http://www.cs.cmu.edu/~frank/papers/icml2010-pic-final.pdf) seems to work.
[GitHub] spark issue #23181: [SPARK-26100][CORE] Executor summary should get updated ...
Github user shahidki31 commented on the issue: https://github.com/apache/spark/pull/23181 cc @vanzin Kindly review
[GitHub] spark issue #23181: [SPARK-26100][CORE] Executor summary should get updated ...
Github user shahidki31 commented on the issue: https://github.com/apache/spark/pull/23181

Before patch:
![screenshot from 2018-11-29 22-13-34](https://user-images.githubusercontent.com/23054875/49246338-a21ead00-f43a-11e8-8214-f1020420be52.png)

After patch:
![screenshot from 2018-11-30 00-54-49](https://user-images.githubusercontent.com/23054875/49246353-aa76e800-f43a-11e8-98ef-7faecaa7a50e.png)
[GitHub] spark issue #23181: [SPARK-26100][CORE] Executor summary should get updated ...
Github user shahidki31 commented on the issue: https://github.com/apache/spark/pull/23181 Jenkins, retest this please
[GitHub] spark pull request #23181: Executor summary should update for history events
GitHub user shahidki31 opened a pull request: https://github.com/apache/spark/pull/23181

Executor summary should update for history events

## What changes were proposed in this pull request?

The root cause of the problem is that whenever a taskEnd event arrives after the stageCompleted event, execSummary is updated only for the live UI; we need to update it for the history UI too. For the previous discussion, refer to PR https://github.com/apache/spark/pull/23038 and https://issues.apache.org/jira/browse/SPARK-26100.

## How was this patch tested?

Added UT. Manually verified.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/shahidki31/spark executorUpdate

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/23181.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #23181

commit ae71eba254235c6317c91064e9a7b2e55c1c1cce
Author: Shahid
Date: 2018-11-29T18:09:45Z

    Executor summary should update for history events
[GitHub] spark pull request #23180: Executor summary should update for history events
Github user shahidki31 closed the pull request at: https://github.com/apache/spark/pull/23180
[GitHub] spark pull request #23180: Executor summary should update for history events
GitHub user shahidki31 opened a pull request: https://github.com/apache/spark/pull/23180

Executor summary should update for history events

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)

Please review http://spark.apache.org/contributing.html before opening a pull request.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/shahidki31/spark executorUpdate

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/23180.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #23180

commit ae71eba254235c6317c91064e9a7b2e55c1c1cce
Author: Shahid
Date: 2018-11-29T18:09:45Z

    Executor summary should update for history events
[GitHub] spark issue #23158: [SPARK-26186][SPARK-26184][CORE] Last updated time is no...
Github user shahidki31 commented on the issue: https://github.com/apache/spark/pull/23158 Thanks a lot @vanzin
[GitHub] spark issue #23158: [SPARK-26186][SPARK-26184][CORE] Last updated time is no...
Github user shahidki31 commented on the issue: https://github.com/apache/spark/pull/23158 It is a random failure. Jenkins, retest this please
[GitHub] spark issue #23158: [SPARK-26186][SPARK-26184][CORE] Last updated time is no...
Github user shahidki31 commented on the issue: https://github.com/apache/spark/pull/23158 Jenkins, retest this please
[GitHub] spark issue #23158: [SPARK-26186][SPARK-26184][CORE] Last updated time is no...
Github user shahidki31 commented on the issue: https://github.com/apache/spark/pull/23158 retest this please
[GitHub] spark pull request #23158: [SPARK-26186][SPARK-26184] Last updated time is n...
Github user shahidki31 commented on a diff in the pull request: https://github.com/apache/spark/pull/23158#discussion_r237295187

--- Diff: core/src/test/scala/org/apache/spark/deploy/history/FsHistoryProviderSuite.scala ---
@@ -334,6 +334,42 @@ class FsHistoryProviderSuite extends SparkFunSuite with BeforeAndAfter with Matc
     assert(!log2.exists())
   }
 
+  test("should not clean inprogress application with lastUpdated time less the maxTime") {
+    val firstFileModifiedTime = TimeUnit.DAYS.toMillis(1)
+    val secondFileModifiedTime = TimeUnit.DAYS.toMillis(6)
+    val maxAge = TimeUnit.DAYS.toMillis(7)
+    val clock = new ManualClock(0)
+    val provider = new FsHistoryProvider(
+      createTestConf().set("spark.history.fs.cleaner.maxAge", s"${maxAge}ms"), clock)
+    val log = newLogFile("inProgressApp1", None, inProgress = true)
+    writeFile(log, true, None,
+      SparkListenerApplicationStart(
+        "inProgressApp1", Some("inProgressApp1"), 3L, "test", Some("attempt1"))
+    )
+    clock.setTime(firstFileModifiedTime)
+    provider.checkForLogs()
--- End diff --

I see, thanks.
[GitHub] spark pull request #23158: [SPARK-26186][SPARK-26184] Last updated time is n...
Github user shahidki31 commented on a diff in the pull request: https://github.com/apache/spark/pull/23158#discussion_r237292237

--- Diff: core/src/test/scala/org/apache/spark/deploy/history/FsHistoryProviderSuite.scala ---
@@ -334,6 +334,42 @@ class FsHistoryProviderSuite extends SparkFunSuite with BeforeAndAfter with Matc
     assert(!log2.exists())
   }
 
+  test("should not clean inprogress application with lastUpdated time less the maxTime") {
+    val firstFileModifiedTime = TimeUnit.DAYS.toMillis(1)
+    val secondFileModifiedTime = TimeUnit.DAYS.toMillis(6)
+    val maxAge = TimeUnit.DAYS.toMillis(7)
+    val clock = new ManualClock(0)
+    val provider = new FsHistoryProvider(
+      createTestConf().set("spark.history.fs.cleaner.maxAge", s"${maxAge}ms"), clock)
+    val log = newLogFile("inProgressApp1", None, inProgress = true)
+    writeFile(log, true, None,
+      SparkListenerApplicationStart(
+        "inProgressApp1", Some("inProgressApp1"), 3L, "test", Some("attempt1"))
+    )
+    clock.setTime(firstFileModifiedTime)
+    provider.checkForLogs()
--- End diff --

Added, thanks. But for an in-progress application, do we really need to set the log file's last modified time? The cleaner only checks the application's lastUpdated time, which we update whenever the size of the log file changes.
[GitHub] spark pull request #23158: [SPARK-26186][SPARK-26184] Last updated time is n...
Github user shahidki31 commented on a diff in the pull request: https://github.com/apache/spark/pull/23158#discussion_r237291446

--- Diff: core/src/test/scala/org/apache/spark/deploy/history/FsHistoryProviderSuite.scala ---
@@ -334,6 +334,42 @@ class FsHistoryProviderSuite extends SparkFunSuite with BeforeAndAfter with Matc
     assert(!log2.exists())
   }
 
+  test("should not clean inprogress application with lastUpdated time less the maxTime") {
+    val firstFileModifiedTime = TimeUnit.DAYS.toMillis(1)
+    val secondFileModifiedTime = TimeUnit.DAYS.toMillis(6)
+    val maxAge = TimeUnit.DAYS.toMillis(7)
+    val clock = new ManualClock(0)
+    val provider = new FsHistoryProvider(
+      createTestConf().set("spark.history.fs.cleaner.maxAge", s"${maxAge}ms"), clock)
--- End diff --

Done. Thanks
[GitHub] spark pull request #23088: [SPARK-26119][CORE][WEBUI]Task summary table shou...
Github user shahidki31 commented on a diff in the pull request: https://github.com/apache/spark/pull/23088#discussion_r237154542

--- Diff: core/src/main/scala/org/apache/spark/status/AppStatusStore.scala ---
@@ -222,29 +223,20 @@ private[spark] class AppStatusStore(
     val indices = quantiles.map { q => math.min((q * count).toLong, count - 1) }
 
     def scanTasks(index: String)(fn: TaskDataWrapper => Long): IndexedSeq[Double] = {
-      Utils.tryWithResource(
-        store.view(classOf[TaskDataWrapper])
-          .parent(stageKey)
-          .index(index)
-          .first(0L)
-          .closeableIterator()
-      ) { it =>
-        var last = Double.NaN
-        var currentIdx = -1L
-        indices.map { idx =>
-          if (idx == currentIdx) {
-            last
-          } else {
-            val diff = idx - currentIdx
-            currentIdx = idx
-            if (it.skip(diff - 1)) {
-              last = fn(it.next()).toDouble
-              last
-            } else {
-              Double.NaN
-            }
-          }
-        }.toIndexedSeq
+      val quantileTasks = store.view(classOf[TaskDataWrapper])
--- End diff --

@srowen Yes, everything is loaded in "sorted" order based on the index, and then we do the filtering. For the in-memory case this doesn't cause any issue, but for the disk store there is extra deserialization overhead. One possible solution for the disk store case could be to fetch the data only once and sort it based on the corresponding indices to compute the quantiles. If that solution seems complicated, then we can tell the user that summary metrics display the quantile summary of all the tasks instead of only completed ones. Correct me if I am wrong.
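As a sketch of the in-memory approach described above (the type and field names here are hypothetical stand-ins, not Spark's actual ones):

```scala
case class TaskRow(status: String, metric: Long)

// Walk the index-sorted view once, keep only successful tasks, and pick the
// rows whose position among the successful tasks matches a quantile index.
// For a disk-backed store this scan deserializes every row, which is the
// overhead discussed above.
def quantileValues(sortedByMetric: Iterator[TaskRow], indices: Set[Long]): Seq[Long] =
  sortedByMetric
    .filter(_.status == "SUCCESS")
    .zipWithIndex
    .collect { case (row, i) if indices.contains(i.toLong) => row.metric }
    .toSeq
```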
[GitHub] spark pull request #23160: [SPARK-26196]Total tasks title in the stage page ...
GitHub user shahidki31 opened a pull request: https://github.com/apache/spark/pull/23160

[SPARK-26196] Total tasks title in the stage page is incorrect when there are failed or killed tasks

## What changes were proposed in this pull request?

Total tasks = numCompleteTasks + numActiveTasks + numKilledTasks + numFailedTasks;

## How was this patch tested?

Test steps:
```
bin/spark-shell
scala> sc.parallelize(1 to 100, 10).count
```
Before patch:
![screenshot from 2018-11-28 07-26-00](https://user-images.githubusercontent.com/23054875/49123523-e2691880-f2de-11e8-9c16-60d1865e6e77.png)

After patch:
![screenshot from 2018-11-28 07-24-31](https://user-images.githubusercontent.com/23054875/49123525-e432dc00-f2de-11e8-89ca-4a53e19c9c18.png)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/shahidki31/spark totalTasks

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/23160.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #23160

commit 37bcd6231816c7bc2b2561bff10955b822934ac6
Author: Shahid
Date: 2018-11-28T01:41:50Z

    Total tasks title in the stage page is incorrect
[GitHub] spark issue #23158: [SPARK-26186][SPARK-26184] Last updated time is not gett...
Github user shahidki31 commented on the issue: https://github.com/apache/spark/pull/23158

Expiry is determined by the `lastUpdateTime`, which we need to update whenever an event log update happens:
https://github.com/apache/spark/blob/2d89d109e19d1e84c4ada3c9d5d48cfcf3d997ea/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala#L788-L793
https://github.com/apache/spark/blob/2d89d109e19d1e84c4ada3c9d5d48cfcf3d997ea/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala#L1129-L1130
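In other words, the cleaner's decision reduces to a timestamp comparison; a minimal sketch (the names are illustrative, not FsHistoryProvider's exact fields):

```scala
// An attempt is expired only when its lastUpdated timestamp has aged past
// spark.history.fs.cleaner.maxAge, so lastUpdated must be refreshed every
// time the in-progress event log grows.
def isExpired(lastUpdatedMs: Long, nowMs: Long, maxAgeMs: Long): Boolean =
  nowMs - lastUpdatedMs > maxAgeMs
```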
[GitHub] spark issue #23158: [SPARK-26186][SPARK-26184] Last updated time is not gett...
Github user shahidki31 commented on the issue: https://github.com/apache/spark/pull/23158 cc @vanzin @srowen Kindly review
[GitHub] spark pull request #23158: [SPARK-26186][SPARK-26184] Last updated time is n...
GitHub user shahidki31 opened a pull request: https://github.com/apache/spark/pull/23158

[SPARK-26186][SPARK-26184] Last updated time is not getting updated for the in-progress application

## What changes were proposed in this pull request?

When 'spark.history.fs.inProgressOptimization.enabled' is true, an in-progress application's last updated time is not getting updated in the history UI. Also, during cleaning, an in-progress application is removed from the listing even if its last updated time is within the cleaning threshold.

## How was this patch tested?

Added UT; attached screenshots.

Before patch:
![screenshot from 2018-11-27 23-22-38](https://user-images.githubusercontent.com/23054875/49101600-9b5a3380-f29c-11e8-8efc-3fb594e4279a.png)
![screenshot from 2018-11-27 23-20-11](https://user-images.githubusercontent.com/23054875/49101601-9c8b6080-f29c-11e8-928e-643a8c8f4477.png)

After patch:
![screenshot from 2018-11-27 23-37-10](https://user-images.githubusercontent.com/23054875/49101911-669aac00-f29d-11e8-8181-663e4a08ab0e.png)
![screenshot from 2018-11-27 23-39-04](https://user-images.githubusercontent.com/23054875/49102010-a5306680-f29d-11e8-947a-e8a2a09a785a.png)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/shahidki31/spark HistoryLastUpdateTime

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/23158.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #23158

commit 985470c0e12a3c67df47b5174748652c6e6f6e57
Author: Shahid
Date: 2018-11-27T13:32:58Z

    update lastUpdateTime for inprogress application
[GitHub] spark pull request #23088: [SPARK-26119][CORE][WEBUI]Task summary table shou...
Github user shahidki31 commented on a diff in the pull request: https://github.com/apache/spark/pull/23088#discussion_r236455725

--- Diff: core/src/main/scala/org/apache/spark/status/AppStatusStore.scala ---
@@ -222,29 +223,20 @@ private[spark] class AppStatusStore(
     val indices = quantiles.map { q => math.min((q * count).toLong, count - 1) }
 
    def scanTasks(index: String)(fn: TaskDataWrapper => Long): IndexedSeq[Double] = {
-      Utils.tryWithResource(
-        store.view(classOf[TaskDataWrapper])
-          .parent(stageKey)
-          .index(index)
-          .first(0L)
-          .closeableIterator()
-      ) { it =>
-        var last = Double.NaN
-        var currentIdx = -1L
-        indices.map { idx =>
-          if (idx == currentIdx) {
-            last
-          } else {
-            val diff = idx - currentIdx
-            currentIdx = idx
-            if (it.skip(diff - 1)) {
-              last = fn(it.next()).toDouble
-              last
-            } else {
-              Double.NaN
-            }
-          }
-        }.toIndexedSeq
+      val quantileTasks = store.view(classOf[TaskDataWrapper])
--- End diff --

@vanzin Yes, it seems loading the stage page takes a lot more time if we enable the disk store. The in-memory store seems okay.
[GitHub] spark issue #23038: [SPARK-25451][SPARK-26100][CORE]Aggregated metrics table...
Github user shahidki31 commented on the issue: https://github.com/apache/spark/pull/23038 Oh, I could have rebased and tested locally. Thanks @vanzin for the fix.
[GitHub] spark issue #23038: [SPARK-25451][SPARK-26100][CORE]Aggregated metrics table...
Github user shahidki31 commented on the issue: https://github.com/apache/spark/pull/23038 Thank you @vanzin
[GitHub] spark pull request #23088: [SPARK-26119][CORE][WEBUI]Task summary table shou...
Github user shahidki31 commented on a diff in the pull request: https://github.com/apache/spark/pull/23088#discussion_r236409746

--- Diff: core/src/test/scala/org/apache/spark/status/AppStatusStoreSuite.scala ---
@@ -95,10 +123,18 @@ class AppStatusStoreSuite extends SparkFunSuite {
 
   private def newTaskData(i: Int): TaskDataWrapper = {
     new TaskDataWrapper(
-      i, i, i, i, i, i, i.toString, i.toString, i.toString, i.toString, false, Nil, None,
+      i, i, i, i, i, i, i.toString, i.toString, "SUCCESS", i.toString, false, Nil, None,
       i, i, i, i, i, i, i, i, i, i, i, i, i, i, i, i, i, i, i, i, i, i, i, i,
       stageId, attemptId)
   }
 
+  private def failedTaskData(i: Int): TaskDataWrapper = {
--- End diff --

Thanks. Done.
[GitHub] spark pull request #23088: [SPARK-26119][CORE][WEBUI]Task summary table shou...
Github user shahidki31 commented on a diff in the pull request: https://github.com/apache/spark/pull/23088#discussion_r236409661

--- Diff: core/src/main/scala/org/apache/spark/status/AppStatusStore.scala ---
@@ -222,29 +223,20 @@ private[spark] class AppStatusStore(
     val indices = quantiles.map { q => math.min((q * count).toLong, count - 1) }
 
     def scanTasks(index: String)(fn: TaskDataWrapper => Long): IndexedSeq[Double] = {
-      Utils.tryWithResource(
-        store.view(classOf[TaskDataWrapper])
-          .parent(stageKey)
-          .index(index)
-          .first(0L)
-          .closeableIterator()
-      ) { it =>
-        var last = Double.NaN
-        var currentIdx = -1L
-        indices.map { idx =>
-          if (idx == currentIdx) {
-            last
-          } else {
-            val diff = idx - currentIdx
-            currentIdx = idx
-            if (it.skip(diff - 1)) {
-              last = fn(it.next()).toDouble
-              last
-            } else {
-              Double.NaN
-            }
-          }
-        }.toIndexedSeq
+      val quantileTasks = store.view(classOf[TaskDataWrapper])
+        .parent(stageKey)
+        .index(index)
+        .first(0L)
+        .asScala
+        .filter(_.status == "SUCCESS") // Filter "SUCCESS" tasks
+        .zipWithIndex
+        .filter(x => indices.contains(x._2))
--- End diff --

Thanks. I have modified it.
[GitHub] spark pull request #23088: [SPARK-26119][CORE][WEBUI]Task summary table shou...
Github user shahidki31 commented on a diff in the pull request: https://github.com/apache/spark/pull/23088#discussion_r236409557

--- Diff: core/src/main/scala/org/apache/spark/status/AppStatusStore.scala ---
@@ -222,29 +223,20 @@ private[spark] class AppStatusStore(
     val indices = quantiles.map { q => math.min((q * count).toLong, count - 1) }
 
     def scanTasks(index: String)(fn: TaskDataWrapper => Long): IndexedSeq[Double] = {
-      Utils.tryWithResource(
-        store.view(classOf[TaskDataWrapper])
-          .parent(stageKey)
-          .index(index)
-          .first(0L)
-          .closeableIterator()
-      ) { it =>
-        var last = Double.NaN
-        var currentIdx = -1L
-        indices.map { idx =>
-          if (idx == currentIdx) {
-            last
-          } else {
-            val diff = idx - currentIdx
-            currentIdx = idx
-            if (it.skip(diff - 1)) {
-              last = fn(it.next()).toDouble
-              last
-            } else {
-              Double.NaN
-            }
-          }
-        }.toIndexedSeq
+      val quantileTasks = store.view(classOf[TaskDataWrapper])
--- End diff --

Thank you @vanzin for the review. I will check the time with a large number of tasks.
[GitHub] spark pull request #23088: [SPARK-26119][CORE][WEBUI]Task summary table shou...
Github user shahidki31 commented on a diff in the pull request: https://github.com/apache/spark/pull/23088#discussion_r236234523

--- Diff: core/src/main/scala/org/apache/spark/status/AppStatusStore.scala ---
@@ -150,8 +150,9 @@ private[spark] class AppStatusStore(
     Utils.tryWithResource(
       store.view(classOf[TaskDataWrapper])
         .parent(stageKey)
-        .index(TaskIndexNames.EXEC_RUN_TIME)
-        .first(0L)
+        .index(TaskIndexNames.STATUS)
+        .first("SUCCESS")
+        .last("SUCCESS")
--- End diff --

Here the index is "status", so it will read in sorted order based on the status; i.e., the "SUCCESS" tasks form one group, the "FAILED" tasks another, and so on. So if we scan from first("SUCCESS") to last("SUCCESS"), it will count only successful tasks, not all tasks between the first and last successful one. Also, in the UT I added, the even indices have "SUCCESS" tasks and the odd indices have "FAILED" tasks, but the count is still 3, i.e. ("0", "2", "4").
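A toy analogue of why the bounded scan counts only successful tasks, using plain Scala collections in place of the KVStore view (the KVStore iterates an index in sorted order, which is what `.first("SUCCESS").last("SUCCESS")` relies on; the sample statuses are illustrative):

```scala
// Statuses sorted the way a string index would order them:
// "FAILED" and "KILLED" sort before "SUCCESS".
val statuses = Seq("SUCCESS", "FAILED", "SUCCESS", "KILLED", "SUCCESS").sorted

// Bounding both ends of the range at "SUCCESS" leaves exactly the
// successful entries, matching the count of 3 described above.
val bounded = statuses.dropWhile(_ < "SUCCESS").takeWhile(_ <= "SUCCESS")
assert(bounded.size == 3)
```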
[GitHub] spark pull request #23088: [SPARK-26119][CORE][WEBUI]Task summary table shou...
Github user shahidki31 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23088#discussion_r236234400

--- Diff: core/src/main/scala/org/apache/spark/status/AppStatusStore.scala ---
@@ -222,29 +223,20 @@ private[spark] class AppStatusStore(
     val indices = quantiles.map { q => math.min((q * count).toLong, count - 1) }

     def scanTasks(index: String)(fn: TaskDataWrapper => Long): IndexedSeq[Double] = {
-      Utils.tryWithResource(
-        store.view(classOf[TaskDataWrapper])
-          .parent(stageKey)
-          .index(index)
-          .first(0L)
-          .closeableIterator()
-      ) { it =>
-        var last = Double.NaN
-        var currentIdx = -1L
-        indices.map { idx =>
-          if (idx == currentIdx) {
-            last
-          } else {
-            val diff = idx - currentIdx
-            currentIdx = idx
-            if (it.skip(diff - 1)) {
-              last = fn(it.next()).toDouble
-              last
-            } else {
-              Double.NaN
-            }
-          }
-        }.toIndexedSeq
+      val quantileTasks = store.view(classOf[TaskDataWrapper])
--- End diff --

Thank you @srowen for the suggestion. I will try adding the filter method.

---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23088: [SPARK-26119][CORE][WEBUI]Task summary table shou...
Github user shahidki31 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23088#discussion_r236120634

--- Diff: core/src/main/scala/org/apache/spark/status/AppStatusStore.scala ---
@@ -222,29 +223,20 @@ private[spark] class AppStatusStore(
     val indices = quantiles.map { q => math.min((q * count).toLong, count - 1) }

     def scanTasks(index: String)(fn: TaskDataWrapper => Long): IndexedSeq[Double] = {
-      Utils.tryWithResource(
-        store.view(classOf[TaskDataWrapper])
-          .parent(stageKey)
-          .index(index)
-          .first(0L)
-          .closeableIterator()
-      ) { it =>
-        var last = Double.NaN
-        var currentIdx = -1L
-        indices.map { idx =>
-          if (idx == currentIdx) {
-            last
-          } else {
-            val diff = idx - currentIdx
-            currentIdx = idx
-            if (it.skip(diff - 1)) {
-              last = fn(it.next()).toDouble
-              last
-            } else {
-              Double.NaN
-            }
-          }
-        }.toIndexedSeq
+      val quantileTasks = store.view(classOf[TaskDataWrapper])
--- End diff --

Yes. If we check `if (status == "SUCCESS")` for every iterator value, we can't use the skip function. Because earlier we knew the exact indices we needed, we could skip directly to the 25th percentile, the 50th percentile, and so on. Now we don't know which index holds the 25th percentile of the "SUCCESS" values unless we iterate over each one. Otherwise, we have to filter the "SUCCESS" tasks beforehand, as I have done in the PR.

---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
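A small stand-in sketch of the filter-first approach just described, in plain Scala over made-up metric values; the real code iterates TaskDataWrapper rows from the kvstore, and the index formula is the one from AppStatusStore:

    val quantiles = Seq(0.0, 0.25, 0.5, 0.75, 1.0)
    val successMetrics = Seq[Long](10, 20, 30, 40, 50, 60, 70)  // already index-sorted

    val count = successMetrics.size
    // Same index formula as in AppStatusStore: the position each quantile falls on.
    val indices = quantiles.map { q => math.min((q * count).toLong, count - 1).toInt }

    // Without skip(), pair each surviving row with its position and keep
    // only the positions the quantiles fall on.
    val summary = successMetrics.zipWithIndex
      .collect { case (v, i) if indices.contains(i) => v.toDouble }

    println(summary)  // List(10.0, 20.0, 40.0, 60.0, 70.0)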
[GitHub] spark pull request #23088: [SPARK-26119][CORE][WEBUI]Task summary table shou...
Github user shahidki31 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23088#discussion_r236087784

--- Diff: core/src/main/scala/org/apache/spark/status/AppStatusStore.scala ---
@@ -222,29 +223,20 @@ private[spark] class AppStatusStore(
     val indices = quantiles.map { q => math.min((q * count).toLong, count - 1) }

     def scanTasks(index: String)(fn: TaskDataWrapper => Long): IndexedSeq[Double] = {
-      Utils.tryWithResource(
-        store.view(classOf[TaskDataWrapper])
-          .parent(stageKey)
-          .index(index)
-          .first(0L)
-          .closeableIterator()
-      ) { it =>
-        var last = Double.NaN
-        var currentIdx = -1L
-        indices.map { idx =>
-          if (idx == currentIdx) {
-            last
-          } else {
-            val diff = idx - currentIdx
-            currentIdx = idx
-            if (it.skip(diff - 1)) {
-              last = fn(it.next()).toDouble
-              last
-            } else {
-              Double.NaN
-            }
-          }
-        }.toIndexedSeq
+      val quantileTasks = store.view(classOf[TaskDataWrapper])
+        .parent(stageKey)
+        .index(index)
+        .first(0L)
+        .asScala
+        .filter(_.status.startsWith("S")) // Filter "SUCCESS" tasks
+        .zipWithIndex
+        .filter(x => indices.contains(x._2))
+
+      if(quantileTasks.size > indices.length) {
--- End diff --

Done

---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23088: [SPARK-26119][CORE][WEBUI]Task summary table shou...
Github user shahidki31 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23088#discussion_r236087749

--- Diff: core/src/test/scala/org/apache/spark/status/AppStatusStoreSuite.scala ---
@@ -77,6 +77,30 @@ class AppStatusStoreSuite extends SparkFunSuite {
     assert(store.count(classOf[CachedQuantile]) === 2)
   }

+  test("only successfull task have taskSummary") {
+    val store = new InMemoryStore()
+    (0 until 5).foreach { i => store.write(failedTaskData(i)) }
+    val appStore = new AppStatusStore(store).taskSummary(stageId, attemptId, uiQuantiles)
+    assert(appStore.size === 0)
+  }
+
+  test("summary should contain task metrics of only successfull tasks") {
+    val store = new InMemoryStore()
+    (0 until 5).foreach { i =>
+      if (i % 2 == 1) store.write(failedTaskData(i))
+      else store.write(newTaskData(i))
+    }
+    val summary = new AppStatusStore(store).taskSummary(stageId, attemptId, uiQuantiles).get
+
+    val values = (0 to 2).map( i =>
--- End diff --

Done

---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23088: [SPARK-26119][CORE][WEBUI]Task summary table shou...
Github user shahidki31 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23088#discussion_r236087731

--- Diff: core/src/main/scala/org/apache/spark/status/AppStatusStore.scala ---
@@ -222,29 +223,20 @@ private[spark] class AppStatusStore(
     val indices = quantiles.map { q => math.min((q * count).toLong, count - 1) }

     def scanTasks(index: String)(fn: TaskDataWrapper => Long): IndexedSeq[Double] = {
-      Utils.tryWithResource(
-        store.view(classOf[TaskDataWrapper])
-          .parent(stageKey)
-          .index(index)
-          .first(0L)
-          .closeableIterator()
-      ) { it =>
-        var last = Double.NaN
-        var currentIdx = -1L
-        indices.map { idx =>
-          if (idx == currentIdx) {
-            last
-          } else {
-            val diff = idx - currentIdx
-            currentIdx = idx
-            if (it.skip(diff - 1)) {
-              last = fn(it.next()).toDouble
-              last
-            } else {
-              Double.NaN
-            }
-          }
-        }.toIndexedSeq
+      val quantileTasks = store.view(classOf[TaskDataWrapper])
--- End diff --

Yes, that was the original intent. Unfortunately, after converting to a Scala collection, the skip() functionality is no longer available. Also, the kvstore doesn't have any filter API to select only the "SUCCESS" tasks. The PR was about reducing the computation time for loading the stage page from the disk store (for the history server) by avoiding in-memory sorting.

---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23088: [SPARK-26119][CORE][WEBUI]Task summary table shou...
Github user shahidki31 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23088#discussion_r236087746

--- Diff: core/src/test/scala/org/apache/spark/status/AppStatusStoreSuite.scala ---
@@ -77,6 +77,30 @@ class AppStatusStoreSuite extends SparkFunSuite {
     assert(store.count(classOf[CachedQuantile]) === 2)
   }

+  test("only successfull task have taskSummary") {
+    val store = new InMemoryStore()
+    (0 until 5).foreach { i => store.write(failedTaskData(i)) }
+    val appStore = new AppStatusStore(store).taskSummary(stageId, attemptId, uiQuantiles)
+    assert(appStore.size === 0)
+  }
+
+  test("summary should contain task metrics of only successfull tasks") {
+    val store = new InMemoryStore()
+    (0 until 5).foreach { i =>
+      if (i % 2 == 1) store.write(failedTaskData(i))
--- End diff --

Done

---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23088: [SPARK-26119][CORE][WEBUI]Task summary table shou...
Github user shahidki31 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23088#discussion_r236087741

--- Diff: core/src/main/scala/org/apache/spark/status/AppStatusStore.scala ---
@@ -222,29 +223,20 @@ private[spark] class AppStatusStore(
     val indices = quantiles.map { q => math.min((q * count).toLong, count - 1) }

     def scanTasks(index: String)(fn: TaskDataWrapper => Long): IndexedSeq[Double] = {
-      Utils.tryWithResource(
-        store.view(classOf[TaskDataWrapper])
-          .parent(stageKey)
-          .index(index)
-          .first(0L)
-          .closeableIterator()
-      ) { it =>
-        var last = Double.NaN
-        var currentIdx = -1L
-        indices.map { idx =>
-          if (idx == currentIdx) {
-            last
-          } else {
-            val diff = idx - currentIdx
-            currentIdx = idx
-            if (it.skip(diff - 1)) {
-              last = fn(it.next()).toDouble
-              last
-            } else {
-              Double.NaN
-            }
-          }
-        }.toIndexedSeq
+      val quantileTasks = store.view(classOf[TaskDataWrapper])
+        .parent(stageKey)
+        .index(index)
+        .first(0L)
+        .asScala
+        .filter(_.status.startsWith("S")) // Filter "SUCCESS" tasks
+        .zipWithIndex
+        .filter(x => indices.contains(x._2))
+
+      if(quantileTasks.size > indices.length) {
+        quantileTasks.map(task => fn(task._1).toDouble).toIndexedSeq
+      } else {
+        indices.map( index =>
+          fn(quantileTasks.filter(_._2 == index).head._1).toDouble).toIndexedSeq
--- End diff --

Modified. Thanks

---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23088: [SPARK-26119][CORE][WEBUI]Task summary table shou...
Github user shahidki31 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23088#discussion_r236087736

--- Diff: core/src/main/scala/org/apache/spark/status/AppStatusStore.scala ---
@@ -222,29 +223,20 @@ private[spark] class AppStatusStore(
     val indices = quantiles.map { q => math.min((q * count).toLong, count - 1) }

     def scanTasks(index: String)(fn: TaskDataWrapper => Long): IndexedSeq[Double] = {
-      Utils.tryWithResource(
-        store.view(classOf[TaskDataWrapper])
-          .parent(stageKey)
-          .index(index)
-          .first(0L)
-          .closeableIterator()
-      ) { it =>
-        var last = Double.NaN
-        var currentIdx = -1L
-        indices.map { idx =>
-          if (idx == currentIdx) {
-            last
-          } else {
-            val diff = idx - currentIdx
-            currentIdx = idx
-            if (it.skip(diff - 1)) {
-              last = fn(it.next()).toDouble
-              last
-            } else {
-              Double.NaN
-            }
-          }
-        }.toIndexedSeq
+      val quantileTasks = store.view(classOf[TaskDataWrapper])
+        .parent(stageKey)
+        .index(index)
+        .first(0L)
+        .asScala
+        .filter(_.status.startsWith("S")) // Filter "SUCCESS" tasks
--- End diff --

Done

---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...
Github user shahidki31 commented on the issue:

    https://github.com/apache/spark/pull/23088

Thank you @srowen for the review. I will update the PR.

> When did the behavior change? you said you found the commit.

https://issues.apache.org/jira/browse/SPARK-20657
The behavior change occurred after the PR corresponding to that JIRA. Also, the PR discussion doesn't mention whether the change was intentional or not.

---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23134: [SPARK-25504][Docs] Update doc about retained tas...
Github user shahidki31 closed the pull request at: https://github.com/apache/spark/pull/23134 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23134: [SPARK-25504][Docs] Update doc about retained tasks, job...
Github user shahidki31 commented on the issue: https://github.com/apache/spark/pull/23134 In the JIRA, when the user wanted retainedTasks to be 6, around 58000 tasks were actually retained. As per the code, it retains at least 0.9 * retainedTasks and at most retainedTasks. If we clearly specify the threshold, the confusion doesn't arise. If the existing documentation is considered okay, I can close the PR. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
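For readers confused by that threshold, a minimal Scala sketch of the slack-based retention rule being described; the function name and the /10 constant are assumptions for illustration, not the actual listener code:

    // Nothing is evicted until the count exceeds the limit; eviction then
    // drops roughly 10% at once, so the store holds between ~0.9 * limit
    // and limit entries at any time (amortizes cleanup cost).
    def retainAfterCleanup(currentCount: Int, retainedTasks: Int): Int = {
      if (currentCount <= retainedTasks) {
        currentCount                                 // under the limit: keep all
      } else {
        val toDelete = math.max(retainedTasks / 10, currentCount - retainedTasks)
        currentCount - toDelete                      // lands near 0.9 * limit
      }
    }

    // e.g. retainAfterCleanup(100001, 100000) == 90001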
[GitHub] spark issue #23134: [SPARK-25504][Docs] Update doc about retained tasks, job...
Github user shahidki31 commented on the issue: https://github.com/apache/spark/pull/23134 Hi @srowen, some users seem to be getting confused by the threshold values; is it worth updating the docs? Thanks --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23134: [SPARK-25504][Docs] Update doc about retained tas...
GitHub user shahidki31 opened a pull request:

    https://github.com/apache/spark/pull/23134

    [SPARK-25504][Docs] Update doc about retained tasks, jobs and executions

    ## What changes were proposed in this pull request?

    Updated the documentation about spark.ui.retainedTasks, spark.ui.retainedJobs, spark.ui.retainedStages

    ## How was this patch tested?

    NA

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/shahidki31/spark docUpdate

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/23134.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #23134

commit ca62b7ef2dc1582ae021b63d1aecd518f3deb8a8
Author: Shahid
Date:   2018-11-25T16:02:03Z

    update doc about retained tasks, jobs and executions

---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23038: [SPARK-25451][SPARK-26100][CORE]Aggregated metrics table...
Github user shahidki31 commented on the issue: https://github.com/apache/spark/pull/23038 @vanzin Could you please check the updated changes? Thanks --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...
Github user shahidki31 commented on the issue: https://github.com/apache/spark/pull/23088 @srowen, could you please review the PR? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23016: [SPARK-26006][mllib] unpersist 'dataInternalRepr' in the...
Github user shahidki31 commented on the issue:

    https://github.com/apache/spark/pull/23016

> I am currently using it with spark 2.3 as
>
> org.apache.spark
> spark-mllib_2.11
>
> How can i get this fix?

You can cherry-pick the commit from the master branch.

---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23016: [SPARK-26006][mllib] unpersist 'dataInternalRepr' in the...
Github user shahidki31 commented on the issue: https://github.com/apache/spark/pull/23016 @idlevi Actually, the input and output of PrefixSpan are RDDs. Earlier the intermediate RDD was cached; now the final RDD is cached and materialized. So if you materialize the model, it previously had to compute from the intermediate level, whereas now it reads directly from the final RDD. I ran all the UTs in PrefixSpanSuite, and there is hardly any time difference with/without the patch. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
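A minimal Scala sketch of the caching pattern described above; the function is illustrative, not the actual PrefixSpan code:

    import org.apache.spark.rdd.RDD
    import org.apache.spark.storage.StorageLevel

    // Cache the *final* RDD (instead of an intermediate one) and force
    // evaluation now, so later reads of the model hit cached data directly.
    def materializeFinal[T](finalRdd: RDD[T]): RDD[T] = {
      finalRdd.persist(StorageLevel.MEMORY_AND_DISK)
      finalRdd.count()  // materialize eagerly, while upstream data is still available
      finalRdd
    }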
[GitHub] spark issue #23088: [WIP][SPARK-26119][CORE][WEBUI]Task summary table should...
Github user shahidki31 commented on the issue: https://github.com/apache/spark/pull/23088 (Still need more things to test, so changed to WIP) --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23088: [WIP][SPARK-26119][CORE][WEBUI]Task summary table should...
Github user shahidki31 commented on the issue: https://github.com/apache/spark/pull/23088 @srowen Yes. I did check the commit history, and the PR which modified the behavior didn't mention this behavior change. If it is intentional, then we should change the title of the table from "summary metrics of completed tasks" to "summary metrics of all tasks". --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23088: [SPARK-26119][CORE][WEBUI]Task summary table shou...
GitHub user shahidki31 reopened a pull request:

    https://github.com/apache/spark/pull/23088

    [SPARK-26119][CORE][WEBUI]Task summary table should contain only successful tasks' metrics

    ## What changes were proposed in this pull request?

    The task summary table in the stage page currently displays the summary of all the tasks. However, we should display the task summary of only successful tasks, to follow the behavior of previous versions of Spark.

    ## How was this patch tested?

    Added UT. Attached screenshots.

    Before patch:
    ![screenshot from 2018-11-20 00-36-18](https://user-images.githubusercontent.com/23054875/48729339-62e3a580-ec5d-11e8-81f0-0d191a234ffe.png)
    ![screenshot from 2018-11-20 01-18-37](https://user-images.githubusercontent.com/23054875/48731112-41d18380-ec62-11e8-8c31-1ffbfa04e746.png)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/shahidki31/spark summaryMetrics

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/23088.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #23088

commit cfe2d5f3744f2d10d917883713cd78678b5157a1
Author: Shahid
Date:   2018-11-19T18:23:57Z

    task summary should contain only successful tasks

commit 8f4498e0c67f8a83401f3b0e06aef4922ef49c20
Author: Shahid
Date:   2018-11-21T18:37:01Z

    update PR

---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23081: [SPARK-26109][WebUI]Duration in the task summary metrics...
Github user shahidki31 commented on the issue: https://github.com/apache/spark/pull/23081 Thank you @srowen --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org