[GitHub] spark pull request: [SPARK-14290][SPARK-13352][CORE][backport-1.6]...
Github user liyezhang556520 closed the pull request at: https://github.com/apache/spark/pull/12296 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14242][CORE][Network] avoid copy in com...
Github user liyezhang556520 commented on the pull request: https://github.com/apache/spark/pull/12038#issuecomment-208659447 @davies , I didn't see the commit in branch-1.6 either; it seems this commit cannot simply be cherry-picked because the file path differs between master and branch-1.6. Do I need to submit another PR for the back-port?
[GitHub] spark pull request: [SPARK-14290][CORE][Network] avoid significant...
Github user liyezhang556520 commented on the pull request: https://github.com/apache/spark/pull/12083#issuecomment-208329946 @davies , please see https://github.com/apache/spark/pull/12296
[GitHub] spark pull request: [SPARK-14290][CORE][backport-1.6] avoid signif...
Github user liyezhang556520 commented on the pull request: https://github.com/apache/spark/pull/12296#issuecomment-208218812 cc @davies
[GitHub] spark pull request: [SPARK-14290][CORE][backport-1.6] avoid signif...
GitHub user liyezhang556520 opened a pull request: https://github.com/apache/spark/pull/12296 [SPARK-14290][CORE][backport-1.6] avoid significant memory copy in Netty's tran… ## What changes were proposed in this pull request? When Netty transfers data that is not a `FileRegion`, the data is in the form of a `ByteBuf`. If the data is large, a significant performance issue occurs because of the memory copy underlying `sun.nio.ch.IOUtil.write`: CPU usage hits 100% while network throughput stays very low. In this PR, if the data size is large, we split it into small chunks when calling `WritableByteChannel.write()`, so as to avoid wasted memory copies. Because the data can't be written in a single write, `transferTo` will be called multiple times. ## How was this patch tested? Spark unit tests and a manual test. Manual test: `sc.parallelize(Array(1,2,3),3).mapPartitions(a=>Array(new Array[Double](1024 * 1024 * 50)).iterator).reduce((a,b)=> a).length` For more details, please refer to [SPARK-14290](https://issues.apache.org/jira/browse/SPARK-14290) You can merge this pull request into a Git repository by running: $ git pull https://github.com/liyezhang556520/spark apache-branch-1.6-spark-14290 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/12296.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #12296 commit 9e37e7c4d1e495da91f68e912636dcf865691e39 Author: Zhang, Liye Date: 2016-04-11T08:05:44Z SPARK-14290/SPARK-13352 avoid significant memory copy in Netty's transferTo
[GitHub] spark pull request: [SPARK-14290][CORE][Network] avoid significant...
Github user liyezhang556520 commented on the pull request: https://github.com/apache/spark/pull/12083#issuecomment-206180807 retest this please.
[GitHub] spark pull request: [SPARK-14290][CORE][Network] avoid significant...
Github user liyezhang556520 commented on the pull request: https://github.com/apache/spark/pull/12083#issuecomment-206168009 Jenkins, retest this please.
[GitHub] spark pull request: [SPARK-14290][CORE][Network] avoid significant...
Github user liyezhang556520 commented on the pull request: https://github.com/apache/spark/pull/12083#issuecomment-206096350 @vanzin , @zsxwing , I have changed the buffer limit to 256K. I agree it would be better to handle this issue by manually copying data to a direct buffer, so that no duplicate copy is made, but that would add some code complexity. How about we merge this PR first, and later I create a new JIRA and submit a new PR that uses a direct ByteBuffer pool to avoid the duplicate memory copy? Once we use direct buffers, the code needs careful handling and should be well reviewed.
[GitHub] spark pull request: [SPARK-14290][CORE][Network] avoid significant...
Github user liyezhang556520 commented on the pull request: https://github.com/apache/spark/pull/12083#issuecomment-205631561 @zsxwing , @vanzin Any further comments?
[GitHub] spark pull request: [SPARK-14290][CORE][Network] avoid significant...
Github user liyezhang556520 commented on a diff in the pull request: https://github.com/apache/spark/pull/12083#discussion_r58287737 --- Diff: common/network-common/src/main/java/org/apache/spark/network/protocol/MessageWithHeader.java --- @@ -44,6 +45,14 @@ private long totalBytesTransferred; /** + * When the write buffer size is larger than this limit, I/O will be done in chunks of this size. + * The size should not be too large as it will waste underlying memory copy. e.g. If network + * available buffer is smaller than this limit, the data cannot be sent within one single write + * operation while it still will make memory copy with this size. + */ + private static final int NIO_BUFFER_LIMIT = 512 * 1024; --- End diff -- >What if we create DirectByteBuffer here manually for a big buf (big enough so that we can get benefits even if creating a direct buffer is slow) and try to write as many as possible? Then we can avoid the memory copy in IOUtil.write. @zsxwing , yes, the redundant copy can be avoided if we pass a direct buffer straight to `WritableByteChannel.write()`, because of the code at line http://www.grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/7u40-b43/sun/nio/ch/IOUtil.java#50, but I don't know whether that's worthwhile. `IOUtil` maintains a direct-buffer pool to avoid frequently allocating direct buffers. I think that's why, in my tests, the first time I ran `sc.parallelize(Array(1,2,3),3).mapPartitions(a=>Array(new Array[Long](1024 * 1024 * 200)).iterator).reduce((a,b)=> a).length` the network throughput was extremely low on the executor side, while if I ran it after running `sc.parallelize(Array(1,2,3),3).mapPartitions(a=>Array(new Array[Double](1024 * 1024 * 50)).iterator).reduce((a,b)=> a).length` the throughput was much higher. So if we want to create direct buffers manually in Spark, we had better also maintain a buffer pool, but that introduces much more complexity and carries the risk of memory leaks.
[GitHub] spark pull request: [SPARK-14290][CORE][Network] avoid significant...
Github user liyezhang556520 commented on a diff in the pull request: https://github.com/apache/spark/pull/12083#discussion_r58287444 --- Diff: common/network-common/src/main/java/org/apache/spark/network/protocol/MessageWithHeader.java --- @@ -44,6 +45,14 @@ private long totalBytesTransferred; /** + * When the write buffer size is larger than this limit, I/O will be done in chunks of this size. + * The size should not be too large as it will waste underlying memory copy. e.g. If network + * available buffer is smaller than this limit, the data cannot be sent within one single write + * operation while it still will make memory copy with this size. + */ + private static final int NIO_BUFFER_LIMIT = 512 * 1024; --- End diff -- >On my machine, /proc/sys/net/core/wmem_default is around 200k, which (I assume) means you'd be copying about half of the buffer with no need here. @vanzin , on my machine both `wmem_default` and `wmem_max` are also around 200K, but in my test I could successfully write more than 512K in each `WritableByteChannel.write()`; this size should match the return value of `writeFromNativeBuffer`, as at line http://www.grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/7u40-b43/sun/nio/ch/IOUtil.java#65. I don't know why. Can you also run a test?
[GitHub] spark pull request: [SPARK-14290][CORE][Network] avoid significant...
Github user liyezhang556520 commented on a diff in the pull request: https://github.com/apache/spark/pull/12083#discussion_r58287274 --- Diff: common/network-common/src/main/java/org/apache/spark/network/protocol/MessageWithHeader.java --- @@ -44,6 +45,14 @@ private long totalBytesTransferred; /** + * When the write buffer size is larger than this limit, I/O will be done in chunks of this size. + * The size should not be too large as it will waste underlying memory copy. e.g. If network + * available buffer is smaller than this limit, the data cannot be sent within one single write + * operation while it still will make memory copy with this size. + */ + private static final int NIO_BUFFER_LIMIT = 512 * 1024; --- End diff -- >Is it possible to know the accurate number? I guess not because it's OS dependent and may be changed via OS settings. There might be a way to get the accurate size of the network buffer, but I think doing so is pointless: even if we get the accurate number, we cannot guarantee the network send buffer is empty each time we write, which means it's always possible that only part of the data gets written, whatever value we set `NIO_BUFFER_LIMIT` to. We can only say that the smaller `NIO_BUFFER_LIMIT` is, the less redundant copying is done.
[GitHub] spark pull request: [SPARK-14290][CORE][Network] avoid significant...
Github user liyezhang556520 commented on the pull request: https://github.com/apache/spark/pull/12083#issuecomment-204632236 >So if we write a 1M buffer, it can only write NIO_BUFFER_LIMIT (512K). And we need to write the rest 512K again. So in this case, we need to copy 1M + 512K bytes. If we divide 1M buffer to 2 * 512K buffers, then we only need to copy 512K + 512K bytes. Is it correct? @zsxwing , yes, that's right, so there would be a tremendous amount of copying if the data to be written is huge. >So basically, if the source buffer is not a direct buffer, that class is making a copy of the whole source buffer before trying to write it to the channel. That's, uh, a little silly, but I guess it's something we have to live with... Yes, we have to live with it if the buffer is not a direct buffer.
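The copy arithmetic above can be sanity-checked with a small simulation. This is a sketch under the assumptions stated in the thread (each write of a heap buffer copies the whole visible chunk into a direct buffer, while the socket accepts at most a fixed budget per call); the class and method names are illustrative, not Spark's:

```java
public class CopyCost {
    /**
     * Simulate the bytes copied by sun.nio.ch.IOUtil.write for a heap buffer.
     * Each write() copies min(remaining, chunkLimit) bytes into a direct buffer,
     * but the socket may accept only perWrite bytes of that copy.
     */
    public static long totalCopied(long total, long chunkLimit, long perWrite) {
        long copied = 0, sent = 0;
        while (sent < total) {
            long chunk = Math.min(total - sent, chunkLimit);
            copied += chunk;                   // the whole chunk is copied each call
            sent += Math.min(chunk, perWrite); // but only part of it may be sent
        }
        return copied;
    }

    public static void main(String[] args) {
        long oneMeg = 1024 * 1024, halfMeg = 512 * 1024;
        // Unchunked 1M write with a 512K socket budget: 1M + 512K bytes copied.
        System.out.println(totalCopied(oneMeg, oneMeg, halfMeg));  // 1572864
        // Chunked at 512K: only 512K + 512K bytes copied.
        System.out.println(totalCopied(oneMeg, halfMeg, halfMeg)); // 1048576
    }
}
```

This reproduces exactly the numbers quoted in the comment: 1.5 MB copied for the unchunked write versus 1 MB when the buffer is split into 512K chunks.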
[GitHub] spark pull request: [SPARK-14290][CORE][Network] avoid significant...
Github user liyezhang556520 commented on the pull request: https://github.com/apache/spark/pull/12083#issuecomment-204207886 Hi @vanzin , the location of the memory copy was pointed out by @zsxwing; the call stack is as follows: ``` at java.nio.Bits.copyFromArray(Bits.java:754) at java.nio.DirectByteBuffer.put(DirectByteBuffer.java:371) at java.nio.DirectByteBuffer.put(DirectByteBuffer.java:342) at sun.nio.ch.IOUtil.write(IOUtil.java:60) at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:466) - locked <0x7f8a8a28d400> (a java.lang.Object) at org.apache.spark.network.protocol.MessageWithHeader.copyByteBuf(MessageWithHeader.java:131) at org.apache.spark.network.protocol.MessageWithHeader.transferTo(MessageWithHeader.java:114) ``` The whole-buffer copy happens at line http://www.grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/7u40-b43/sun/nio/ch/IOUtil.java#60, but the buffer cannot be written completely if its size exceeds the available underlying buffer, which is at line http://www.grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/7u40-b43/sun/nio/ch/IOUtil.java#65. So each time we make a copy of the whole input `ByteBuf` but write only part of it when the input is relatively large, which results in multiple unnecessary copies of the input `ByteBuf`. The way this PR handles the issue is the same as in Hadoop; please refer to https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java#L2957
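The chunking approach described above (the same idea as Hadoop's chunked channel write) can be sketched in plain NIO. This is an illustrative version, not Spark's actual `copyByteBuf`; the class name and the 256K constant are assumptions made for the example:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;

public class ChunkedWriter {
    // Cap each write() so the copy made inside IOUtil.write is at most this large.
    static final int NIO_BUFFER_LIMIT = 256 * 1024;

    /** Write buf to ch in chunks of at most NIO_BUFFER_LIMIT bytes each. */
    public static long writeChunked(WritableByteChannel ch, ByteBuffer buf) {
        long written = 0;
        try {
            while (buf.hasRemaining()) {
                int fullLimit = buf.limit();
                // Shrink the visible window to at most NIO_BUFFER_LIMIT bytes.
                buf.limit(Math.min(buf.position() + NIO_BUFFER_LIMIT, fullLimit));
                int n = ch.write(buf);
                buf.limit(fullLimit); // restore the real limit for the next round
                written += n;
                if (n == 0) {
                    break; // socket send buffer is full; caller retries later
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return written;
    }
}
```

The key point is that only the small window between `position` and the temporarily lowered `limit` is visible to the channel, so the heap-to-direct copy inside the JDK is bounded by the chunk size rather than the full buffer size.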
[GitHub] spark pull request: [SPARK-14242][CORE][Network] avoid copy in com...
Github user liyezhang556520 commented on the pull request: https://github.com/apache/spark/pull/12038#issuecomment-204199865 @zsxwing , I updated the commit description. Thank you @zsxwing and @vanzin for reviewing.
[GitHub] spark pull request: [SPARK-14290][CORE][Network] avoid significant...
Github user liyezhang556520 commented on the pull request: https://github.com/apache/spark/pull/12083#issuecomment-203872402 cc @rxin
[GitHub] spark pull request: [SPARK-14290][CORE][Network] avoid significant...
Github user liyezhang556520 commented on a diff in the pull request: https://github.com/apache/spark/pull/12083#discussion_r58032868 --- Diff: common/network-common/src/main/java/org/apache/spark/network/protocol/MessageWithHeader.java --- @@ -44,6 +45,14 @@ private long totalBytesTransferred; /** + * When the write buffer size is larger than this limit, I/O will be done in chunks of this size. + * The size should not be too large as it will waste underlying memory copy. e.g. If network + * available buffer is smaller than this limit, the data cannot be sent within one single write + * operation while it still will make memory copy with this size. + */ + private static final int NIO_BUFFER_LIMIT = 512 * 1024; --- End diff -- I set this limit to 512K because in my test each `WritableByteChannel.write()` could successfully write about 600KB~1.5MB of data. This size needs to be decided after more testing by others.
[GitHub] spark pull request: [SPARK-14290][CORE][Network] avoid significant...
GitHub user liyezhang556520 opened a pull request: https://github.com/apache/spark/pull/12083 [SPARK-14290][CORE][Network] avoid significant memory copy in netty's transferTo ## What changes were proposed in this pull request? When Netty transfers data that is not a `FileRegion`, the data is in the form of a `ByteBuf`. If the data is large, a significant performance issue occurs because of the memory copy underlying `sun.nio.ch.IOUtil.write`: CPU usage hits 100% while network throughput stays very low. In this PR, if the data size is large, we split it into small chunks when calling `WritableByteChannel.write()`, so as to avoid wasted memory copies. Because the data can't be written in a single write, `transferTo` will be called multiple times. ## How was this patch tested? Spark unit tests and a manual test. Manual test: `sc.parallelize(Array(1,2,3),3).mapPartitions(a=>Array(new Array[Double](1024 * 1024 * 50)).iterator).reduce((a,b)=> a).length` For more details, please refer to [SPARK-14290](https://issues.apache.org/jira/browse/SPARK-14290) You can merge this pull request into a Git repository by running: $ git pull https://github.com/liyezhang556520/spark spark-14290 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/12083.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #12083 commit 63ca85a5548858b4fe46a4ade062776cb6747cba Author: Zhang, Liye Date: 2016-03-31T09:44:41Z spark-14290 avoid significant memory copy in netty's transferTo
[GitHub] spark pull request: [SPARK-14242][CORE][Network] avoid copy in com...
Github user liyezhang556520 commented on the pull request: https://github.com/apache/spark/pull/12038#issuecomment-203724131 retest this please.
[GitHub] spark pull request: [SPARK-14242][CORE][Network] avoid too many co...
Github user liyezhang556520 commented on the pull request: https://github.com/apache/spark/pull/12038#issuecomment-203715485 @vanzin >That's better, but is it needed at all? I don't see any comments about why consolidating the buffers is a win in the source for CompositeByteBuf. Traversing the single buffer should be slightly faster because there's less bookkeeping, but there's the cost of copying that data in the first place. If so, we can just set `maxNumComponents` to `Integer.MAX_VALUE` for the `CompositeByteBuf`. >When testing this code, I remember that during large transfers packets would arrive in 64k chunks at the most, so that means that once you're transferring more than 1MB, you'd have to copy things. In my test, the chunk sizes were mainly around 20~30 KB. >Have you tried not consolidating to see whether there's any negative side-effect? I previously tested with `buffers.getFirst().alloc().compositeBuffer(Integer.MAX_VALUE);` and a frame size over 1GB, consisting of about 4 chunks, and I didn't see any negative side-effect. OK, let's solve this issue without copying in any case.
[GitHub] spark pull request: [SPARK-14242][CORE][Network] avoid using compo...
Github user liyezhang556520 commented on the pull request: https://github.com/apache/spark/pull/12038#issuecomment-203213286 @vanzin , I think @zsxwing 's idea of using `CompositeByteBuf.addComponents` is a better choice: it introduces exactly one copy when the number of small buffers is larger than 16, and no copy when it is fewer than 16. Let me update this PR first.
[GitHub] spark pull request: [SPARK-14242][CORE][Network] avoid using compo...
Github user liyezhang556520 commented on a diff in the pull request: https://github.com/apache/spark/pull/12038#discussion_r57830669 --- Diff: common/network-common/src/main/java/org/apache/spark/network/util/TransportFrameDecoder.java --- @@ -139,14 +139,18 @@ private ByteBuf decodeNext() throws Exception { return nextBufferForFrame(remaining); } -// Otherwise, create a composite buffer. -CompositeByteBuf frame = buffers.getFirst().alloc().compositeBuffer(); --- End diff -- @vanzin , I'm not sure why Netty sets a maximum number of components (the max is Integer.MAX_VALUE) with a default of only 16, which seems very small for consolidation. Will other problems occur when there are too many small buffers under a `compositeBuffer`? Is that why it consolidates when the number of small buffers reaches `maxNumComponents`?
[GitHub] spark pull request: [SPARK-14242][CORE][Network] avoid using compo...
GitHub user liyezhang556520 opened a pull request: https://github.com/apache/spark/pull/12038 [SPARK-14242][CORE][Network] avoid using compositeBuffer for frame decoder ## What changes were proposed in this pull request? In this patch, we avoid using `compositeBuffer` in `TransportFrameDecoder` because `compositeBuffer` introduces too many memory copies when the frame size is large (which results in many transport messages). For details, please refer to [SPARK-14242](https://issues.apache.org/jira/browse/SPARK-14242). ## How was this patch tested? Spark unit tests and manual tests. For manual tests, we can reproduce the performance issue with the following code: `sc.parallelize(Array(1,2,3),3).mapPartitions(a=>Array(new Array[Double](1024 * 1024 * 50)).iterator).reduce((a,b)=> a).length` It's easy to see the performance gain, both in running time and CPU usage. You can merge this pull request into a Git repository by running: $ git pull https://github.com/liyezhang556520/spark spark-14242 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/12038.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #12038 commit 8908585c3029de005e9816d711b0d7ee86398a12 Author: Zhang, Liye Date: 2016-03-29T15:04:36Z spark-14242 avoid using compositeBuffer for frame decoder
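The one-copy idea behind this patch can be sketched with plain NIO buffers (illustrative only; the real code operates on Netty `ByteBuf`s inside `TransportFrameDecoder`, and the class name here is hypothetical): allocate the frame buffer once at its final size and copy every received chunk into it exactly once, instead of letting a composite buffer consolidate repeatedly as components accumulate.

```java
import java.nio.ByteBuffer;
import java.util.List;

public class FrameAssembler {
    /**
     * Copy each received chunk into a single pre-sized frame buffer,
     * so every byte is copied exactly once.
     */
    public static ByteBuffer assemble(List<ByteBuffer> chunks, int frameSize) {
        ByteBuffer frame = ByteBuffer.allocate(frameSize);
        for (ByteBuffer chunk : chunks) {
            frame.put(chunk); // one copy per byte, no repeated consolidation
        }
        frame.flip(); // make the assembled frame readable from the start
        return frame;
    }
}
```

With a consolidating composite buffer, bytes near the front of a large frame can be copied again on every consolidation pass; pre-sizing the destination bounds the total copy cost at exactly the frame size.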
[GitHub] spark pull request: [SPARK-14242][CORE][Network] avoid using compo...
Github user liyezhang556520 commented on a diff in the pull request: https://github.com/apache/spark/pull/12038#discussion_r57741705 --- Diff: common/network-common/src/main/java/org/apache/spark/network/util/TransportFrameDecoder.java --- @@ -139,14 +139,18 @@ private ByteBuf decodeNext() throws Exception { return nextBufferForFrame(remaining); } -// Otherwise, create a composite buffer. -CompositeByteBuf frame = buffers.getFirst().alloc().compositeBuffer(); --- End diff -- Actually we can set `maxNumComponents` for the `compositeBuffer` to avoid consolidation underneath, e.g. `CompositeByteBuf frame = buffers.getFirst().alloc().compositeBuffer(Integer.MAX_VALUE);`, but this might not be a good choice.
[GitHub] spark pull request: [SPARK-14242][CORE][Network] avoid using compo...
Github user liyezhang556520 commented on a diff in the pull request: https://github.com/apache/spark/pull/12038#discussion_r57742478 --- Diff: common/network-common/src/main/java/org/apache/spark/network/util/TransportFrameDecoder.java --- @@ -139,14 +139,18 @@ private ByteBuf decodeNext() throws Exception { return nextBufferForFrame(remaining); } -// Otherwise, create a composite buffer. -CompositeByteBuf frame = buffers.getFirst().alloc().compositeBuffer(); +// Otherwise, create a new buffer to hold the data of all frame splits. +ByteBuf frame = buffers.getFirst().alloc().buffer(remaining, remaining); --- End diff -- Allocating a new `ByteBuf` here results in one copy per Netty frame; exactly one copy operation is much more acceptable than the many copies made by `compositeBuffer`.
[GitHub] spark pull request: [SPARK-9104][SPARK-9105][SPARK-9106][SPARK-910...
Github user liyezhang556520 commented on the pull request: https://github.com/apache/spark/pull/7753#issuecomment-170942709 Hi @steveloughran , thank you for your attention and your comments; further comments on this PR are much appreciated. Your advice is quite right: we need a regression test for the history server suite. I think this is related to what @squito mentioned about adding JSON API support. I'm wondering whether we can do that in another PR once this PR is ready to merge, or once the feature is fully accepted by the community. And of course, REST API support is supposed to be one part of this feature. @squito, what's your opinion on this?
[GitHub] spark pull request: [SPARK-10608][CORE] disable reduce locality as...
Github user liyezhang556520 commented on the pull request: https://github.com/apache/spark/pull/8765#issuecomment-163818782

@zsxwing, since the original implementation has changed a lot, let me close this PR/JIRA for now; if the issue still exists, I'll reopen it.
[GitHub] spark pull request: [SPARK-10608][CORE] disable reduce locality as...
Github user liyezhang556520 closed the pull request at: https://github.com/apache/spark/pull/8765
[GitHub] spark pull request: [SPARK-9104][SPARK-9105][SPARK-9106][SPARK-910...
Github user liyezhang556520 commented on a diff in the pull request: https://github.com/apache/spark/pull/7753#discussion_r46154647

--- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
@@ -85,6 +85,9 @@ private[spark] class Executor(
     env.blockManager.initialize(conf.getAppId)
   }

+  private val executorMetrics: ExecutorMetrics = new ExecutorMetrics
+  executorMetrics.setHostname(Utils.localHostName)
--- End diff --

Sorry, I forget why I didn't originally get the port from `rpcEnv`, but it seems we can get the port from it directly. Let me try to add it back.
[GitHub] spark pull request: [SPARK-9104][SPARK-9105][SPARK-9106][SPARK-910...
Github user liyezhang556520 commented on a diff in the pull request: https://github.com/apache/spark/pull/7753#discussion_r46153585

--- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala ---
@@ -228,6 +261,46 @@ private[spark] class EventLoggingListener(
     fileSystem.rename(new Path(logPath + IN_PROGRESS), target)
   }

+  /**
+   * According to the updated event to modify the maintained event's metrics
+   * @param executorId the executor whose metrics will be modified
+   */
+  private def updateModifiedMetrics(executorId: String): Unit = {
+    val toBeModifiedEvent = executorIdToModifiedMaxMetrics.get(executorId)
+    val latestEvent = executorIdToLatestMetrics.get(executorId)
+    toBeModifiedEvent match {
+      case None => if (latestEvent.isDefined) executorIdToModifiedMaxMetrics.update(
+        executorId, latestEvent.get)
--- End diff --

@squito, I'm sorry, I made a mistake here; you are really careful. Thank you for correcting me again. You are right that `executorIdToLatestMetrics` will be updated before we call `updateModifiedMetrics`; I'll remove the `if` statement. Sorry to mislead you.
[GitHub] spark pull request: [SPARK-9104][SPARK-9105][SPARK-9106][SPARK-910...
Github user liyezhang556520 commented on a diff in the pull request: https://github.com/apache/spark/pull/7753#discussion_r46152802

--- Diff: core/src/test/scala/org/apache/spark/scheduler/EventLoggingListenerSuite.scala ---
@@ -122,6 +122,105 @@ class EventLoggingListenerSuite extends SparkFunSuite with LocalSparkContext wit
       "a fine:mind$dollar{bills}.1", None, Some("lz4")))
   }

+  test("test event logger logging executor metrics") {
+    import org.apache.spark.scheduler.cluster._
+    import org.apache.spark.ui.memory._
+    val conf = EventLoggingListenerSuite.getLoggingConf(testDirPath)
+    val eventLogger = new EventLoggingListener("test-memListener", None, testDirPath.toUri(), conf)
+    val execId = "exec-1"
+    val hostName = "host-1"
+
+    eventLogger.start()
+    eventLogger.onExecutorAdded(SparkListenerExecutorAdded(
+      0L, execId, new ExecutorInfo(hostName, 1, Map.empty)))
+
+    // stage 1 and stage 2 submitted
+    eventLogger.onStageSubmitted(MemoryListenerSuite.createStageStartEvent(1))
+    eventLogger.onStageSubmitted(MemoryListenerSuite.createStageStartEvent(2))
+    val execMetrics1 = MemoryListenerSuite.createExecutorMetrics(hostName, 1L, 20, 10)
+    eventLogger.onExecutorMetricsUpdate(MemoryListenerSuite.createExecutorMetricsUpdateEvent(
+      execId, execMetrics1))
+    val execMetrics2 = MemoryListenerSuite.createExecutorMetrics(hostName, 2L, 30, 10)
+    eventLogger.onExecutorMetricsUpdate(MemoryListenerSuite.createExecutorMetricsUpdateEvent(
+      execId, execMetrics2))
+    // stage1 completed
+    eventLogger.onStageCompleted(MemoryListenerSuite.createStageEndEvent(1))
+    // stage3 submitted
+    eventLogger.onStageSubmitted(MemoryListenerSuite.createStageStartEvent(3))
+    val execMetrics3 = MemoryListenerSuite.createExecutorMetrics(hostName, 3L, 30, 30)
+    eventLogger.onExecutorMetricsUpdate(MemoryListenerSuite.createExecutorMetricsUpdateEvent(
+      execId, execMetrics3))
+    val execMetrics4 = MemoryListenerSuite.createExecutorMetrics(hostName, 4L, 20, 25)
+    eventLogger.onExecutorMetricsUpdate(MemoryListenerSuite.createExecutorMetricsUpdateEvent(
+      execId, execMetrics4))
+    // stage 2 completed
+    eventLogger.onStageCompleted(MemoryListenerSuite.createStageEndEvent(2))
+    val execMetrics5 = MemoryListenerSuite.createExecutorMetrics(hostName, 5L, 15, 15)
+    eventLogger.onExecutorMetricsUpdate(MemoryListenerSuite.createExecutorMetricsUpdateEvent(
+      execId, execMetrics5))
+    val execMetrics6 = MemoryListenerSuite.createExecutorMetrics(hostName, 6L, 25, 10)
+    eventLogger.onExecutorMetricsUpdate(MemoryListenerSuite.createExecutorMetricsUpdateEvent(
+      execId, execMetrics6))
+    // stage 3 completed
+    eventLogger.onStageCompleted(MemoryListenerSuite.createStageEndEvent(3))
+
+    eventLogger.onExecutorRemoved(SparkListenerExecutorRemoved(7L, execId, ""))
+
+    // Totally there are 15 logged events, including:
+    // 2 events of executor Added/Removed
+    // 6 events of stage Submitted/Completed
+    // 7 events of executorMetrics update (3 combined metrics and 4 original metrics)
+    assert(eventLogger.loggedEvents.size === 15)
+    eventLogger.stop()
+
+    val logData = EventLoggingListener.openEventLog(new Path(eventLogger.logPath), fileSystem)
+    val lines = readLines(logData)
+    Utils.tryWithSafeFinally {
+      // totally there are 15 lines, including SparkListenerLogStart event and 14 other events
+      assert(lines.size === 16)
+
+      // 4 executor metrics that is the latest metrics updated before stage submit and complete
+      val jsonMetrics = JsonProtocol.sparkEventFromJson(parse(lines(5)))
--- End diff --

Good idea, I'll add an integration test to make this cleaner.
[GitHub] spark pull request: [SPARK-9104][SPARK-9105][SPARK-9106][SPARK-910...
Github user liyezhang556520 commented on a diff in the pull request: https://github.com/apache/spark/pull/7753#discussion_r46152802

--- Diff: core/src/test/scala/org/apache/spark/scheduler/EventLoggingListenerSuite.scala ---
@@ -122,6 +122,105 @@ class EventLoggingListenerSuite extends SparkFunSuite with LocalSparkContext wit
       "a fine:mind$dollar{bills}.1", None, Some("lz4")))
   }

+  test("test event logger logging executor metrics") {
+    import org.apache.spark.scheduler.cluster._
+    import org.apache.spark.ui.memory._
+    val conf = EventLoggingListenerSuite.getLoggingConf(testDirPath)
+    val eventLogger = new EventLoggingListener("test-memListener", None, testDirPath.toUri(), conf)
+    val execId = "exec-1"
+    val hostName = "host-1"
+
+    eventLogger.start()
+    eventLogger.onExecutorAdded(SparkListenerExecutorAdded(
+      0L, execId, new ExecutorInfo(hostName, 1, Map.empty)))
+
+    // stage 1 and stage 2 submitted
+    eventLogger.onStageSubmitted(MemoryListenerSuite.createStageStartEvent(1))
+    eventLogger.onStageSubmitted(MemoryListenerSuite.createStageStartEvent(2))
+    val execMetrics1 = MemoryListenerSuite.createExecutorMetrics(hostName, 1L, 20, 10)
+    eventLogger.onExecutorMetricsUpdate(MemoryListenerSuite.createExecutorMetricsUpdateEvent(
+      execId, execMetrics1))
+    val execMetrics2 = MemoryListenerSuite.createExecutorMetrics(hostName, 2L, 30, 10)
+    eventLogger.onExecutorMetricsUpdate(MemoryListenerSuite.createExecutorMetricsUpdateEvent(
+      execId, execMetrics2))
+    // stage1 completed
+    eventLogger.onStageCompleted(MemoryListenerSuite.createStageEndEvent(1))
+    // stage3 submitted
+    eventLogger.onStageSubmitted(MemoryListenerSuite.createStageStartEvent(3))
+    val execMetrics3 = MemoryListenerSuite.createExecutorMetrics(hostName, 3L, 30, 30)
+    eventLogger.onExecutorMetricsUpdate(MemoryListenerSuite.createExecutorMetricsUpdateEvent(
+      execId, execMetrics3))
+    val execMetrics4 = MemoryListenerSuite.createExecutorMetrics(hostName, 4L, 20, 25)
+    eventLogger.onExecutorMetricsUpdate(MemoryListenerSuite.createExecutorMetricsUpdateEvent(
+      execId, execMetrics4))
+    // stage 2 completed
+    eventLogger.onStageCompleted(MemoryListenerSuite.createStageEndEvent(2))
+    val execMetrics5 = MemoryListenerSuite.createExecutorMetrics(hostName, 5L, 15, 15)
+    eventLogger.onExecutorMetricsUpdate(MemoryListenerSuite.createExecutorMetricsUpdateEvent(
+      execId, execMetrics5))
+    val execMetrics6 = MemoryListenerSuite.createExecutorMetrics(hostName, 6L, 25, 10)
+    eventLogger.onExecutorMetricsUpdate(MemoryListenerSuite.createExecutorMetricsUpdateEvent(
+      execId, execMetrics6))
+    // stage 3 completed
+    eventLogger.onStageCompleted(MemoryListenerSuite.createStageEndEvent(3))
+
+    eventLogger.onExecutorRemoved(SparkListenerExecutorRemoved(7L, execId, ""))
+
+    // Totally there are 15 logged events, including:
+    // 2 events of executor Added/Removed
+    // 6 events of stage Submitted/Completed
+    // 7 events of executorMetrics update (3 combined metrics and 4 original metrics)
+    assert(eventLogger.loggedEvents.size === 15)
+    eventLogger.stop()
+
+    val logData = EventLoggingListener.openEventLog(new Path(eventLogger.logPath), fileSystem)
+    val lines = readLines(logData)
+    Utils.tryWithSafeFinally {
+      // totally there are 15 lines, including SparkListenerLogStart event and 14 other events
--- End diff --

Yes, I forgot to update the comment, thanks.
[GitHub] spark pull request: [SPARK-9104][SPARK-9105][SPARK-9106][SPARK-910...
Github user liyezhang556520 commented on a diff in the pull request: https://github.com/apache/spark/pull/7753#discussion_r46152658

--- Diff: core/src/test/scala/org/apache/spark/ui/memory/MemoryListenerSuite.scala ---
@@ -0,0 +1,258 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ui.memory
+
+import org.apache.spark._
+import org.apache.spark.executor._
+import org.apache.spark.scheduler._
+import org.apache.spark.scheduler.cluster._
+
+class MemoryListenerSuite extends SparkFunSuite with LocalSparkContext {
+  test("test HashMap size for MemoryListener") {
+    val listener = new MemoryListener
+    val execId1 = "exec-1"
+    val execId2 = "exec-2"
+
+    (1 to 2).foreach { i =>
+      listener.onStageSubmitted(MemoryListenerSuite.createStageStartEvent(i))
+      listener.onStageCompleted(MemoryListenerSuite.createStageEndEvent(i))
+    }
+    // stages are all completed, no activeStages now
+    assert(listener.activeStagesToMem.isEmpty)
+
+    listener.onExecutorMetricsUpdate(MemoryListenerSuite.createExecutorMetricsUpdateEvent(
+      execId1, new ExecutorMetrics))
+    // ExecutorMetrics is not related with Stages directly
+    assert(listener.activeStagesToMem.isEmpty)
+
+    listener.onStageSubmitted(MemoryListenerSuite.createStageStartEvent(3))
+    listener.onExecutorMetricsUpdate(MemoryListenerSuite.createExecutorMetricsUpdateEvent(
+      execId2, new ExecutorMetrics))
+    // totally 2 executors updated their metrics
+    assert(listener.activeExecutorIdToMem.size === 2)
+    assert(listener.activeStagesToMem.size === 1)
+    listener.onStageCompleted(MemoryListenerSuite.createStageEndEvent(3))
+
+    assert(listener.activeStagesToMem.isEmpty)
+    assert(listener.completedStagesToMem.size === 3)
+    assert(listener.activeExecutorIdToMem.size === listener.latestExecIdToExecMetrics.size)
+    assert(listener.removedExecutorIdToMem.isEmpty)
+  }
+
+  test("test first stage with no executor metrics update") {
+    val listener = new MemoryListener
+    val execId1 = "exec-1"
+
+    listener.onExecutorAdded(
+      SparkListenerExecutorAdded(0L, execId1, new ExecutorInfo("host1", 1, Map.empty)))
+
+    // stage 1, no metrics update
+    listener.onStageSubmitted(MemoryListenerSuite.createStageStartEvent(1))
+    listener.onStageCompleted(MemoryListenerSuite.createStageEndEvent(1))
+
+    // stage 2, with one metrics update
+    listener.onStageSubmitted(MemoryListenerSuite.createStageStartEvent(2))
+    val execMetrics = MemoryListenerSuite.createExecutorMetrics("host-1", 0L, 20, 10)
+    listener.onExecutorMetricsUpdate(MemoryListenerSuite.createExecutorMetricsUpdateEvent(
+      execId1, execMetrics))
+    listener.onStageCompleted(MemoryListenerSuite.createStageEndEvent(2))
+
+    val mapForStage1 = listener.completedStagesToMem.get((1, 0)).get
+    // no metrics for stage 1 since no metrics update for stage 1
+    assert(mapForStage1.get(execId1).get.transportInfo === None)
+    val mapForStage2 = listener.completedStagesToMem.get((2, 0)).get
+    assert(mapForStage2.size === 1)
+    val memInfo = mapForStage2.get(execId1).get
+    assert(memInfo.transportInfo.isDefined)
+    val transMetrics = memInfo.transportInfo.get
+    assert((20, 10, MemTime(20, 0), MemTime(10, 0)) === (transMetrics.onHeapSize,
+      transMetrics.offHeapSize, transMetrics.peakOnHeapSizeTime, transMetrics.peakOffHeapSizeTime))
+
+    listener.onExecutorRemoved(SparkListenerExecutorRemoved(0L, execId1, ""))
+  }
+
+  test("test multiple metrics updated in one stage") {
+    val listener = new MemoryListener
+    val execId1 = "exec-1"
+
+    listener.onExecutorAdded(
+      SparkListenerEx
[GitHub] spark pull request: [SPARK-9104][SPARK-9105][SPARK-9106][SPARK-910...
Github user liyezhang556520 commented on a diff in the pull request: https://github.com/apache/spark/pull/7753#discussion_r46152192

--- Diff: core/src/test/scala/org/apache/spark/ui/memory/MemoryListenerSuite.scala ---
@@ -0,0 +1,258 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ui.memory
+
+import org.apache.spark._
+import org.apache.spark.executor._
+import org.apache.spark.scheduler._
+import org.apache.spark.scheduler.cluster._
+
+class MemoryListenerSuite extends SparkFunSuite with LocalSparkContext {
--- End diff --

No, I'll remove this.
[GitHub] spark pull request: [SPARK-9104][SPARK-9105][SPARK-9106][SPARK-910...
Github user liyezhang556520 commented on a diff in the pull request: https://github.com/apache/spark/pull/7753#discussion_r46152057

--- Diff: core/src/main/scala/org/apache/spark/executor/ExecutorMetrics.scala ---
@@ -0,0 +1,69 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.executor
+
+import org.apache.spark.annotation.DeveloperApi
+
+/**
+ * :: DeveloperApi ::
+ * Metrics tracked during the execution of an executor.
+ *
+ * So, when adding new fields, take into consideration that the whole object can be serialized for
+ * shipping off at any time to consumers of the SparkListener interface.
+ */
+@DeveloperApi
+class ExecutorMetrics extends Serializable {
+
+  /**
+   * Host's name the executor runs on
+   */
+  private var _hostname: String = _
+  def hostname: String = _hostname
+  private[spark] def setHostname(value: String) = _hostname = value
+
+  private var _transportMetrics: TransportMetrics = new TransportMetrics
+  def transportMetrics: TransportMetrics = _transportMetrics
+  private[spark] def setTransportMetrics(value: TransportMetrics) = {
+    _transportMetrics = value
+  }
+
+  // for test only
+  def metricsDetails: (String, Long, Long, Long) = {
+    (hostname, transportMetrics.timeStamp, transportMetrics.onHeapSize,
+      transportMetrics.offHeapSize)
+  }
+}
+
+/**
+ * :: DeveloperApi ::
+ * Metrics for network layer
+ */
+@DeveloperApi
+class TransportMetrics (
--- End diff --

Yes, that's true, thank you for pointing it out.
[GitHub] spark pull request: [SPARK-9104][SPARK-9105][SPARK-9106][SPARK-910...
Github user liyezhang556520 commented on the pull request: https://github.com/apache/spark/pull/7753#issuecomment-158924858

Jenkins, retest this please.
[GitHub] spark pull request: [SPARK-9104][SPARK-9105][SPARK-9106][SPARK-910...
Github user liyezhang556520 commented on the pull request: https://github.com/apache/spark/pull/7753#issuecomment-158890951

@squito, thank you for your comments, they helped a lot; I've updated some unit tests. If you have time, could you help review? Thanks.
[GitHub] spark pull request: [SPARK-9104][SPARK-9105][SPARK-9106][SPARK-910...
Github user liyezhang556520 commented on a diff in the pull request: https://github.com/apache/spark/pull/7753#discussion_r45585344

--- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala ---
@@ -228,6 +260,40 @@ private[spark] class EventLoggingListener(
     fileSystem.rename(new Path(logPath + IN_PROGRESS), target)
   }

+  /**
+   * According to the updated event to modify the maintained event's metrics
+   * @param executorId the executor whose metrics will be modified
+   */
+  private def updateModifiedMetrics(executorId: String): Unit = {
+    val toBeModifiedEvent = executorIdToModifiedMaxMetrics.get(executorId)
+    val latestEvent = executorIdToLatestMetrics.get(executorId)
+    if (toBeModifiedEvent.isEmpty) {
+      if (latestEvent.isDefined) executorIdToModifiedMaxMetrics.update(executorId, latestEvent.get)
+    } else {
+      val toBeModifiedMetrics = toBeModifiedEvent.get.executorMetrics.transportMetrics
+      if (toBeModifiedMetrics.isDefined) {
--- End diff --

`latestEvent` is not always defined when `toBeModifiedEvent.isEmpty` is true: at the beginning, `latestEvent` is empty before the first metrics update event is received. Thank you for the cleaner code style example; I've updated my code.
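The logic being discussed — keep the latest metrics event per executor, and fold each update into a per-executor "peak" record, seeding the peak from the latest event when none exists yet — can be modeled in a few lines. This is a simplified stand-in for `updateModifiedMetrics`, not Spark's code; the names (`PeakMetricsTracker`, `onMetricsUpdate`, `peakFor`) and the reduced `Metrics` record are invented for the sketch:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class PeakMetricsTracker {
    // Simplified stand-in for TransportMetrics: timestamped heap/off-heap sizes.
    record Metrics(long timeStamp, long onHeapSize, long offHeapSize) {}

    private final Map<String, Metrics> latest = new HashMap<>(); // ~ executorIdToLatestMetrics
    private final Map<String, Metrics> peak = new HashMap<>();   // ~ executorIdToModifiedMaxMetrics

    void onMetricsUpdate(String executorId, Metrics m) {
        latest.put(executorId, m);
        updatePeak(executorId);
    }

    // Seed the peak from the latest event when absent; otherwise keep the
    // component-wise maximum of the tracked sizes.
    private void updatePeak(String executorId) {
        Metrics latestM = latest.get(executorId);
        if (latestM == null) {
            return; // no update received yet for this executor
        }
        peak.merge(executorId, latestM, (old, cur) -> new Metrics(
            cur.timeStamp(),
            Math.max(old.onHeapSize(), cur.onHeapSize()),
            Math.max(old.offHeapSize(), cur.offHeapSize())));
    }

    Optional<Metrics> peakFor(String executorId) {
        return Optional.ofNullable(peak.get(executorId));
    }
}
```

Because `onMetricsUpdate` always writes to `latest` before calling `updatePeak`, the null check inside `updatePeak` only matters if it is ever called on another path — which is exactly the "latest may be empty before the first update" corner case raised in the comment.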
[GitHub] spark pull request: [SPARK-9104][SPARK-9105][SPARK-9106][SPARK-910...
Github user liyezhang556520 commented on a diff in the pull request: https://github.com/apache/spark/pull/7753#discussion_r45584922

--- Diff: core/src/main/scala/org/apache/spark/executor/ExecutorMetrics.scala ---
@@ -0,0 +1,54 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.executor
+
+import org.apache.spark.annotation.DeveloperApi
+
+/**
+ * :: DeveloperApi ::
+ * Metrics tracked during the execution of an executor.
+ *
+ * So, when adding new fields, take into consideration that the whole object can be serialized for
+ * shipping off at any time to consumers of the SparkListener interface.
+ */
+@DeveloperApi
+class ExecutorMetrics extends Serializable {
+
+  /**
+   * Host's name the executor runs on
+   */
+  private var _hostname: String = _
+  def hostname: String = _hostname
+  private[spark] def setHostname(value: String) = _hostname = value
+
+  private var _transportMetrics: Option[TransportMetrics] = None
--- End diff --

Changed it to be always present in the updated code.
[GitHub] spark pull request: [SPARK-9104][SPARK-9105][SPARK-9106][SPARK-910...
Github user liyezhang556520 commented on a diff in the pull request: https://github.com/apache/spark/pull/7753#discussion_r45584881

--- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala ---
@@ -91,7 +93,12 @@ private[spark] class EventLoggingListener(
   private[scheduler] val loggedEvents = new ArrayBuffer[JValue]

   // Visible for tests only.
-  private[scheduler] val logPath = getLogPath(logBaseDir, appId, appAttemptId, compressionCodecName)
+  private[scheduler] val logPath = getLogPath(
+    logBaseDir, appId, appAttemptId, compressionCodecName)
--- End diff --

Yes, my mistake; the line width limit is 100, so the original is correct. I'll change it back later, thanks.
[GitHub] spark pull request: [SPARK-9104][SPARK-9105][SPARK-9106][SPARK-910...
Github user liyezhang556520 commented on a diff in the pull request: https://github.com/apache/spark/pull/7753#discussion_r45584487

--- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala ---
@@ -20,8 +20,10 @@ package org.apache.spark.scheduler

 import java.io._
 import java.net.URI

+import org.apache.spark.executor.TransportMetrics
--- End diff --

Fixed.
[GitHub] spark pull request: [SPARK-9104][SPARK-9105][SPARK-9106][SPARK-910...
Github user liyezhang556520 commented on a diff in the pull request: https://github.com/apache/spark/pull/7753#discussion_r45584427

--- Diff: core/src/main/scala/org/apache/spark/network/netty/NettyBlockTransferService.scala ---
@@ -47,6 +50,39 @@ class NettyBlockTransferService(conf: SparkConf, securityManager: SecurityManage
   private[this] var server: TransportServer = _
   private[this] var clientFactory: TransportClientFactory = _
   private[this] var appId: String = _
+  private[this] var clock: Clock = new SystemClock()
+
+  /**
+   * Use a different clock for this allocation manager. This is mainly used for testing.
+   */
+  def setClock(newClock: Clock): Unit = {
+    clock = newClock
+  }
+
+  override def getMemMetrics(executorMetrics: ExecutorMetrics): Unit = {
--- End diff --

The super class is `private[spark]`; do we still need to specify it here?
[GitHub] spark pull request: [SPARK-9104][SPARK-9105][SPARK-9106][SPARK-910...
Github user liyezhang556520 commented on a diff in the pull request: https://github.com/apache/spark/pull/7753#discussion_r45584147

--- Diff: core/src/main/scala/org/apache/spark/executor/ExecutorMetrics.scala ---
@@ -0,0 +1,54 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.executor
+
+import org.apache.spark.annotation.DeveloperApi
+
+/**
+ * :: DeveloperApi ::
+ * Metrics tracked during the execution of an executor.
+ *
+ * So, when adding new fields, take into consideration that the whole object can be serialized for
+ * shipping off at any time to consumers of the SparkListener interface.
+ */
+@DeveloperApi
+class ExecutorMetrics extends Serializable {
+
+  /**
+   * Host's name the executor runs on
+   */
+  private var _hostname: String = _
+  def hostname: String = _hostname
+  private[spark] def setHostname(value: String) = _hostname = value
+
+  private var _transportMetrics: Option[TransportMetrics] = None
+  def transportMetrics: Option[TransportMetrics] = _transportMetrics
+  private[spark] def setTransportMetrics(value: Option[TransportMetrics]) = {
+    _transportMetrics = value
+  }
+}
+
+/**
+ * :: DeveloperApi ::
+ * Metrics for network layer
+ */
+@DeveloperApi
+case class TransportMetrics(
+    timeStamp: Long,
+    onHeapSize: Long,
+    directSize: Long)
--- End diff --

Done.
[GitHub] spark pull request: [SPARK-9104][SPARK-9105][SPARK-9106][SPARK-910...
Github user liyezhang556520 commented on the pull request: https://github.com/apache/spark/pull/7753#issuecomment-153354857

@squito, sorry for the long delay in updating. Any further comments?
[GitHub] spark pull request: [SPARK-9104][SPARK-9105][SPARK-9106][SPARK-910...
Github user liyezhang556520 commented on the pull request: https://github.com/apache/spark/pull/7753#issuecomment-153299381

jenkins, retest this please
[GitHub] spark pull request: [SPARK-9104][SPARK-9105][SPARK-9106][SPARK-910...
Github user liyezhang556520 commented on the pull request: https://github.com/apache/spark/pull/7753#issuecomment-142980989

> 1. I don't think there is any need to separate out the memory used by the client and server portions. These are internal details that the end-user doesn't care about -- in fact you're already correctly simplifying by the time it gets to the web UI, but you could really combine them immediately in TransportMetrics. That would help simplify the code I think.

This makes sense, I'll merge them in the metrics itself.

> 2. We shouldn't say "Net" memory used, users might think that means total memory used. I guess we should say "Network"

Oh, I forgot that "net" has another meaning. Sorry for that.

> 4. Could the stage table include the max memory per executor per stage as well? That would be great to help users quickly identify the stages which require the most memory

Yes, users can see the max memory per executor per stage: there is a `StageMemoryPage` attached to `MemoryTab`, which shows each executor's status (max memory) for each finished stage.

> 5. How many additional events are getting logged? With the current architecture, there is some pressure to not log too much -- both to keep log sizes small for later processing, and also to make sure the driver doesn't get too busy just logging (which eventually leads to it dropping events).

Additional events are only logged when stages complete or executors are removed. Please see the [design doc](https://issues.apache.org/jira/secure/attachment/12762171/Tracking%20Spark%20Memory%20Usage%20-%20Phase%201.pdf), but the code is not aligned with the doc yet; I'll update the code first.
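Point 1 in the reply above (combine the client- and server-side portions immediately in TransportMetrics) can be sketched as follows. This is a hypothetical illustration with made-up names, not Spark's actual classes: the client (shuffle read) and server (shuffle write) samples are folded into one value as soon as they are collected, so downstream consumers never see the split.

```scala
// Hypothetical sketch: merge client-side and server-side Netty memory samples
// into a single metrics value inside the metrics layer itself.
case class TransportMetricsSketch(timeStamp: Long, onHeapSize: Long, directSize: Long)

object TransportMetricsSketch {
  // Sum the sizes from both sides and keep the newer of the two sample times.
  def merge(client: TransportMetricsSketch, server: TransportMetricsSketch): TransportMetricsSketch =
    TransportMetricsSketch(
      timeStamp = math.max(client.timeStamp, server.timeStamp),
      onHeapSize = client.onHeapSize + server.onHeapSize,
      directSize = client.directSize + server.directSize)
}
```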
[GitHub] spark pull request: [SPARK-9104][SPARK-9105][SPARK-9106][SPARK-910...
Github user liyezhang556520 commented on a diff in the pull request: https://github.com/apache/spark/pull/7753#discussion_r40339731

--- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala ---
@@ -152,8 +159,19 @@ private[spark] class EventLoggingListener(
     }
   }
 
+  // We log the event both when stage submitted and stage completed, and after each logEvent call,
+  // replace the modifiedMetrics with the latestMetrics. In case the stages submit and complete
+  // time might be interleaved. So as to make the result the same with the running time.
+  private def logMetricsUpdateEvent() : Unit = {
+    modifiedMetrics.map(metrics => logEvent(metrics._2))
+    latestMetrics.map(metrics => modifiedMetrics.update(metrics._1, metrics._2))
+  }
--- End diff --

> I don't understand the last two sentences of the comment -- can you expand on that?

I'll update the code according to the design doc; I think the current code is not quite correct. Please refer to the [design doc](https://issues.apache.org/jira/secure/attachment/12762171/Tracking%20Spark%20Memory%20Usage%20-%20Phase%201.pdf).
[GitHub] spark pull request: [SPARK-9104][SPARK-9105][SPARK-9106][SPARK-910...
Github user liyezhang556520 commented on a diff in the pull request: https://github.com/apache/spark/pull/7753#discussion_r40302500

--- Diff: core/src/main/scala/org/apache/spark/util/JsonProtocol.scala ---
@@ -229,9 +228,11 @@ private[spark] object JsonProtocol {
   def executorMetricsUpdateToJson(metricsUpdate: SparkListenerExecutorMetricsUpdate): JValue = {
     val execId = metricsUpdate.execId
     val taskMetrics = metricsUpdate.taskMetrics
+    val executorMetrics = metricsUpdate.executorMetrics
     ("Event" -> Utils.getFormattedClassName(metricsUpdate)) ~
     ("Executor ID" -> execId) ~
-    ("Metrics Updated" -> taskMetrics.map { case (taskId, stageId, stageAttemptId, metrics) =>
+    ("Executor Metrics Updated" -> executorMetricsToJson(executorMetrics)) ~
+    ("Task Metrics Updated" -> taskMetrics.map { case (taskId, stageId, stageAttemptId, metrics) =>
--- End diff --

I didn't notice that, thanks for pointing it out, I will change that back.
[GitHub] spark pull request: [SPARK-9104][SPARK-9105][SPARK-9106][SPARK-910...
Github user liyezhang556520 commented on a diff in the pull request: https://github.com/apache/spark/pull/7753#discussion_r40302251

--- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala ---
@@ -20,8 +20,11 @@ package org.apache.spark.scheduler
 import java.io._
 import java.net.URI
 
+import akka.remote.transport.Transport
--- End diff --

You are right, it's an accidental import, good catch!
[GitHub] spark pull request: [SPARK-9104][SPARK-9105][SPARK-9106][SPARK-910...
Github user liyezhang556520 commented on a diff in the pull request: https://github.com/apache/spark/pull/7753#discussion_r40302144

--- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
@@ -447,7 +450,16 @@ private[spark] class Executor(
       }
     }
 
-    val message = Heartbeat(executorId, tasksMetrics.toArray, env.blockManager.blockManagerId)
+    env.blockTransferService.getMemMetrics(this.executorMetrics)
+    val executorMetrics = if (isLocal) {
+      Utils.deserialize[ExecutorMetrics](Utils.serialize(this.executorMetrics))
--- End diff --

This is due to [SPARK-3465](https://issues.apache.org/jira/browse/SPARK-3465). Currently we do not have any aggregation operations for `ExecutorMetrics`, so we can remove this and add it back when we do some aggregation.
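The local-mode round trip in the diff above can be sketched as a generic deep copy through Java serialization (illustrative code, not Spark's actual `Utils.serialize`/`Utils.deserialize` helpers): in local mode the driver and the executor share one JVM, so a mutable metrics object must be copied before being handed to the driver, or later mutations on the executor side would leak through (the SPARK-3465 issue).

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Deep copy via a serialize/deserialize round trip, mirroring
// Utils.deserialize(Utils.serialize(...)) in the diff above.
object DeepCopy {
  def copy[T <: Serializable](obj: T): T = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    out.writeObject(obj)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray))
    in.readObject().asInstanceOf[T]
  }
}
```

The copy is a snapshot: mutating the original object afterwards does not affect it.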
[GitHub] spark pull request: [SPARK-9104][SPARK-9105][SPARK-9106][SPARK-910...
Github user liyezhang556520 commented on a diff in the pull request: https://github.com/apache/spark/pull/7753#discussion_r40301519

--- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
@@ -85,6 +85,9 @@ private[spark] class Executor(
     env.blockManager.initialize(conf.getAppId)
   }
 
+  private val executorMetrics: ExecutorMetrics = new ExecutorMetrics
+  executorMetrics.setHostname(Utils.localHostName)
--- End diff --

It's better to use HOST:PORT, but the executor port cannot be obtained here; it would have to come from `RpcEnv.address.port`. We can get the `executorId` on the driver side when receiving the message, which might be enough to identify the different executors. But since we will show removed executors on the page, we cannot tell where an executor was located from the `executorId` alone, because the Executor tab only shows active executors. We can remove the hostname here once we support showing removed executors on the Executor tab.
[GitHub] spark pull request: [SPARK-10608][CORE] disable reduce locality as...
Github user liyezhang556520 commented on the pull request: https://github.com/apache/spark/pull/8765#issuecomment-140355454

jenkins, retest this please
[GitHub] spark pull request: [SPARK-3000][CORE] drop old blocks to disk in ...
Github user liyezhang556520 closed the pull request at: https://github.com/apache/spark/pull/2134
[GitHub] spark pull request: [SPARK-3000][CORE] drop old blocks to disk in ...
Github user liyezhang556520 commented on the pull request: https://github.com/apache/spark/pull/2134#issuecomment-140318245

ok, I'll close this PR
[GitHub] spark pull request: [SPARK-10608][CORE] disable reduce locality as...
GitHub user liyezhang556520 opened a pull request: https://github.com/apache/spark/pull/8765

[SPARK-10608][CORE] disable reduce locality as default

For details, please refer to [SPARK-10608](https://issues.apache.org/jira/browse/SPARK-10608).

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/liyezhang556520/spark SPARK-10608

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/8765.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #8765

commit ce3eb795f1b7d397c9825f08b7b6322e8f5a3a57
Author: Zhang, Liye
Date: 2015-09-15T08:16:49Z

    [SPARK-10608] disable reduce locality as default
[GitHub] spark pull request: [DOC] add missing parameters in SparkContext.s...
Github user liyezhang556520 commented on the pull request: https://github.com/apache/spark/pull/8412#issuecomment-134526849

@srowen, I added the missing params for the APIs throughout the whole file. I'm wondering whether those missing `@param` tags were left out on purpose or not. Since in the current Spark code there are many places missing `@param` and `@return`, if we want to make the doc better, I think someone needs to open a JIRA to address it. I submitted this PR because `SparkContext.scala` is a very important file in Spark core, and its constructor APIs should have complete Scala docs.
[GitHub] spark pull request: [DOC] add missing parameters for scala doc
GitHub user liyezhang556520 opened a pull request: https://github.com/apache/spark/pull/8412

[DOC] add missing parameters for scala doc

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/liyezhang556520/spark minorDoc

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/8412.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #8412

commit 58e9d2ef3f9b1f0661a3327ca71cb133717bb64f
Author: Zhang, Liye
Date: 2015-08-25T04:17:16Z

    add missing parameters for scala doc
[GitHub] spark pull request: [SPARK-9104][SPARK-9105][SPARK-9106][SPARK-910...
Github user liyezhang556520 commented on the pull request: https://github.com/apache/spark/pull/7753#issuecomment-133629862

Jenkins, retest this please
[GitHub] spark pull request: [SPARK-9104][CORE][WIP] expose Netty network l...
Github user liyezhang556520 commented on the pull request: https://github.com/apache/spark/pull/7753#issuecomment-126163347

@jerryshao, thanks for your review and feedback, I'll separate it into different PRs. And I will keep this PR open to wait for others to see whether they have different thoughts on the whole implementation.
[GitHub] spark pull request: [SPARK-9104][CORE][WIP] expose Netty network l...
Github user liyezhang556520 commented on a diff in the pull request: https://github.com/apache/spark/pull/7753#discussion_r35832655

--- Diff: core/src/main/scala/org/apache/spark/executor/ExecutorMetrics.scala ---
@@ -0,0 +1,84 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.executor
+
+import org.apache.spark.annotation.DeveloperApi
+
+/**
+ * :: DeveloperApi ::
+ * Metrics tracked during the execution of an executor.
+ *
+ * So, when adding new fields, take into consideration that the whole object can be serialized for
+ * shipping off at any time to consumers of the SparkListener interface.
+ */
+@DeveloperApi
+class ExecutorMetrics extends Serializable {
+
+  type SysTime = Long
+
+  /**
+   * Host's name the executor runs on
+   */
+  private var _hostname: String = _
+  def hostname: String = _hostname
+  private[spark] def setHostname(value: String) = _hostname = value
+
+  /**
+   * Host's port the executor runs on
+   */
+  private var _port: Int = _
+  def port: Int = _port
+  private[spark] def setPort(value: Int) = _port = value
+
+  def hostPort: String = hostname + ":" + port
+
+  /**
+   * maximum on-heap memory that the executor used for shuffle read on Netty network layer
+   */
+  @volatile private var _maxNettyReadOnheapSizeTime: (Long, SysTime) = (0L, 0L)
+  def maxNettyReadOnheapSizeTime: (Long, SysTime) = _maxNettyReadOnheapSizeTime
--- End diff --

Used `maxNettyReadOnheapSizeTime` to make the metrics more detailed: since there is also a shuffle write Netty part (the transportServer part), `maxNettyOnHeapSizeTime` should be the sum of both sides.
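The `(size, SysTime)` pattern in the diff above can be sketched as a small standalone class (hypothetical name; the clock source is assumed to be supplied by the caller): remember the largest buffer size seen so far together with the time it was observed. A read-side and a write-side tracker could then be kept separately and their values summed for the combined figure, as the reply suggests.

```scala
// Track the peak observed Netty buffer size along with the system time at which
// that peak was seen, analogous to _maxNettyReadOnheapSizeTime above.
class PeakSizeTracker {
  @volatile private var maxSizeTime: (Long, Long) = (0L, 0L) // (bytes, sysTimeMillis)

  // Record a sample; only a new maximum replaces the stored pair.
  def sample(currentSize: Long, now: Long): Unit =
    if (currentSize > maxSizeTime._1) maxSizeTime = (currentSize, now)

  def snapshot: (Long, Long) = maxSizeTime
}
```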
[GitHub] spark pull request: [SPARK-9104][CORE][WIP] expose Netty network l...
GitHub user liyezhang556520 opened a pull request: https://github.com/apache/spark/pull/7753

[SPARK-9104][CORE][WIP] expose Netty network layer memory used in shuffle read part

This is a sub-task of [SPARK-9103](https://issues.apache.org/jira/browse/SPARK-9103). We'd like to expose the memory usage during Spark's running time; this is the first step, exposing the Netty buffers used in both on-heap and off-heap memory. The metrics are also shown on the WebUI. In this PR, a new web tab named `Memory` is added, which shows the memory usage of each executor (this can be made more detailed in the future). The screenshot is like the following:

![image](https://cloud.githubusercontent.com/assets/4716022/8965136/2d7a451c-365c-11e5-899f-133a1b042671.png)

This is WIP because the exposed metrics are not recorded into the event log yet, unit tests are not added, and some situations are not handled (executors added, removed, failed, etc.). One important goal is to get feedback from the community; any comments and thoughts are really appreciated.
You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/liyezhang556520/spark spark-9104-draft

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/7753.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #7753

commit 1917729e2f405daf404550a1873d0e857fc30742
Author: Zhang, Liye
Date: 2015-07-21T06:42:38Z

    SPARK-9212 upgrade Netty version to 4.0.29.Final

commit e6e7947440e3b05d164f3228b23d361052935f22
Author: Zhang, Liye
Date: 2015-07-21T09:47:28Z

    Merge remote-tracking branch 'apache/master'

commit ea1864e1d59b9d5089451932735c6a3be270633f
Author: Zhang, Liye
Date: 2015-07-22T05:51:15Z

    initial test, functionality test, test data fetch from netty

commit edda857c3d763347037a44f898feb005827de4f0
Author: Zhang, Liye
Date: 2015-07-22T07:00:54Z

    change the place to start metric collection

commit e487e4daaf8bd67d1a4fa8da02ffa508d08c284f
Author: Zhang, Liye
Date: 2015-07-22T07:21:10Z

    start the metric collector back to server initial func

commit 9cc65c352c42554d8f57f611e6892f847f26478c
Author: Zhang, Liye
Date: 2015-07-22T07:36:23Z

    start metric collection when starting block manager

commit 467dedc3531471e556584159046f93f127415bfd
Author: Zhang, Liye
Date: 2015-07-22T07:38:41Z

    add executor as the parameter

commit 8f26d9feed48a2fc10d1f2fe782bb2db1bbc90dd
Author: Zhang, Liye
Date: 2015-07-22T09:06:54Z

    chang int to long, avoid overflow

commit 9d7d7d8b4713a4ab11326722e99c4ed79cf349e8
Author: Zhang, Liye
Date: 2015-07-26T15:51:06Z

    add executor metrics

commit d25bdfd87178220e5c97c95db7de6ce3cc4cdfd7
Author: Zhang, Liye
Date: 2015-07-27T08:49:58Z

    initial webUI with memory metrics

commit 3474d162e9236d89cdeea683d1e329ea17a89013
Author: Zhang, Liye
Date: 2015-07-27T09:22:40Z

    code refinement

commit de9a5edaf11e2a9a1e1791ce9eee8089f9d2e9d9
Author: Zhang, Liye
Date: 2015-07-29T16:22:11Z

    collect metrics in heartbeat and add timestamp for max mem size

commit 17e5b978618a5a6adfa3ff621e37eeecaa0b2b0c
Author: Zhang, Liye
Date: 2015-07-29T17:18:15Z

    avoid un-serializable clock for executor metrics
[GitHub] spark pull request: [SPARK-9212][CORE] upgrade Netty version to 4....
Github user liyezhang556520 commented on the pull request: https://github.com/apache/spark/pull/7562#issuecomment-123213266

Maybe I made a misleading expression with "hack". I meant I didn't find a way to go into Netty to get the memory usage for old versions, since they didn't provide any public APIs. And I think there is no additional change to make here.
[GitHub] spark pull request: [SPARK-9212][CORE] upgrade Netty version to 4....
GitHub user liyezhang556520 opened a pull request: https://github.com/apache/spark/pull/7562

[SPARK-9212][CORE] upgrade Netty version to 4.0.29.Final

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/liyezhang556520/spark SPARK-9212

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/7562.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #7562
[GitHub] spark pull request: [SPARK-4989][CORE] backport for branch-1.0(-jd...
Github user liyezhang556520 closed the pull request at: https://github.com/apache/spark/pull/3971
[GitHub] spark pull request: [SPARK-7854][test] refine Kryo test suite
GitHub user liyezhang556520 opened a pull request: https://github.com/apache/spark/pull/6395

[SPARK-7854][test] refine Kryo test suite

This modification is according to @JoshRosen's comments; for details, please refer to [#5934](https://github.com/apache/spark/pull/5934/files#r30949751).

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/liyezhang556520/spark kryoTest

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/6395.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #6395

commit da214c886f6effa2554fed64e69912a029c4d51b
Author: Zhang, Liye
Date: 2015-05-25T15:33:03Z

    refine Kryo test suite accroding to Josh's comments
[GitHub] spark pull request: [SPARK-7392][Core] bugfix: Kryo buffer size ca...
Github user liyezhang556520 commented on a diff in the pull request: https://github.com/apache/spark/pull/5934#discussion_r30971371

--- Diff: core/src/test/scala/org/apache/spark/serializer/KryoSerializerSuite.scala ---
@@ -32,6 +32,36 @@ class KryoSerializerSuite extends FunSuite with SharedSparkContext {
   conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
   conf.set("spark.kryo.registrator", classOf[MyRegistrator].getName)
 
+  test("configuration limits") {
--- End diff --

Ok, I'll make a new PR.
[GitHub] spark pull request: [SPARK-4991][CORE] Worker should reconnect to ...
Github user liyezhang556520 closed the pull request at: https://github.com/apache/spark/pull/3825
[GitHub] spark pull request: [SPARK-4094][CORE] checkpoint should still be ...
Github user liyezhang556520 closed the pull request at: https://github.com/apache/spark/pull/2956
[GitHub] spark pull request: [SPARK-4094][CORE] checkpoint should still be ...
Github user liyezhang556520 commented on the pull request: https://github.com/apache/spark/pull/2956#issuecomment-103291479

I'm closing this, thanks.
[GitHub] spark pull request: [SPARK-7336][HistoryServer] Fix bug that appli...
Github user liyezhang556520 commented on the pull request: https://github.com/apache/spark/pull/5886#issuecomment-101308561

@vanzin, the current implementation will make SPARK-7189 worse. What about introducing a hash map that maintains, for each file, its name, modification time, and size, say `mutable.HashMap[String, (Long, Long)]`? It can handle not only the modification case but also the rename/delete cases. Since each file's modification time is maintained, this can solve both the race condition in this issue and SPARK-7189, and it only introduces extra memory proportional to the size of the hash map.
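The bookkeeping proposed above could be sketched roughly as follows (hypothetical names; real history-server code would populate the map from `FileStatus` entries returned by `listStatus`): keep a map from file name to its last seen (modification time, size), and diff each new listing against it to classify files as new/modified or deleted/renamed without re-reading their contents.

```scala
import scala.collection.mutable

// Hypothetical sketch of tracking (modification time, size) per log file.
class LogCatalog {
  private val seen = mutable.HashMap.empty[String, (Long, Long)] // name -> (mtime, size)

  /** Returns (newOrModified, deleted) file names for the given listing. */
  def rescan(listing: Map[String, (Long, Long)]): (Set[String], Set[String]) = {
    // A file counts as changed if it is unknown or its (mtime, size) differs.
    val changed = listing.collect {
      case (name, stat) if !seen.get(name).contains(stat) => name
    }.toSet
    // A file counts as deleted (or renamed) if it vanished from the listing.
    val deleted = seen.keySet.toSet -- listing.keySet
    seen.clear()
    seen ++= listing
    (changed, deleted)
  }
}
```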
[GitHub] spark pull request: [SPARK-7392][Core] bugfix: Kryo buffer size ca...
Github user liyezhang556520 commented on the pull request: https://github.com/apache/spark/pull/5934#issuecomment-99499678 @ilganeli , Thank you for your comments, code updated.
[GitHub] spark pull request: [SPARK-7392][Core] bugfix: Kryo buffer size ca...
GitHub user liyezhang556520 opened a pull request: https://github.com/apache/spark/pull/5934 [SPARK-7392][Core] bugfix: Kryo buffer size cannot be larger than 2M You can merge this pull request into a Git repository by running: $ git pull https://github.com/liyezhang556520/spark kryoBufSize Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/5934.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5934 commit d91e5ed340e518d8f30454ee39e89e7f658fe508 Author: Zhang, Liye Date: 2015-05-06T05:18:28Z change kb to mb
[GitHub] spark pull request: [SPARK-7336][HistoryServer] Fix bug that appli...
Github user liyezhang556520 commented on the pull request: https://github.com/apache/spark/pull/5886#issuecomment-99322278 Hi @vanzin , you are correct that a single call to `listStatus` returns information for all available log files. But I'm not sure about the details of `listStatus` in HDFS: it seems that when there are too many items in the directory, HDFS fetches them in batches, and in the time gap between batches some write operations can be interleaved. So my concern is whether the `getModificationTime` provided by HDFS `FileStatus` can be fully trusted. Maybe there are some scenarios that HDFS can guarantee.
[GitHub] spark pull request: [SPARK-7336][HistoryServer] Fix bug that appli...
Github user liyezhang556520 commented on a diff in the pull request: https://github.com/apache/spark/pull/5886#discussion_r29731021 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala --- @@ -186,13 +186,14 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock) try { val statusList = Option(fs.listStatus(new Path(logDir))).map(_.toSeq) .getOrElse(Seq[FileStatus]()) - var newLastModifiedTime = lastModifiedTime val logInfos: Seq[FileStatus] = statusList .filter { entry => try { getModificationTime(entry).map { time => - newLastModifiedTime = math.max(newLastModifiedTime, time) - time >= lastModifiedTime + val fileName = entry.getPath.getName + val oldLastModifiedTime = lastModifiedTimes.getOrElse(fileName, -1L) + lastModifiedTimes += (fileName -> time) + time > oldLastModifiedTime --- End diff -- We cannot use `>` to check whether the file was modified; we must use `>=`. There might be a modification to the file immediately after calling `getModificationTime`, in which case the modification time stays the same. We need to use `>=` on the time, together with checking whether the file size changed. For some discussion, please refer to [SPARK-7189](https://issues.apache.org/jira/browse/SPARK-7189)
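A minimal sketch of the comparison argued for above (illustrative names, not the PR's code): with `>` alone, a write that lands within the same timestamp granularity is invisible, while pairing the time check with a size comparison catches it.

```scala
// Snapshot of a log file as seen during one scan: (modification time, size).
case class Snapshot(modTime: Long, size: Long)

// A file counts as modified if its mtime advanced, or if the mtime is
// unchanged but the size differs (a write in the same timestamp tick).
def modifiedSince(prev: Snapshot, now: Snapshot): Boolean =
  now.modTime > prev.modTime ||
    (now.modTime == prev.modTime && now.size != prev.size)
```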
[GitHub] spark pull request: [SPARK-7336][HistoryServer] Fix bug that appli...
Github user liyezhang556520 commented on the pull request: https://github.com/apache/spark/pull/5886#issuecomment-99295131 @vanzin , there is a time interval between getting the first file's modification time and the last file's. Assume there are 3 files: F1, F2, F3, and before scanning their modification times are TF1=100, TF2=101, TF3=102 respectively. At time T1=103, we start scanning. At time T2=104, we finish loading F1's mod time and start loading F2's mod time. At time T3=107, we finish loading F2's mod time. At this point, `lastModifiedTime` is 101, which equals F2's mod time TF2. During the loading of F2's mod time, two writes happen: first, at time T4=105, contents are written to F1, changing F1's mod time from TF1=100 to TF1'=105; second, at time T5=106, contents are written to F3, changing F3's mod time from TF3=102 to TF3'=106. Then we continue to load F3's mod time, finishing at time T6=108. At this point, `lastModifiedTime` is 106. So in the next round, we will not pick up F1 even though it was modified at time T4=105.
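The timeline above can be replayed as a toy calculation showing how a single max-of-mtimes watermark loses F1's write (values copied from the example; the scan is simplified down to the watermark comparison only).

```scala
// Watermark after loading F2's mod time at T3 (TF2 = 101).
var lastModifiedTime = 101L

// Writes that happen while the scan is still running:
val f1NewMtime = 105L // F1 modified at T4
val f3NewMtime = 106L // F3 modified at T5

// The scan then loads F3's mod time, pushing the watermark to 106.
lastModifiedTime = math.max(lastModifiedTime, f3NewMtime)

// Next round: only files with mtime >= watermark are picked up,
// so F1's write at 105 is silently skipped.
val pickedUpNextRound = Seq("F1" -> f1NewMtime, "F3" -> f3NewMtime)
  .collect { case (name, t) if t >= lastModifiedTime => name }
```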
[GitHub] spark pull request: [Core][test][minor] replace try finally block ...
Github user liyezhang556520 commented on the pull request: https://github.com/apache/spark/pull/5739#issuecomment-97635489 Hi @vanzin, sorry for my misleading expression. I mean that the `finally` block is not aware of the exception thrown from the `try` block, even when it is caught by a `catch` block. If the `finally` block then causes another exception (which is not the root cause of the error, just a side effect of the previous exception in the `try` block), that exception will not be suppressed, and the resulting error message will mix the exceptions thrown from both the `try` block and the `finally` block. I think that may be a little confusing to the user.
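The suppression behavior under discussion can be sketched as a simplified helper, modeled on (but not identical to) Spark's `Utils.tryWithSafeFinally`: if the cleanup in `finally` also throws, that secondary exception is attached via `addSuppressed` instead of polluting or masking the root cause.

```scala
// Simplified sketch: run `block`, then `finallyBlock`. If both throw, the
// exception from `finallyBlock` is recorded as suppressed on the original
// exception, so the root cause stays primary in the error output.
def tryWithSafeFinally[T](block: => T)(finallyBlock: => Unit): T = {
  var originalThrowable: Throwable = null
  try {
    block
  } catch {
    case t: Throwable =>
      originalThrowable = t
      throw t
  } finally {
    try {
      finallyBlock
    } catch {
      // Only swallow the cleanup exception when there is a root cause to
      // attach it to; otherwise let it propagate on its own.
      case t: Throwable if originalThrowable != null =>
        originalThrowable.addSuppressed(t)
    }
  }
}
```

With a plain `try/finally`, the cleanup exception would instead replace or mix with the original one, which is the confusing output described above.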
[GitHub] spark pull request: [Core][test][minor] replace try finally block ...
Github user liyezhang556520 commented on the pull request: https://github.com/apache/spark/pull/5739#issuecomment-97287644 Hi @vanzin , the way you suggested may not be able to capture the exception thrown from the `try` block, so some secondary exceptions may not be suppressed.
[GitHub] spark pull request: [SPARK-6314][CORE] handle JsonParseException f...
Github user liyezhang556520 commented on the pull request: https://github.com/apache/spark/pull/5736#issuecomment-97282994 Thanks for @andrewor14's and @vanzin's comments
[GitHub] spark pull request: [Core][test][minor] replace try finally block ...
Github user liyezhang556520 commented on the pull request: https://github.com/apache/spark/pull/5739#issuecomment-96964795 @rxin, I just think it's always more reasonable to use `tryWithSafeFinally`, which gives a more elegant error message. Also, the code in all other places uses `tryWithSafeFinally` when writing to an output and closing it together.
[GitHub] spark pull request: [Core][test][minor] replace try finally block ...
GitHub user liyezhang556520 opened a pull request: https://github.com/apache/spark/pull/5739 [Core][test][minor] replace try finally block with tryWithSafeFinally You can merge this pull request into a Git repository by running: $ git pull https://github.com/liyezhang556520/spark trySafeFinally Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/5739.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5739 commit 55683e57e6c532a7c4f6bbb94f5efb8c57d11670 Author: Zhang, Liye Date: 2015-04-28T06:17:38Z replace try finally block with tryWithSafeFinally
[GitHub] spark pull request: [SPARK-6314][CORE] handle JsonParseException f...
GitHub user liyezhang556520 opened a pull request: https://github.com/apache/spark/pull/5736 [SPARK-6314][CORE] handle JsonParseException for history server This is handled in the same way as [SPARK-6197](https://issues.apache.org/jira/browse/SPARK-6197). The result of this PR is that the exception shown in the history server log is replaced by a warning, and applications with incomplete history log files are listed on the history server web UI. You can merge this pull request into a Git repository by running: $ git pull https://github.com/liyezhang556520/spark SPARK-6314 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/5736.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5736 commit b8d2d885f6527cbdb3377cc2e8296f612c01d596 Author: Zhang, Liye Date: 2015-04-28T06:02:07Z handle JsonParseException for history server
[GitHub] spark pull request: [SPARK-3000][CORE] drop old blocks to disk in ...
Github user liyezhang556520 commented on the pull request: https://github.com/apache/spark/pull/2134#issuecomment-93649867 jenkins, retest this please
[GitHub] spark pull request: [SPARK-6676][BUILD] add more hadoop version su...
Github user liyezhang556520 commented on the pull request: https://github.com/apache/spark/pull/5331#issuecomment-6497 Ok, I'll close this PR then, thanks for your feedback, @srowen
[GitHub] spark pull request: [SPARK-6676][BUILD] add more hadoop version su...
Github user liyezhang556520 closed the pull request at: https://github.com/apache/spark/pull/5331
[GitHub] spark pull request: [SPARK-6676][BUILD] add more hadoop version su...
GitHub user liyezhang556520 opened a pull request: https://github.com/apache/spark/pull/5331 [SPARK-6676][BUILD] add more hadoop version support for maven profile support `-Phadoop-2.5` and `-Phadoop-2.6` when building and testing Spark You can merge this pull request into a Git repository by running: $ git pull https://github.com/liyezhang556520/spark moreHadoopVersion Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/5331.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5331 commit a3157858d97aa0e8260c1fbf8fe362fb20fdd16c Author: Zhang, Liye Date: 2015-04-02T09:45:47Z add more hadoop version support for maven profile
[GitHub] spark pull request: [SPARK-6197][CORE] handle json exception when ...
Github user liyezhang556520 commented on the pull request: https://github.com/apache/spark/pull/4927#issuecomment-77600734 @viirya , that is because we need to check whether the `lines` iterator is exhausted (by checking `lines.hasNext`) to decide whether we can ignore the exception, and `lines` is only available in the `try` block.
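The pattern being described, and why `lines.hasNext` must be consulted in the error handling, can be sketched as follows (illustrative, not Spark's exact replay code): a parse failure is only ignorable when it happened on the last, possibly half-written line of an in-progress log.

```scala
// Sketch: replay event-log lines, tolerating a parse failure only on the
// final line (which may be truncated mid-write).
def replayLines(lines: Iterator[String], parse: String => Unit): Unit = {
  var currentLine: String = null
  try {
    while (lines.hasNext) {
      currentLine = lines.next()
      parse(currentLine)
    }
  } catch {
    // If no lines remain, the failure was on the last (possibly incomplete)
    // line: log a warning instead of failing the whole replay.
    case e: Exception if !lines.hasNext =>
      println(s"Ignoring parse error on final line: $currentLine")
    // A failure mid-file is a real error and propagates.
  }
}
```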
[GitHub] spark pull request: [SPARK-6197][CORE] handle json exception when ...
Github user liyezhang556520 commented on the pull request: https://github.com/apache/spark/pull/4927#issuecomment-77576837 @srowen , thanks for your comments, and I have updated the code.
[GitHub] spark pull request: [SPARK-6197][CORE] handle json exception when ...
Github user liyezhang556520 commented on the pull request: https://github.com/apache/spark/pull/4927#issuecomment-77511920 OK, thanks @zsxwing for answering the second question.
[GitHub] spark pull request: [SPARK-6197][CORE] handle json exception when ...
Github user liyezhang556520 commented on the pull request: https://github.com/apache/spark/pull/4927#issuecomment-77511789 And I don't think the write operation is atomic; signals can still interrupt it.
[GitHub] spark pull request: [SPARK-6197][CORE] handle json exception when ...
Github user liyezhang556520 commented on the pull request: https://github.com/apache/spark/pull/4927#issuecomment-77511029 Yes, I did encounter the problem; if you press Ctrl+C while tasks are finishing, it happens easily.
[GitHub] spark pull request: [SPARK-6197][CORE] handle json exception when ...
Github user liyezhang556520 commented on the pull request: https://github.com/apache/spark/pull/4927#issuecomment-77509856 cc @srowen, @andrewor14, @viirya
[GitHub] spark pull request: [SPARK-6197][CORE] handle json exception when ...
GitHub user liyezhang556520 opened a pull request: https://github.com/apache/spark/pull/4927 [SPARK-6197][CORE] handle json exception when history file not finished writing For details, please refer to [SPARK-6197](https://issues.apache.org/jira/browse/SPARK-6197) You can merge this pull request into a Git repository by running: $ git pull https://github.com/liyezhang556520/spark jsonParseError Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/4927.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4927 commit 2973024239f4b0b506cb334828e3fd10668b23b5 Author: Zhang, Liye Date: 2015-03-06T05:21:06Z handle json exception when file not finished writing
[GitHub] spark pull request: [CORE, DEPLOY][minor] align arguments order wi...
GitHub user liyezhang556520 opened a pull request: https://github.com/apache/spark/pull/4924 [CORE, DEPLOY][minor] align arguments order with docs of worker The help message for starting `worker` is `Usage: Worker [options] `. In `start-slaves.sh`, the format does not align with that, which is confusing at first glance. You can merge this pull request into a Git repository by running: $ git pull https://github.com/liyezhang556520/spark startSlaves Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/4924.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4924 commit 7fd5deb91deee30cd7f8695a6197a6b3cdad40fa Author: Zhang, Liye Date: 2015-03-06T02:09:01Z align arguments order with docs of worker
[GitHub] spark pull request: [SPARK-6159][Core] Distinguish between inprogr...
Github user liyezhang556520 commented on a diff in the pull request: https://github.com/apache/spark/pull/4891#discussion_r25864977 --- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala --- @@ -202,13 +202,23 @@ private[spark] class EventLoggingListener( } } fileSystem.rename(new Path(logPath + IN_PROGRESS), target) +writer = None } + Runtime.getRuntime.addShutdownHook(new Thread("Renaming inprogress log to logPath") { +override def run(): Unit = Utils.logUncaughtExceptions { + if (writer != None && fileSystem.exists(new Path(logPath + IN_PROGRESS))) { +logDebug("Inprogressing event log exists. Application may be terminated abnormally.") +fileSystem.rename(new Path(logPath + IN_PROGRESS), new Path(logPath + ABNORMAL)) --- End diff -- @viirya , I encountered an exception with this `rename` operation in place, and no exception without this line. Did you meet this? I haven't figured out why yet.

```java
java.lang.IllegalArgumentException: Codec [abnormal] is not available. Consider setting spark.io.compression.codec=lzf
    at org.apache.spark.io.CompressionCodec$$anonfun$createCodec$1.apply(CompressionCodec.scala:73)
    at org.apache.spark.io.CompressionCodec$$anonfun$createCodec$1.apply(CompressionCodec.scala:73)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.io.CompressionCodec$.createCodec(CompressionCodec.scala:73)
    at org.apache.spark.scheduler.EventLoggingListener$$anonfun$8$$anonfun$apply$2.apply(EventLoggingListener.scala:287)
    at org.apache.spark.scheduler.EventLoggingListener$$anonfun$8$$anonfun$apply$2.apply(EventLoggingListener.scala:287)
    at scala.collection.mutable.MapLike$class.getOrElseUpdate(MapLike.scala:189)
    at scala.collection.mutable.AbstractMap.getOrElseUpdate(Map.scala:91)
    at org.apache.spark.scheduler.EventLoggingListener$$anonfun$8.apply(EventLoggingListener.scala:287)
    at org.apache.spark.scheduler.EventLoggingListener$$anonfun$8.apply(EventLoggingListener.scala:286)
    at scala.Option.map(Option.scala:145)
    at org.apache.spark.scheduler.EventLoggingListener$.openEventLog(EventLoggingListener.scala:286)
    at org.apache.spark.deploy.master.Master.rebuildSparkUI(Master.scala:773)
    at org.apache.spark.deploy.master.Master.removeApplication(Master.scala:710)
    at org.apache.spark.deploy.master.Master.finishApplication(Master.scala:688)
```
[GitHub] spark pull request: [SPARK-6159][Core] Distinguish between inprogr...
Github user liyezhang556520 commented on the pull request: https://github.com/apache/spark/pull/4891#issuecomment-77366985 Hi @viirya , @andrewor14 , I don't think renaming `.inprogress` to a normal log file is a good idea. If the log file still has the `.inprogress` suffix, there must have been some problem, and keeping the suffix lets us distinguish the abnormal cases.
[GitHub] spark pull request: [SPARK-6107][CORE] Display inprogress applicat...
Github user liyezhang556520 commented on a diff in the pull request: https://github.com/apache/spark/pull/4848#discussion_r25671520 --- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala --- @@ -774,7 +778,7 @@ private[spark] class Master( case fnf: FileNotFoundException => // Event logging is enabled for this application, but no event logs are found val title = s"Application history not found (${app.id})" -var msg = s"No event logs found for application $appName in ${app.desc.eventLogDir}." +var msg = s"No event logs found for application $appName in ${app.desc.eventLogDir.get}." --- End diff -- Yes, it's an unrelated change.
[GitHub] spark pull request: [SPARK-6107][CORE] Display inprogress applicat...
Github user liyezhang556520 commented on the pull request: https://github.com/apache/spark/pull/4848#issuecomment-76887024 Hi @srowen , @viirya , any further comments?
[GitHub] spark pull request: [SPARK-6107][CORE] Display inprogress applicat...
Github user liyezhang556520 commented on the pull request: https://github.com/apache/spark/pull/4848#issuecomment-76717703 Jenkins, retest this please.
[GitHub] spark pull request: [SPARK-6107][CORE] Display inprogress applicat...
Github user liyezhang556520 commented on a diff in the pull request: https://github.com/apache/spark/pull/4848#discussion_r25593705 --- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala --- @@ -736,30 +736,34 @@ private[spark] class Master( val appName = app.desc.name val notFoundBasePath = HistoryServer.UI_PATH_PREFIX + "/not-found" try { - val eventLogFile = app.desc.eventLogDir -.map { dir => EventLoggingListener.getLogPath(dir, app.id) } + val eventLogDir = app.desc.eventLogDir .getOrElse { // Event logging is not enabled for this application app.desc.appUiUrl = notFoundBasePath return false } - - val fs = Utils.getHadoopFileSystem(eventLogFile, hadoopConf) - - if (fs.exists(new Path(eventLogFile + EventLoggingListener.IN_PROGRESS))) { + + val eventLogFilePrefix = EventLoggingListener.getLogPath(eventLogDir, app.id) + val fs = Utils.getHadoopFileSystem(eventLogDir, hadoopConf) + val inProgressExists = fs.exists(new Path(eventLogFilePrefix + + EventLoggingListener.IN_PROGRESS)) + + if (inProgressExists) { // Event logging is enabled for this application, but the application is still in progress -val title = s"Application history not found (${app.id})" -var msg = s"Application $appName is still in progress." -logWarning(msg) -msg = URLEncoder.encode(msg, "UTF-8") -app.desc.appUiUrl = notFoundBasePath + s"?msg=$msg&title=$title" -return false +logWarning(s"Application $appName is still in progress, it may be terminated accidently.") } - + + val eventLogFile = if (inProgressExists) { +eventLogFilePrefix + EventLoggingListener.IN_PROGRESS + } else { +eventLogFilePrefix + } + val status = if (inProgressExists) " (inprogress)" else " (completed)" --- End diff -- Good idea, thanks @viirya
[GitHub] spark pull request: [SPARK-6107][CORE] Display inprogress applicat...
Github user liyezhang556520 commented on a diff in the pull request: https://github.com/apache/spark/pull/4848#discussion_r25591973 --- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala --- @@ -736,30 +736,34 @@ private[spark] class Master( val appName = app.desc.name val notFoundBasePath = HistoryServer.UI_PATH_PREFIX + "/not-found" try { - val eventLogFile = app.desc.eventLogDir -.map { dir => EventLoggingListener.getLogPath(dir, app.id) } + val eventLogDir = app.desc.eventLogDir .getOrElse { // Event logging is not enabled for this application app.desc.appUiUrl = notFoundBasePath return false } - - val fs = Utils.getHadoopFileSystem(eventLogFile, hadoopConf) - - if (fs.exists(new Path(eventLogFile + EventLoggingListener.IN_PROGRESS))) { + + val eventLogFilePrefix = EventLoggingListener.getLogPath(eventLogDir, app.id) + val fs = Utils.getHadoopFileSystem(eventLogDir, hadoopConf) + val inProgressExists = fs.exists(new Path(eventLogFilePrefix + + EventLoggingListener.IN_PROGRESS)) + + if (inProgressExists) { // Event logging is enabled for this application, but the application is still in progress -var msg = s"Application $appName is still in progress, it may be terminated accidently." logWarning(msg) -msg = URLEncoder.encode(msg, "UTF-8") -app.desc.appUiUrl = notFoundBasePath + s"?msg=$msg&title=$title" -return false } + + val eventLogFile = eventLogFilePrefix + { +if(inProgressExists) EventLoggingListener.IN_PROGRESS --- End diff -- Yes, you are right; I discovered that too. I'll update it soon, thanks.