[GitHub] spark pull request: [SPARK-6066] Make event log format easier to p...
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/4821#issuecomment-76864451

Test PASSed. Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28201/

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4821#issuecomment-76864438

[Test build #28201 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28201/consoleFull) for PR 4821 at commit [`8511141`](https://github.com/apache/spark/commit/8511141f6b769655acc2b0c1de33a8401c8f7133).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/4821
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4821#issuecomment-76860619

[Test build #28198 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28198/consoleFull) for PR 4821 at commit [`fdae14c`](https://github.com/apache/spark/commit/fdae14c86f660047b50020b40a782b842182336c).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/4821#issuecomment-76860628

Test PASSed. Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28198/
Github user pwendell commented on the pull request:
https://github.com/apache/spark/pull/4821#issuecomment-76855580

Okay, I took a close look through this and it LGTM.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4821#issuecomment-76850938

[Test build #28201 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28201/consoleFull) for PR 4821 at commit [`8511141`](https://github.com/apache/spark/commit/8511141f6b769655acc2b0c1de33a8401c8f7133).
* This patch merges cleanly.
Github user andrewor14 commented on a diff in the pull request:
https://github.com/apache/spark/pull/4821#discussion_r25649521

--- Diff: core/src/main/scala/org/apache/spark/util/JsonProtocol.scala ---
@@ -574,6 +583,11 @@ private[spark] object JsonProtocol {
     SparkListenerExecutorRemoved(time, executorId, reason)
   }

+  def logStartFromJson(json: JValue): SparkListenerLogStart = {
+    val version = (json \ "Spark Version").extract[String]
--- End diff --

(edit): misunderstood what you meant. I'll rename the variable, NBD.
Github user vanzin commented on the pull request:
https://github.com/apache/spark/pull/4821#issuecomment-76848440

LGTM.
Github user andrewor14 commented on a diff in the pull request:
https://github.com/apache/spark/pull/4821#discussion_r25648636

--- Diff: core/src/main/scala/org/apache/spark/util/JsonProtocol.scala ---
@@ -574,6 +583,11 @@ private[spark] object JsonProtocol {
     SparkListenerExecutorRemoved(time, executorId, reason)
   }

+  def logStartFromJson(json: JValue): SparkListenerLogStart = {
+    val version = (json \ "Spark Version").extract[String]
--- End diff --

but then it's not consistent with other similar JSONs...
Github user pwendell commented on a diff in the pull request:
https://github.com/apache/spark/pull/4821#discussion_r25648384

--- Diff: core/src/main/scala/org/apache/spark/util/JsonProtocol.scala ---
@@ -574,6 +583,11 @@ private[spark] object JsonProtocol {
     SparkListenerExecutorRemoved(time, executorId, reason)
   }

+  def logStartFromJson(json: JValue): SparkListenerLogStart = {
+    val version = (json \ "Spark Version").extract[String]
--- End diff --

minor - but maybe call this `sparkVersion` so it's clear that this isn't a version for the logging format (since that doesn't have its own versioning).
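[Editor's note] A minimal sketch of the suggested rename (hypothetical; assumes json4s's implicit `Formats` are in scope, as elsewhere in `JsonProtocol`):

```scala
// Hypothetical sketch of the rename suggested above.
def logStartFromJson(json: JValue): SparkListenerLogStart = {
  // `sparkVersion` makes clear this is the Spark release that wrote the
  // log, not a version of the event log format (which has no versioning).
  val sparkVersion = (json \ "Spark Version").extract[String]
  SparkListenerLogStart(sparkVersion)
}
```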
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4821#issuecomment-76846594

[Test build #28198 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28198/consoleFull) for PR 4821 at commit [`fdae14c`](https://github.com/apache/spark/commit/fdae14c86f660047b50020b40a782b842182336c).
* This patch merges cleanly.
Github user vanzin commented on a diff in the pull request:
https://github.com/apache/spark/pull/4821#discussion_r25647830

--- Diff: core/src/test/scala/org/apache/spark/deploy/history/FsHistoryProviderSuite.scala ---
@@ -89,16 +105,18 @@ class FsHistoryProviderSuite extends FunSuite with BeforeAndAfter with Matchers
     val list = provider.getListing().toSeq
     list should not be (null)
-    list.size should be (4)
-    list.count(e => e.completed) should be (2)
+    list.size should be (5)
+    list.count(_.completed) should be (3)
     list(0) should be (ApplicationHistoryInfo(newAppComplete.getName(), "new-app-complete", 1L, 4L,
       newAppComplete.lastModified(), "test", true))
-    list(1) should be (ApplicationHistoryInfo(oldAppComplete.getName(), "old-app-complete", 2L, 3L,
+    list(1) should be (ApplicationHistoryInfo(newAppCompressedComplete.getName(),
--- End diff --

It could be that you're using a different filesystem than the Jenkins machine, and the two return entries in a different order.
Github user andrewor14 commented on a diff in the pull request:
https://github.com/apache/spark/pull/4821#discussion_r25647694

--- Diff: core/src/test/scala/org/apache/spark/deploy/history/FsHistoryProviderSuite.scala ---
@@ -89,16 +105,18 @@ class FsHistoryProviderSuite extends FunSuite with BeforeAndAfter with Matchers
     val list = provider.getListing().toSeq
     list should not be (null)
-    list.size should be (4)
-    list.count(e => e.completed) should be (2)
+    list.size should be (5)
+    list.count(_.completed) should be (3)
     list(0) should be (ApplicationHistoryInfo(newAppComplete.getName(), "new-app-complete", 1L, 4L,
       newAppComplete.lastModified(), "test", true))
-    list(1) should be (ApplicationHistoryInfo(oldAppComplete.getName(), "old-app-complete", 2L, 3L,
+    list(1) should be (ApplicationHistoryInfo(newAppCompressedComplete.getName(),
--- End diff --

yeah, I just realized this independently. I fixed this in my latest commit.
Github user andrewor14 commented on a diff in the pull request:
https://github.com/apache/spark/pull/4821#discussion_r25647732

--- Diff: core/src/test/scala/org/apache/spark/deploy/history/FsHistoryProviderSuite.scala ---
@@ -89,16 +105,18 @@ class FsHistoryProviderSuite extends FunSuite with BeforeAndAfter with Matchers
     val list = provider.getListing().toSeq
     list should not be (null)
-    list.size should be (4)
-    list.count(e => e.completed) should be (2)
+    list.size should be (5)
+    list.count(_.completed) should be (3)
     list(0) should be (ApplicationHistoryInfo(newAppComplete.getName(), "new-app-complete", 1L, 4L,
       newAppComplete.lastModified(), "test", true))
-    list(1) should be (ApplicationHistoryInfo(oldAppComplete.getName(), "old-app-complete", 2L, 3L,
+    list(1) should be (ApplicationHistoryInfo(newAppCompressedComplete.getName(),
--- End diff --

however, that still doesn't explain why it passes locally but fails remotely.
Github user vanzin commented on a diff in the pull request:
https://github.com/apache/spark/pull/4821#discussion_r25641491

--- Diff: core/src/test/scala/org/apache/spark/deploy/history/FsHistoryProviderSuite.scala ---
@@ -89,16 +105,18 @@ class FsHistoryProviderSuite extends FunSuite with BeforeAndAfter with Matchers
     val list = provider.getListing().toSeq
     list should not be (null)
-    list.size should be (4)
-    list.count(e => e.completed) should be (2)
+    list.size should be (5)
+    list.count(_.completed) should be (3)
     list(0) should be (ApplicationHistoryInfo(newAppComplete.getName(), "new-app-complete", 1L, 4L,
       newAppComplete.lastModified(), "test", true))
-    list(1) should be (ApplicationHistoryInfo(oldAppComplete.getName(), "old-app-complete", 2L, 3L,
+    list(1) should be (ApplicationHistoryInfo(newAppCompressedComplete.getName(),
--- End diff --

I think your problem is that entries `1` and `2` both have the same start and end time, so their sort order is non-deterministic. Also, `newAppCompressedComplete` has end time `4` in the code a few lines above. So it seems your test code has a couple of issues; this is not really flakiness.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/4821#issuecomment-76831455

Test FAILed. Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28182/
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4821#issuecomment-76831439

[Test build #28182 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28182/consoleFull) for PR 4821 at commit [`654883d`](https://github.com/apache/spark/commit/654883dfbd65de162455601c69108dab4e354f7d).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
Github user vanzin commented on the pull request:
https://github.com/apache/spark/pull/4821#issuecomment-76821354

> flakiness of FsHistoryProviderSuite

Do you have any theories for why this would be related to the log's modification time? The only place where it's used in FsHistoryProvider is for filtering entries in `checkForLogs`, and it's already a conservative check (it uses `>=` so that it picks up any new logs that show up with the same mod time as the last check, at the expense of re-parsing logs that haven't changed).
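[Editor's note] The conservative check described above can be sketched as follows (a hypothetical simplification; `LogInfo` and `lastScanTime` stand in for the provider's real state):

```scala
// Hypothetical simplification of the mod-time filter in checkForLogs.
case class LogInfo(path: String, modTime: Long)

// Using >= rather than > picks up any new log whose mod time equals the
// time of the last scan, at the expense of re-parsing unchanged logs.
def logsToScan(all: Seq[LogInfo], lastScanTime: Long): Seq[LogInfo] =
  all.filter(_.modTime >= lastScanTime)
```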
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4821#issuecomment-76815718

[Test build #28182 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28182/consoleFull) for PR 4821 at commit [`654883d`](https://github.com/apache/spark/commit/654883dfbd65de162455601c69108dab4e354f7d).
* This patch merges cleanly.
Github user andrewor14 commented on the pull request:
https://github.com/apache/spark/pull/4821#issuecomment-76815711

By the way, I gained first-hand experience of the flakiness of `FsHistoryProviderSuite` from this patch. The tests passed locally but failed here, and I suspect that it has something to do with the modification time not varying enough between the file writes, though this is something we should fix separately.
Github user andrewor14 commented on a diff in the pull request:
https://github.com/apache/spark/pull/4821#discussion_r25628566

--- Diff: core/src/main/scala/org/apache/spark/io/CompressionCodec.scala ---
@@ -71,6 +74,21 @@ private[spark] object CompressionCodec {
       s"Consider setting $configKey=$FALLBACK_COMPRESSION_CODEC"))
   }

+  /**
+   * Return the short version of the given codec name.
+   * If it is already a short name, just return it.
+   */
+  def getShortName(codecName: String): String = {
+    if (shortCompressionCodecNames.contains(codecName)) {
+      codecName
+    } else {
+      shortCompressionCodecNames
+        .collect { case (k, v) if v == codecName => k }
+        .headOption
--- End diff --

thanks, that's exactly what I want
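[Editor's note] The reverse lookup in the diff can be exercised standalone. In this sketch the map entries are illustrative (mirroring Spark's short-name table, not the authoritative list), and since the diff is truncated at `.headOption`, the `getOrElse` fallback here is hypothetical:

```scala
// Illustrative short-name table (mirrors Spark's, not authoritative).
val shortCompressionCodecNames = Map(
  "lz4" -> "org.apache.spark.io.LZ4CompressionCodec",
  "lzf" -> "org.apache.spark.io.LZFCompressionCodec",
  "snappy" -> "org.apache.spark.io.SnappyCompressionCodec")

def getShortName(codecName: String): String = {
  if (shortCompressionCodecNames.contains(codecName)) {
    codecName // already a short name
  } else {
    // Reverse lookup: find the short key whose class name matches.
    shortCompressionCodecNames
      .collect { case (k, v) if v == codecName => k }
      .headOption
      .getOrElse(codecName) // hypothetical fallback; the diff is truncated here
  }
}
// e.g. getShortName("org.apache.spark.io.LZ4CompressionCodec") returns "lz4"
```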
Github user andrewor14 commented on a diff in the pull request:
https://github.com/apache/spark/pull/4821#discussion_r25628525

--- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala ---
@@ -111,19 +119,17 @@ private[spark] class EventLoggingListener(
       hadoopDataStream.get
     }

-    val compressionCodec =
-      if (shouldCompress) {
-        Some(CompressionCodec.createCodec(sparkConf))
-      } else {
-        None
-      }
-
-    fileSystem.setPermission(path, LOG_FILE_PERMISSIONS)
-    val logStream = initEventLog(new BufferedOutputStream(dstream, outputBufferSize),
-      compressionCodec)
-    writer = Some(new PrintWriter(logStream))
-
-    logInfo("Logging events to %s".format(logPath))
+    try {
+      val cstream = compressionCodec.map(_.compressedOutputStream(dstream)).getOrElse(dstream)
+      val bstream = new BufferedOutputStream(cstream, outputBufferSize)
+      fileSystem.setPermission(path, LOG_FILE_PERMISSIONS)
+      writer = Some(new PrintWriter(bstream))
--- End diff --

again, I'm just trying to preserve functionality here. If it's a real issue I would prefer to fix it separately, but in this patch I'm trying to minimize the scope of the changes.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4821#issuecomment-76787929

[Test build #28180 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28180/consoleFull) for PR 4821 at commit [`7f537cd`](https://github.com/apache/spark/commit/7f537cda6e68f922889f5b95e2869ec770f038a8).
* This patch merges cleanly.
Github user vanzin commented on the pull request:
https://github.com/apache/spark/pull/4821#issuecomment-76785685

@pwendell yes, that's what I'm trying to say. Even today, logs generated by Spark 1.0, 1.1 and 1.2 (and soon 1.3) are not exactly the same - events have different properties. The metadata allows the parsing code to know what to expect.

(That is not a problem internally in Spark because the code handles all added/changed event properties using optionals and things like that, but there's no way to know whether we'll make a change in the future that prevents code like that from working without knowing the version of the data.)
Github user vanzin commented on a diff in the pull request:
https://github.com/apache/spark/pull/4821#discussion_r25628232

--- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala ---
@@ -111,19 +119,17 @@ private[spark] class EventLoggingListener(
       hadoopDataStream.get
     }

-    val compressionCodec =
-      if (shouldCompress) {
-        Some(CompressionCodec.createCodec(sparkConf))
-      } else {
-        None
-      }
-
-    fileSystem.setPermission(path, LOG_FILE_PERMISSIONS)
-    val logStream = initEventLog(new BufferedOutputStream(dstream, outputBufferSize),
-      compressionCodec)
-    writer = Some(new PrintWriter(logStream))
-
-    logInfo("Logging events to %s".format(logPath))
+    try {
+      val cstream = compressionCodec.map(_.compressedOutputStream(dstream)).getOrElse(dstream)
+      val bstream = new BufferedOutputStream(cstream, outputBufferSize)
+      fileSystem.setPermission(path, LOG_FILE_PERMISSIONS)
+      writer = Some(new PrintWriter(bstream))
--- End diff --

Ah, sorry, there doesn't seem to be one for `(OutputStream, String)`. Sigh. Do you need a PrintWriter here? Can you use a regular `Writer` instead?
Github user pwendell commented on the pull request:
https://github.com/apache/spark/pull/4821#issuecomment-76782633

@vanzin so just to be clear: you are anticipating a future case where we need to read the version to correctly parse the logs? Is that the argument for it? I am only making the simple observation that here we write something and never read it.
Github user andrewor14 commented on a diff in the pull request:
https://github.com/apache/spark/pull/4821#discussion_r25626361

--- Diff: core/src/test/scala/org/apache/spark/deploy/history/FsHistoryProviderSuite.scala ---
@@ -45,18 +44,26 @@ class FsHistoryProviderSuite extends FunSuite with BeforeAndAfter with Matchers
     Utils.deleteRecursively(testDir)
   }

+  /** Create a fake log file using the new log format used in Spark 1.3+ */
+  private def newLogFile(appId: String, inProgress: Boolean = false): File = {
--- End diff --

Ok. I'm just preserving functionality here, but it's not too expensive to add tests for that too.
Github user vanzin commented on the pull request:
https://github.com/apache/spark/pull/4821#issuecomment-76780546

@pwendell the header is needed because it contains potentially useful information for the code parsing the logs. For example, it now contains the Spark version, which might be needed to tell the parsing code which properties to expect in the logs.

The original version of the change (the one that got rid of the directories and used a single file) encoded all metadata in the file name. The feedback was that it was ugly (long, cryptic file names) and brittle, since if you change the file name, you lose that information. I agree with that, and thus the header was born.

Now we're back to encoding metadata in the file name. A simple extension is not too bad, though, especially since you can probably figure out the compression codec by looking at the first few bytes of the file. But the header still provides useful information. So I'm a little worried that the latest patch removes the metadata completely. Especially since it's common for the first event of the log to *not* be the one that contains the Spark version (`SparkListenerEnvironmentUpdate`?), and instead be `SparkListenerBlockManagerAdded`.
Github user vanzin commented on a diff in the pull request:
https://github.com/apache/spark/pull/4821#discussion_r25626186

--- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala ---
@@ -111,19 +119,17 @@ private[spark] class EventLoggingListener(
       hadoopDataStream.get
     }

-    val compressionCodec =
-      if (shouldCompress) {
-        Some(CompressionCodec.createCodec(sparkConf))
-      } else {
-        None
-      }
-
-    fileSystem.setPermission(path, LOG_FILE_PERMISSIONS)
-    val logStream = initEventLog(new BufferedOutputStream(dstream, outputBufferSize),
-      compressionCodec)
-    writer = Some(new PrintWriter(logStream))
-
-    logInfo("Logging events to %s".format(logPath))
+    try {
+      val cstream = compressionCodec.map(_.compressedOutputStream(dstream)).getOrElse(dstream)
+      val bstream = new BufferedOutputStream(cstream, outputBufferSize)
+      fileSystem.setPermission(path, LOG_FILE_PERMISSIONS)
+      writer = Some(new PrintWriter(bstream))
--- End diff --

http://docs.oracle.com/javase/6/docs/api/java/io/PrintWriter.html#PrintWriter(java.lang.String,%20java.lang.String)

Without that, you're writing data in the "default platform encoding".
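[Editor's note] The encoding concern can be addressed by wrapping the stream in an `OutputStreamWriter` with an explicit charset. A minimal, self-contained sketch of the idea (not the code in the patch; the `ByteArrayOutputStream` stands in for the event log's output stream):

```scala
import java.io.{ByteArrayOutputStream, OutputStreamWriter, PrintWriter}
import java.nio.charset.StandardCharsets

// Force UTF-8 instead of the default platform encoding.
val out = new ByteArrayOutputStream()
val writer = new PrintWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8))
writer.println("""{"Event":"SparkListenerLogStart","Spark Version":"1.3.0"}""")
writer.flush()
```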
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/4821#discussion_r25625447 --- Diff: core/src/test/scala/org/apache/spark/deploy/history/FsHistoryProviderSuite.scala --- @@ -45,18 +44,26 @@ class FsHistoryProviderSuite extends FunSuite with BeforeAndAfter with Matchers Utils.deleteRecursively(testDir) } + /** Create a fake log file using the new log format used in Spark 1.3+ */ + private def newLogFile(appId: String, inProgress: Boolean = false): File = { --- End diff -- Hmm... this method doesn't account for the compression codec. Maybe it should, and we should add a compressed, new-style event log to the "Parse new and old application logs" test.
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/4821#discussion_r25625518 --- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala --- @@ -111,19 +119,17 @@ private[spark] class EventLoggingListener( hadoopDataStream.get } -val compressionCodec = - if (shouldCompress) { -Some(CompressionCodec.createCodec(sparkConf)) - } else { -None - } - -fileSystem.setPermission(path, LOG_FILE_PERMISSIONS) -val logStream = initEventLog(new BufferedOutputStream(dstream, outputBufferSize), - compressionCodec) -writer = Some(new PrintWriter(logStream)) - -logInfo("Logging events to %s".format(logPath)) +try { + val cstream = compressionCodec.map(_.compressedOutputStream(dstream)).getOrElse(dstream) + val bstream = new BufferedOutputStream(cstream, outputBufferSize) + fileSystem.setPermission(path, LOG_FILE_PERMISSIONS) + writer = Some(new PrintWriter(bstream)) --- End diff -- that signature doesn't exist. Also this was moved from L124 so it's probably fine.
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/4821#discussion_r25625161 --- Diff: core/src/test/scala/org/apache/spark/deploy/history/FsHistoryProviderSuite.scala --- @@ -45,18 +44,26 @@ class FsHistoryProviderSuite extends FunSuite with BeforeAndAfter with Matchers Utils.deleteRecursively(testDir) } + /** Create a fake log file using the new log format used in Spark 1.3+ */ + private def newLogFile(appId: String, inProgress: Boolean = false): File = { +val ip = if (inProgress) EventLoggingListener.IN_PROGRESS else "" +val logUri = EventLoggingListener.getLogPath(testDir.getAbsolutePath, appId) +val logPath = new URI(logUri).getPath + ip +new File(logPath) + } + test("Parse new and old application logs") { val provider = new FsHistoryProvider(createTestConf()) // Write a new-style application log. -val newAppComplete = new File(testDir, "new1") +val newAppComplete = newLogFile("new1", inProgress = false) --- End diff -- If you're going to pass the default value explicitly, might be better to not make it an optional parameter.
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/4821#discussion_r25625032 --- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala --- @@ -279,52 +253,17 @@ private[spark] object EventLoggingListener extends Logging { } val in = new BufferedInputStream(fs.open(log)) -// Read a single line from the input stream without buffering. -// We cannot use BufferedReader because we must avoid reading -// beyond the end of the header, after which the content of the -// file may be compressed. -def readLine(): String = { - val bytes = new ByteArrayOutputStream() - var next = in.read() - var count = 0 - while (next != '\n') { -if (next == -1) { - throw new IOException("Unexpected end of file.") -} -bytes.write(next) -count = count + 1 -if (count > MAX_HEADER_LINE_LENGTH) { - throw new IOException("Maximum header line length exceeded.") -} -next = in.read() - } - new String(bytes.toByteArray(), Charsets.UTF_8) + +// Compression codec is encoded as an extension, e.g. app_123.lzf +// Since we sanitize the app ID to not include periods, it is safe to split on it +val logName = log.getName.replaceAll(IN_PROGRESS, "") --- End diff -- `replaceAll` takes a regex, so you need to `Pattern.quote(IN_PROGRESS)`. In fact, it might be better to use `stripSuffix`.
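The pitfall vanzin is pointing at: `String.replaceAll` compiles its first argument as a regular expression, so a suffix such as `.inprogress` (an assumed value for `IN_PROGRESS`, chosen for illustration) matches more than intended, because the leading `.` matches any character. A small Java sketch of the bug and the `Pattern.quote` fix:

```java
import java.util.regex.Pattern;

class SuffixStrip {
    // Assumed value of EventLoggingListener.IN_PROGRESS; note the leading dot.
    static final String IN_PROGRESS = ".inprogress";

    // Buggy: replaceAll() compiles its first argument as a regex, so the "."
    // matches ANY character, and the match may occur anywhere in the name.
    static String viaReplaceAll(String name) {
        return name.replaceAll(IN_PROGRESS, "");
    }

    // Safe: quote the suffix so it is matched literally, and anchor it to the end.
    static String viaQuotedReplaceAll(String name) {
        return name.replaceAll(Pattern.quote(IN_PROGRESS) + "$", "");
    }
}
```

Scala's `stripSuffix`, as suggested, sidesteps the regex machinery entirely and is the simplest fix.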
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/4821#discussion_r25624653 --- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala --- @@ -111,19 +119,17 @@ private[spark] class EventLoggingListener( hadoopDataStream.get } -val compressionCodec = - if (shouldCompress) { -Some(CompressionCodec.createCodec(sparkConf)) - } else { -None - } - -fileSystem.setPermission(path, LOG_FILE_PERMISSIONS) -val logStream = initEventLog(new BufferedOutputStream(dstream, outputBufferSize), - compressionCodec) -writer = Some(new PrintWriter(logStream)) - -logInfo("Logging events to %s".format(logPath)) +try { + val cstream = compressionCodec.map(_.compressedOutputStream(dstream)).getOrElse(dstream) + val bstream = new BufferedOutputStream(cstream, outputBufferSize) + fileSystem.setPermission(path, LOG_FILE_PERMISSIONS) + writer = Some(new PrintWriter(bstream)) --- End diff -- `new PrintWriter(bstream, "UTF-8")`
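As noted elsewhere in the thread, a `PrintWriter(OutputStream, String)` constructor does not exist on Java 6/7, so the suggested two-argument call would not compile. The portable way to pin the charset is to wrap the stream in an `OutputStreamWriter`; a sketch (`utf8PrintWriter` is a hypothetical helper, not Spark code):

```java
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

class Utf8Writer {
    // new PrintWriter(out) alone would use the platform default encoding.
    // Wrapping in an OutputStreamWriter pins the charset explicitly.
    // (StandardCharsets is Java 7+; on Java 6, Guava's Charsets.UTF_8 works.)
    static PrintWriter utf8PrintWriter(OutputStream out) {
        return new PrintWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8));
    }
}
```

With this, a character like U+00E9 is always written as the two UTF-8 bytes 0xC3 0xA9, regardless of the platform's default encoding.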
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/4821#discussion_r25624492 --- Diff: core/src/main/scala/org/apache/spark/io/CompressionCodec.scala --- @@ -71,6 +74,21 @@ private[spark] object CompressionCodec { s"Consider setting $configKey=$FALLBACK_COMPRESSION_CODEC")) } + /** + * Return the short version of the given codec name. + * If it is already a short name, just return it. + */ + def getShortName(codecName: String): String = { +if (shortCompressionCodecNames.contains(codecName)) { + codecName +} else { + shortCompressionCodecNames +.collect { case (k, v) if v == codecName => k } +.headOption --- End diff -- `collectFirst`?
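The `getShortName` under review is a reverse lookup: given a fully qualified codec class name, find its short alias, and `collectFirst` is the idiomatic Scala shorthand for `collect { ... }.headOption`. A Java sketch of the same idea (the mapping mirrors what `shortCompressionCodecNames` presumably contains; treat the class names as assumptions):

```java
import java.util.LinkedHashMap;
import java.util.Map;

class CodecNames {
    // Assumed contents of shortCompressionCodecNames; the class names are
    // illustrative and may not match Spark exactly.
    static final Map<String, String> SHORT_NAMES = new LinkedHashMap<>();
    static {
        SHORT_NAMES.put("lzf", "org.apache.spark.io.LZFCompressionCodec");
        SHORT_NAMES.put("lz4", "org.apache.spark.io.LZ4CompressionCodec");
        SHORT_NAMES.put("snappy", "org.apache.spark.io.SnappyCompressionCodec");
    }

    // If the name is already short, return it; otherwise do the reverse lookup
    // (the Java analogue of Scala's collectFirst over the map entries).
    static String getShortName(String codecName) {
        if (SHORT_NAMES.containsKey(codecName)) {
            return codecName;
        }
        return SHORT_NAMES.entrySet().stream()
            .filter(e -> e.getValue().equals(codecName))
            .map(Map.Entry::getKey)
            .findFirst()
            .orElseThrow(() -> new IllegalArgumentException("No short name for " + codecName));
    }
}
```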
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/4821#discussion_r25624317 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala --- @@ -320,30 +320,24 @@ private[history] class FsHistoryProvider(conf: SparkConf) extends ApplicationHis * log file (along with other metadata files), which is the case for directories generated by * the code in previous releases. * - * @return 2-tuple of (input stream of the events, version of Spark which wrote the log) + * @return input stream that holds one JSON serialized event per line --- End diff -- nit: one JSON *record*
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4821#issuecomment-76673355 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28159/
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4821#issuecomment-76673346 [Test build #28159 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28159/consoleFull) for PR 4821 at commit [`7d6aa61`](https://github.com/apache/spark/commit/7d6aa61c59a784d6f62fee94c9d7f4c6e0f501f9). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4821#issuecomment-76669704 [Test build #28158 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28158/consoleFull) for PR 4821 at commit [`59abee9`](https://github.com/apache/spark/commit/59abee953d785f8d6b4f0320eb5c5e91715c35a3). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class MatrixFactorizationModel(JavaModelWrapper, Saveable, JavaLoader):` * `class Saveable(object):` * `class Loader(object):` * `class JavaLoader(Loader):` * `java_class = ".".join([java_package, cls.__name__])`
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4821#issuecomment-76669712 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28158/
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4821#issuecomment-76668874 [Test build #28157 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28157/consoleFull) for PR 4821 at commit [`40e1ff6`](https://github.com/apache/spark/commit/40e1ff6398be8a55fe834e7d9c705d4ea31a520a). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4821#issuecomment-76668879 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28157/
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4821#issuecomment-7392 [Test build #28159 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28159/consoleFull) for PR 4821 at commit [`7d6aa61`](https://github.com/apache/spark/commit/7d6aa61c59a784d6f62fee94c9d7f4c6e0f501f9). * This patch merges cleanly.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4821#issuecomment-76663956 [Test build #28158 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28158/consoleFull) for PR 4821 at commit [`59abee9`](https://github.com/apache/spark/commit/59abee953d785f8d6b4f0320eb5c5e91715c35a3). * This patch merges cleanly.
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/4821#issuecomment-76663334 As of the latest commit I have removed the Spark version and the metadata. Please have a look.
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/4821#discussion_r25579983 --- Diff: core/src/main/scala/org/apache/spark/scheduler/ReplayListenerBus.scala --- @@ -39,10 +39,9 @@ private[spark] class ReplayListenerBus extends SparkListenerBus with Logging { * error is thrown by this method. * * @param logData Stream containing event log data. - * @param version Spark version that generated the events. * @param sourceName Filename (or other source identifier) from whence @logData is being read */ - def replay(logData: InputStream, version: String, sourceName: String) { + def replay(logData: InputStream, sourceName: String): Unit = { --- End diff -- note that changing this signature was necessary, since the new format can no longer provide a Spark version that makes sense.
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/4821#discussion_r25579955 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala --- @@ -322,28 +322,22 @@ private[history] class FsHistoryProvider(conf: SparkConf) extends ApplicationHis * --- End diff -- Actually I guess this handles multiple versions. My bad, fine to leave as is
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/4821#discussion_r25579899 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala --- @@ -322,28 +322,22 @@ private[history] class FsHistoryProvider(conf: SparkConf) extends ApplicationHis * * @return 2-tuple of (input stream of the events, version of Spark which wrote the log) --- End diff -- This is outdated now
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/4821#discussion_r25579888 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala --- @@ -322,28 +322,22 @@ private[history] class FsHistoryProvider(conf: SparkConf) extends ApplicationHis * --- End diff -- This doc might be better if it said "in Spark 1.2.X and earlier" instead of "in previous releases".
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4821#issuecomment-76663276 [Test build #28157 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28157/consoleFull) for PR 4821 at commit [`40e1ff6`](https://github.com/apache/spark/commit/40e1ff6398be8a55fe834e7d9c705d4ea31a520a). * This patch merges cleanly.
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/4821#discussion_r25579333 --- Diff: core/src/main/scala/org/apache/spark/deploy/ApplicationDescription.scala --- @@ -23,7 +23,9 @@ private[spark] class ApplicationDescription( val memoryPerSlave: Int, val command: Command, var appUiUrl: String, -val eventLogDir: Option[String] = None) +val sparkVersion: String, --- End diff -- just adding an extra field seems like the smaller change, but let me know if you'd like me to change it.
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/4821#discussion_r25577300 --- Diff: core/src/main/scala/org/apache/spark/deploy/ApplicationDescription.scala --- @@ -23,7 +23,9 @@ private[spark] class ApplicationDescription( val memoryPerSlave: Int, val command: Command, var appUiUrl: String, -val eventLogDir: Option[String] = None) +val sparkVersion: String, --- End diff -- it's because at this point the `SparkContext` doesn't have an App ID yet. If we want to pass the name of the file we'll need to have some placeholder (e.g. `{{APP_ID}}_lzf`)
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/4821#discussion_r25577271 --- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala --- @@ -217,57 +223,67 @@ private[spark] object EventLoggingListener extends Logging { /** * Write metadata about the event log to the given stream. * - * The header is a serialized version of a map, except it does not use Java serialization to - * avoid incompatibilities between different JDKs. It writes one map entry per line, in - * "key=value" format. - * - * The very last entry in the header is the `HEADER_END_MARKER` marker, so that the parsing code - * can know when to stop. - * - * The format needs to be kept in sync with the openEventLog() method below. Also, it cannot - * change in new Spark versions without some other way of detecting the change (like some - * metadata encoded in the file name). + * The header is a single line of JSON in the beginning of the file. Note that this + * assumes all metadata necessary to parse the log is also included in the file name. + * The format needs to be kept in sync with the `openEventLog()` method below. Also, it + * cannot change in new Spark versions without some other way of detecting the change. * - * @param logStream Raw output stream to the even log file. + * @param logStream Raw output stream to the event log file. * @param compressionCodec Optional compression codec to use. - * @return A stream where to write event log data. This may be a wrapper around the original + * @return A stream to which event log data is written. This may be a wrapper around the original * stream (for example, when compression is enabled). 
*/ def initEventLog( logStream: OutputStream, compressionCodec: Option[CompressionCodec]): OutputStream = { -val meta = mutable.HashMap(("version" -> SPARK_VERSION)) +val metadata = new mutable.HashMap[String, String] +// Some of these metadata are already encoded in the file name +// Here we include them again within the file itself for completeness +metadata += ("Event" -> Utils.getFormattedClassName(SparkListenerMetadataIdentifier)) +metadata += (SPARK_VERSION_KEY -> SPARK_VERSION) compressionCodec.foreach { codec => - meta += ("compressionCodec" -> codec.getClass().getName()) + metadata += (COMPRESSION_CODEC_KEY -> codec.getClass.getCanonicalName) } - -def write(entry: String) = { - val bytes = entry.getBytes(Charsets.UTF_8) - if (bytes.length > MAX_HEADER_LINE_LENGTH) { -throw new IOException(s"Header entry too long: ${entry}") - } - logStream.write(bytes, 0, bytes.length) +val metadataJson = compact(render(JsonProtocol.mapToJson(metadata))) +val metadataBytes = (metadataJson + "\n").getBytes(Charsets.UTF_8) +if (metadataBytes.length > MAX_HEADER_LINE_LENGTH) { + throw new IOException(s"Event log metadata too long: $metadataJson") } - -meta.foreach { case (k, v) => write(s"$k=$v\n") } -write(s"$HEADER_END_MARKER\n") - compressionCodec.map(_.compressedOutputStream(logStream)).getOrElse(logStream) +logStream.write(metadataBytes, 0, metadataBytes.length) +logStream } /** * Return a file-system-safe path to the log file for the given application. * + * Note that because we currently only create a single log file for each application, + * we must encode all the information needed to parse this event log in the file name + * instead of within the file itself. Otherwise, if the file is compressed, for instance, + * we won't know which codec to use to decompress the metadata. + * * @param logBaseDir Directory where the log file will be written. * @param appId A unique app ID. 
+ * @param compressionCodecName Name of the compression codec used to compress the contents + * of the log, or None if compression is not enabled. * @return A path which consists of file-system-safe characters. */ - def getLogPath(logBaseDir: String, appId: String): String = { -val name = appId.replaceAll("[ :/]", "-").replaceAll("[${}'\"]", "_").toLowerCase -Utils.resolveURI(logBaseDir) + "/" + name.stripSuffix("/") + def getLogPath( + logBaseDir: String, + appId: String, + compressionCodecName: Option[String] = None): String = { +val sanitizedAppId = appId.replaceAll("[ :/]", "-").replaceAll("[${}'\"]", "_").toLowerCase +// e.g. EVENT_LOG_app_123_SPARK_VERSION_1.3.1 --- End diff -- you
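The sanitization rules quoted in this diff, combined with the extension-based codec encoding discussed earlier in the thread, can be sketched as follows. Only the `replaceAll` chain comes from the diff; the exact naming scheme (`appId + "." + shortCodecName`) is an assumption, not necessarily the final Spark implementation:

```java
class LogPath {
    // Replace path separators and shell metacharacters so the app ID is
    // file-system safe, then append the codec short name as an extension.
    static String getLogPath(String logBaseDir, String appId, String codecShortName) {
        String sanitized = appId.replaceAll("[ :/]", "-")
                                .replaceAll("[${}'\"]", "_")
                                .toLowerCase();
        String ext = (codecShortName == null) ? "" : "." + codecShortName;
        return logBaseDir.replaceAll("/+$", "") + "/" + sanitized + ext;
    }
}
```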
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/4821#issuecomment-76641955 I took a pass on this with some feedback. Overall, it would be good to really minimize the scope of the changes since this is so late in the game. There is some clean-up and renaming etc that would be best just left out of the patch. The main thing I'm wondering is why we need this header at all. It doesn't even ever get used by our own replay - we just ignore it. It seems like it was added for the purpose of conveying the compression codec to bootstrap replaying the file, however just having an extension seems like a better, much more standard way of doing that. The only argument I see for it is that the header could be used in the future to encode things that are necessary for proper replay of the logs. However, in that case I don't see why we can't just add it later if and when those things occur. I guess I don't see a good argument against a straw man of just not having the header. Curious to hear thoughts from @andrewor14 and @vanzin.
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/4821#discussion_r25574126 --- Diff: core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala --- @@ -99,6 +99,12 @@ case class SparkListenerExecutorRemoved(time: Long, executorId: String, reason: extends SparkListenerEvent /** + * A special dummy event used to identify the metadata header in event logs. + * This is not actually posted anywhere. + */ +private[spark] case object SparkListenerMetadataIdentifier extends SparkListenerEvent --- End diff -- Since this is never used, why don't we instead just ignore this event type when reading from JSON (i.e. allow sparkEventFromJson to return None). It seems a bit strange to add this event but it contains nothing that's ever used.
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/4821#discussion_r25573931

--- Diff: core/src/main/scala/org/apache/spark/deploy/ApplicationDescription.scala ---

@@ -23,7 +23,9 @@ private[spark] class ApplicationDescription(
     val memoryPerSlave: Int,
     val command: Command,
     var appUiUrl: String,
-    val eventLogDir: Option[String] = None)
+    val sparkVersion: String,

--- End diff --

What about just passing a single option `eventLogFile` here (i.e. the full path to the log file)? The master only uses the eventLogDir, sparkVersion, and eventLogCodec to reconstruct the filename. Reconstructing it in two places seems a little brittle.
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/4821#discussion_r25573801

--- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala ---

@@ -217,57 +223,67 @@ private[spark] object EventLoggingListener extends Logging {
   /**
    * Write metadata about the event log to the given stream.
    *
-   * The header is a serialized version of a map, except it does not use Java serialization to
-   * avoid incompatibilities between different JDKs. It writes one map entry per line, in
-   * "key=value" format.
-   *
-   * The very last entry in the header is the `HEADER_END_MARKER` marker, so that the parsing code
-   * can know when to stop.
-   *
-   * The format needs to be kept in sync with the openEventLog() method below. Also, it cannot
-   * change in new Spark versions without some other way of detecting the change (like some
-   * metadata encoded in the file name).
+   * The header is a single line of JSON in the beginning of the file. Note that this
+   * assumes all metadata necessary to parse the log is also included in the file name.
+   * The format needs to be kept in sync with the `openEventLog()` method below. Also, it
+   * cannot change in new Spark versions without some other way of detecting the change.
    *
-   * @param logStream Raw output stream to the even log file.
+   * @param logStream Raw output stream to the event log file.
    * @param compressionCodec Optional compression codec to use.
-   * @return A stream where to write event log data. This may be a wrapper around the original
+   * @return A stream to which event log data is written. This may be a wrapper around the original
    *         stream (for example, when compression is enabled).
    */
   def initEventLog(
       logStream: OutputStream,
       compressionCodec: Option[CompressionCodec]): OutputStream = {
-    val meta = mutable.HashMap(("version" -> SPARK_VERSION))
+    val metadata = new mutable.HashMap[String, String]
+    // Some of these metadata are already encoded in the file name
+    // Here we include them again within the file itself for completeness
+    metadata += ("Event" -> Utils.getFormattedClassName(SparkListenerMetadataIdentifier))
+    metadata += (SPARK_VERSION_KEY -> SPARK_VERSION)
     compressionCodec.foreach { codec =>
-      meta += ("compressionCodec" -> codec.getClass().getName())
+      metadata += (COMPRESSION_CODEC_KEY -> codec.getClass.getCanonicalName)
     }
-
-    def write(entry: String) = {
-      val bytes = entry.getBytes(Charsets.UTF_8)
-      if (bytes.length > MAX_HEADER_LINE_LENGTH) {
-        throw new IOException(s"Header entry too long: ${entry}")
-      }
-      logStream.write(bytes, 0, bytes.length)
+    val metadataJson = compact(render(JsonProtocol.mapToJson(metadata)))
+    val metadataBytes = (metadataJson + "\n").getBytes(Charsets.UTF_8)
+    if (metadataBytes.length > MAX_HEADER_LINE_LENGTH) {
+      throw new IOException(s"Event log metadata too long: $metadataJson")
     }
-
-    meta.foreach { case (k, v) => write(s"$k=$v\n") }
-    write(s"$HEADER_END_MARKER\n")
-    compressionCodec.map(_.compressedOutputStream(logStream)).getOrElse(logStream)
+    logStream.write(metadataBytes, 0, metadataBytes.length)
+    logStream
   }

   /**
    * Return a file-system-safe path to the log file for the given application.
    *
+   * Note that because we currently only create a single log file for each application,
+   * we must encode all the information needed to parse this event log in the file name
+   * instead of within the file itself. Otherwise, if the file is compressed, for instance,
+   * we won't know which codec to use to decompress the metadata.
+   *
    * @param logBaseDir Directory where the log file will be written.
    * @param appId A unique app ID.
+   * @param compressionCodecName Name of the compression codec used to compress the contents
+   *                             of the log, or None if compression is not enabled.
    * @return A path which consists of file-system-safe characters.
    */
-  def getLogPath(logBaseDir: String, appId: String): String = {
-    val name = appId.replaceAll("[ :/]", "-").replaceAll("[${}'\"]", "_").toLowerCase
-    Utils.resolveURI(logBaseDir) + "/" + name.stripSuffix("/")
+  def getLogPath(
+      logBaseDir: String,
+      appId: String,
+      compressionCodecName: Option[String] = None): String = {
+    val sanitizedAppId = appId.replaceAll("[ :/]", "-").replaceAll("[${}'\"]", "_").toLowerCase
+    // e.g. EVENT_LOG_app_123_SPARK_VERSION_1.3.1
+    // e.g. EVENT_LOG_ {...}
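The appId sanitization step in the getLogPath diff above can be exercised on its own. A minimal sketch, assuming only that appId is an arbitrary string; the object and helper names are ours, and the filename-assembly part of getLogPath is omitted:

```scala
object LogPathSketch {
  // Mirrors the sanitization chain quoted in the diff: spaces, colons and
  // slashes become dashes; $ { } ' " become underscores; result is lowercased.
  def sanitizeAppId(appId: String): String =
    appId.replaceAll("[ :/]", "-").replaceAll("[${}'\"]", "_").toLowerCase

  def main(args: Array[String]): Unit = {
    assert(sanitizeAppId("My App 2015/02:27") == "my-app-2015-02-27")
  }
}
```

The two-step replaceAll keeps characters that are awkward in URIs or shells out of the log file name, which matters here precisely because the file name (not the file body) carries the parsing metadata.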
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/4821#discussion_r25573776

--- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala --- (same initEventLog/getLogPath hunk as quoted above)

--- End diff --

I don
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4821#issuecomment-76507612 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28103/ Test PASSed.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4821#issuecomment-76507606

[Test build #28103 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28103/consoleFull) for PR 4821 at commit [`519e51a`](https://github.com/apache/spark/commit/519e51a958b40d193327e85b659e1df767041f55).

* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4821#issuecomment-76505881

[Test build #28102 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28102/consoleFull) for PR 4821 at commit [`ef69276`](https://github.com/apache/spark/commit/ef692768db319d3159ce9522d625cede3505e161).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4821#issuecomment-76505885 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28102/ Test FAILed.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4821#issuecomment-76503962

[Test build #28103 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28103/consoleFull) for PR 4821 at commit [`519e51a`](https://github.com/apache/spark/commit/519e51a958b40d193327e85b659e1df767041f55).

* This patch merges cleanly.
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/4821#discussion_r25552458

--- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala ---

@@ -110,17 +117,12 @@ private[spark] class EventLoggingListener(
         hadoopDataStream = Some(fileSystem.create(path))
         hadoopDataStream.get
       }
-
-    val compressionCodec =
-      if (shouldCompress) {
-        Some(CompressionCodec.createCodec(sparkConf))
-      } else {
-        None
-      }
+    val cstream = compressionCodec.map(_.compressedOutputStream(dstream)).getOrElse(dstream)

--- End diff --

that's fine. I fixed this in my latest commit.
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/4821#discussion_r25552366

--- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala --- (same initEventLog/getLogPath hunk as quoted above)
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4821#issuecomment-76502888

[Test build #28102 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28102/consoleFull) for PR 4821 at commit [`ef69276`](https://github.com/apache/spark/commit/ef692768db319d3159ce9522d625cede3505e161).

* This patch merges cleanly.
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/4821#discussion_r25552104

--- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala --- (same initEventLog/getLogPath hunk as quoted above)
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/4821#discussion_r25552069

--- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala ---

@@ -202,11 +204,11 @@ private[spark] object EventLoggingListener extends Logging {
     val IN_PROGRESS = ".inprogress"
     val DEFAULT_LOG_DIR = "/tmp/spark-events"

-  private val LOG_FILE_PERMISSIONS = new FsPermission(Integer.parseInt("770", 8).toShort)
+  val EVENT_LOG_KEY = "EVENT_LOG"

--- End diff --

that's fine.
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/4821#discussion_r25552048

--- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala --- (same initEventLog/getLogPath hunk as quoted above)
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4821#issuecomment-76497543

[Test build #28098 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28098/consoleFull) for PR 4821 at commit [`8db5a06`](https://github.com/apache/spark/commit/8db5a06d108d8a2ddb8460e48e3509f46cc4fc2f).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4821#issuecomment-76497550 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28098/ Test FAILed.
[GitHub] spark pull request: [SPARK-6066] Make event log format easier to p...
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/4821#issuecomment-76496914 Looks reasonable, will wait for tests.
[GitHub] spark pull request: [SPARK-6066] Make event log format easier to p...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/4821#discussion_r25549402

--- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala ---

@@ -217,53 +219,60 @@ private[spark] object EventLoggingListener extends Logging {

   /**
    * Write metadata about the event log to the given stream.
    *
-   * The header is a serialized version of a map, except it does not use Java serialization to
-   * avoid incompatibilities between different JDKs. It writes one map entry per line, in
-   * "key=value" format.
-   *
-   * The very last entry in the header is the `HEADER_END_MARKER` marker, so that the parsing code
-   * can know when to stop.
+   * The header is a single line of JSON in the beginning of the file. Note that this
+   * assumes all metadata necessary to parse the log is also included in the file name.
+   * The format needs to be kept in sync with the `openEventLog()` method below. Also, it
+   * cannot change in new Spark versions without some other way of detecting the change.
    *
-   * The format needs to be kept in sync with the openEventLog() method below. Also, it cannot
-   * change in new Spark versions without some other way of detecting the change (like some
-   * metadata encoded in the file name).
-   *
-   * @param logStream Raw output stream to the even log file.
+   * @param logStream Raw output stream to the event log file.
    * @param compressionCodec Optional compression codec to use.
-   * @return A stream where to write event log data. This may be a wrapper around the original
+   * @return A stream to which event log data is written. This may be a wrapper around the original
    *         stream (for example, when compression is enabled).
    */
   def initEventLog(
       logStream: OutputStream,
       compressionCodec: Option[CompressionCodec]): OutputStream = {
-    val meta = mutable.HashMap(("version" -> SPARK_VERSION))
+    val metadata = new mutable.HashMap[String, String]
+    // Some of these metadata are already encoded in the file name
+    // Here we include them again within the file itself for completeness
+    metadata += ("Event" -> Utils.getFormattedClassName(SparkListenerMetadataIdentifier))
+    metadata += (SPARK_VERSION_KEY -> SPARK_VERSION)
     compressionCodec.foreach { codec =>
-      meta += ("compressionCodec" -> codec.getClass().getName())
+      metadata += (COMPRESSION_CODEC_KEY -> codec.getClass.getCanonicalName)
     }
-
-    def write(entry: String) = {
-      val bytes = entry.getBytes(Charsets.UTF_8)
-      if (bytes.length > MAX_HEADER_LINE_LENGTH) {
-        throw new IOException(s"Header entry too long: ${entry}")
-      }
-      logStream.write(bytes, 0, bytes.length)
+    val metadataJson = compact(render(JsonProtocol.mapToJson(metadata)))
+    val metadataBytes = (metadataJson + "\n").getBytes(Charsets.UTF_8)
+    if (metadataBytes.length > MAX_HEADER_LINE_LENGTH) {
+      throw new IOException(s"Event log metadata too long: $metadataJson")
     }
-
-    meta.foreach { case (k, v) => write(s"$k=$v\n") }
-    write(s"$HEADER_END_MARKER\n")
-    compressionCodec.map(_.compressedOutputStream(logStream)).getOrElse(logStream)
+    logStream.write(metadataBytes, 0, metadataBytes.length)
+    logStream
   }

   /**
    * Return a file-system-safe path to the log file for the given application.
    *
+   * Note that because we currently only create a single log file for each application,
+   * we must encode all the information needed to parse this event log in the file name
+   * instead of within the file itself. Otherwise, if the file is compressed, for instance,
+   * we won't know which codec to use to decompress the metadata.
+   *
    * @param logBaseDir Directory where the log file will be written.
    * @param appId A unique app ID.
+   * @param compressionCodecName Name of the compression codec used to compress the contents
+   *                             of the log, or None if compression is not enabled.
    * @return A path which consists of file-system-safe characters.
    */
-  def getLogPath(logBaseDir: String, appId: String): String = {
-    val name = appId.replaceAll("[ :/]", "-").replaceAll("[${}'\"]", "_").toLowerCase
-    Utils.resolveURI(logBaseDir) + "/" + name.stripSuffix("/")
+  def getLogPath(
+      logBaseDir: String,
+      appId: String,
+      compressionCodecName: Option[String]): String = {
+    val sanitizedAppId = appId.replaceAll("[ :/]", "-").replaceAll("[${}'\"]", "_").toLowerCase
+    // e.g. EVENT_LOG_app_123_SPARK_VERSION_1.3.1
+    // e.g. EVENT_LOG_ {...} _COMPRES
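The app ID sanitization quoted in the diff above can be illustrated as a standalone snippet (a minimal sketch; `LogPathSketch` and `sanitizeAppId` are hypothetical names for illustration, not part of the PR):

```scala
// Minimal sketch of the app ID sanitization done inside getLogPath.
// `LogPathSketch` / `sanitizeAppId` are hypothetical names for illustration.
object LogPathSketch {
  // Replace path-hostile characters and lowercase the result so the app ID
  // is safe to embed in a file name: spaces, colons and slashes become
  // dashes; shell-unfriendly characters become underscores.
  def sanitizeAppId(appId: String): String =
    appId.replaceAll("[ :/]", "-").replaceAll("[${}'\"]", "_").toLowerCase

  def main(args: Array[String]): Unit = {
    println(sanitizeAppId("My App:1/2"))  // my-app-1-2
    println(sanitizeAppId("app{1}'x'"))   // app_1__x_
  }
}
```

Because the file name now carries metadata such as the Spark version and compression codec, keeping the sanitized app ID free of separator characters matters: a stray `/` or `:` in the app ID would otherwise break parsing of the path.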
[GitHub] spark pull request: [SPARK-6066] Make event log format easier to p...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/4821#discussion_r25549234

--- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala ---

@@ -202,11 +204,11 @@ private[spark] object EventLoggingListener extends Logging {

   val IN_PROGRESS = ".inprogress"
   val DEFAULT_LOG_DIR = "/tmp/spark-events"

-  private val LOG_FILE_PERMISSIONS = new FsPermission(Integer.parseInt("770", 8).toShort)
+  val EVENT_LOG_KEY = "EVENT_LOG"

--- End diff --

I think the file name would be less of a mouthful without this prefix. But not a big deal.
[GitHub] spark pull request: [SPARK-6066] Make event log format easier to p...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/4821#discussion_r25549140

--- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala ---

@@ -110,17 +117,12 @@ private[spark] class EventLoggingListener(

        hadoopDataStream = Some(fileSystem.create(path))
        hadoopDataStream.get
      }
-
-    val compressionCodec =
-      if (shouldCompress) {
-        Some(CompressionCodec.createCodec(sparkConf))
-      } else {
-        None
-      }
+    val cstream = compressionCodec.map(_.compressedOutputStream(dstream)).getOrElse(dstream)

--- End diff --

I know this is not part of this change, but it feels like this whole method needs a try..finally to avoid leaking an open `dstream` when some exception is thrown.
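The pattern vanzin is suggesting can be sketched in isolation (a minimal sketch, not the PR's code; `StreamSafety`, `openWrapped`, and its parameters are hypothetical names): if wrapping the raw stream in a codec throws, the raw stream is closed instead of leaked.

```scala
import java.io.OutputStream

// Sketch of the suggestion: open a raw stream, then wrap it (e.g. in a
// compression codec); if the wrapping step throws, close the raw stream
// before rethrowing so it isn't leaked. All names here are hypothetical.
object StreamSafety {
  def openWrapped(
      openRaw: () => OutputStream,
      wrap: OutputStream => OutputStream): OutputStream = {
    val dstream = openRaw()
    try {
      wrap(dstream)
    } catch {
      case e: Throwable =>
        dstream.close() // avoid leaking the open stream when wrapping fails
        throw e
    }
  }
}
```

A plain try..finally would close the stream even on success, which is wrong here since the caller needs it open; the catch-close-rethrow form only cleans up on failure.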
[GitHub] spark pull request: [SPARK-6066] Make event log format easier to p...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/4821#issuecomment-76494032 Tests will fail to compile. I'm fixing this in the meantime.
[GitHub] spark pull request: [SPARK-6066] Make event log format easier to p...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4821#issuecomment-76493745 [Test build #28098 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28098/consoleFull) for PR 4821 at commit [`8db5a06`](https://github.com/apache/spark/commit/8db5a06d108d8a2ddb8460e48e3509f46cc4fc2f). * This patch merges cleanly.