[GitHub] spark issue #19118: [SPARK-21882][CORE] OutputMetrics doesn't count written ...
Github user awarrior commented on the issue: https://github.com/apache/spark/pull/19118 @jiangxb1987 well, the test case is hard to construct if we only run the app in local mode, as described in the comments above. Any ideas on how to crack this? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19118: [SPARK-21882][CORE] OutputMetrics doesn't count written ...
Github user awarrior commented on the issue: https://github.com/apache/spark/pull/19118

@jiangxb1987 well, I got past that part above, but found other places where initialization can happen before runJob. They are in the write function of SparkHadoopWriter.

    // Assert the output format/key/value class is set in JobConf.
    config.assertConf(jobContext, rdd.conf)    // <= chance

    val committer = config.createCommitter(stageId)
    committer.setupJob(jobContext)             // <= chance

    // Try to write all RDD partitions as a Hadoop OutputFormat.
    try {
      val ret = sparkContext.runJob(rdd, (context: TaskContext, iter: Iterator[(K, V)]) => {
        executeTask(
          context = context,
          config = config,
          jobTrackerId = jobTrackerId,
          sparkStageId = context.stageId,
          sparkPartitionId = context.partitionId,
          sparkAttemptNumber = context.attemptNumber,
          committer = committer,
          iterator = iter)
      })

One stack trace:

    java.lang.Thread.State: RUNNABLE
        at org.apache.hadoop.fs.FileSystem.getStatistics(FileSystem.java:3270)
        - locked <0x126a> (a java.lang.Class)
        at org.apache.hadoop.fs.FileSystem.initialize(FileSystem.java:202)
        at org.apache.hadoop.fs.RawLocalFileSystem.initialize(RawLocalFileSystem.java:92)
        at org.apache.hadoop.fs.LocalFileSystem.initialize(LocalFileSystem.java:47)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2598)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2632)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2614)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:169)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:354)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
        at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.<init>(FileOutputCommitter.java:91)
        at org.apache.hadoop.mapred.FileOutputCommitter.getWrapped(FileOutputCommitter.java:65)
        at org.apache.hadoop.mapred.FileOutputCommitter.setupJob(FileOutputCommitter.java:131)
        at org.apache.hadoop.mapred.OutputCommitter.setupJob(OutputCommitter.java:233)
        at org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.setupJob(HadoopMapReduceCommitProtocol.scala:125)
        at org.apache.spark.internal.io.SparkHadoopWriter$.write(SparkHadoopWriter.scala:74)
[GitHub] spark pull request #19118: [SPARK-21882][CORE] OutputMetrics doesn't count w...
Github user awarrior commented on a diff in the pull request: https://github.com/apache/spark/pull/19118#discussion_r138263099

--- Diff: core/src/main/scala/org/apache/spark/internal/io/SparkHadoopWriter.scala ---

    @@ -112,11 +112,12 @@ object SparkHadoopWriter extends Logging {
           jobTrackerId, sparkStageId, sparkPartitionId, sparkAttemptNumber)
         committer.setupTask(taskContext)

    -    val (outputMetrics, callback) = initHadoopOutputMetrics(context)
    -
         // Initiate the writer.
         config.initWriter(taskContext, sparkPartitionId)
         var recordsWritten = 0L
    +
    +    // Initialize callback function after the writer.

--- End diff --

ok
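
The reordering in the diff above can be illustrated with a small toy model. This is only a sketch: it assumes that the bytes-written callback captures the filesystem statistics objects that exist at the moment it is created, and that initializing the writer is what registers those objects. That reading is inferred from the `FileSystem.getStatistics` frames in the traces in this thread; the thread itself does not confirm it. `ToyStats`, `ToyRegistry` and `OrderingSketch` are hypothetical names, not Spark or Hadoop classes.

```scala
import scala.collection.mutable.ArrayBuffer

// Toy stand-ins, NOT Hadoop or Spark classes: a per-filesystem byte counter and
// a registry that a filesystem joins only when it is initialized.
final class ToyStats { var bytesWritten: Long = 0L }

final class ToyRegistry {
  private val all = ArrayBuffer.empty[ToyStats]
  def register(): ToyStats = { val s = new ToyStats; all += s; s }
  def snapshot: List[ToyStats] = all.toList
}

object OrderingSketch {
  // Captures the statistics objects that exist right now; objects registered
  // later are invisible to the returned callback.
  def bytesWrittenCallback(registry: ToyRegistry): () => Long = {
    val captured = registry.snapshot
    val baseline = captured.map(_.bytesWritten).sum
    () => captured.map(_.bytesWritten).sum - baseline
  }

  // Runs one simulated task and returns the bytes the callback reports.
  def run(callbackFirst: Boolean, payload: Long): Long = {
    val registry = new ToyRegistry
    if (callbackFirst) {
      val callback = bytesWrittenCallback(registry) // registry is still empty
      val stats = registry.register()               // writer initialization registers the stats
      stats.bytesWritten += payload
      callback()
    } else {
      val stats = registry.register()               // writer first
      val callback = bytesWrittenCallback(registry) // callback now sees the stats
      stats.bytesWritten += payload
      callback()
    }
  }

  def main(args: Array[String]): Unit = {
    println(s"callback before writer: ${run(callbackFirst = true, payload = 100L)}")
    println(s"callback after writer:  ${run(callbackFirst = false, payload = 100L)}")
  }
}
```

In this model the callback-first order reports 0 bytes and the writer-first order reports 100, which matches the symptom in the PR title under the stated assumption.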
[GitHub] spark issue #19118: [SPARK-21882][CORE] OutputMetrics doesn't count written ...
Github user awarrior commented on the issue: https://github.com/apache/spark/pull/19118

I ran into trouble while writing a test case. It seems that this issue is not triggered when everything runs on a single node. I found that the driver node calls createPathFromString, so the problem does not appear.

    java.lang.Thread.State: RUNNABLE
        at org.apache.hadoop.fs.FileSystem.getStatistics(FileSystem.java:3271)
        - locked <0x1211> (a java.lang.Class)
        at org.apache.hadoop.fs.FileSystem.initialize(FileSystem.java:202)
        at org.apache.hadoop.fs.RawLocalFileSystem.initialize(RawLocalFileSystem.java:92)
        at org.apache.hadoop.fs.LocalFileSystem.initialize(LocalFileSystem.java:47)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2598)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2632)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2614)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:169)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:354)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
        at org.apache.spark.internal.io.SparkHadoopWriterUtils$.createPathFromString(SparkHadoopWriterUtils.scala:55)

Does anyone know how to test in this case?
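
One way to picture why a single-node run hides the problem is the toy model below. It is a sketch under assumptions, not Hadoop code: it assumes statistics are kept per scheme in a JVM-wide table with get-or-create semantics, and that an executor JVM in a cluster has not touched the filesystem before the task starts. `SchemeCounter`, `JvmStatsTable` and `LocalModeSketch` are hypothetical names introduced for illustration.

```scala
import scala.collection.mutable

// Toy model, NOT Hadoop code: one JvmStatsTable instance stands for the
// filesystem statistics table that lives inside a single JVM.
final class SchemeCounter { var bytesWritten: Long = 0L }

final class JvmStatsTable {
  private val byScheme = mutable.LinkedHashMap.empty[String, SchemeCounter]

  // Get-or-create: the first filesystem initialization for a scheme registers its counter.
  def statsFor(scheme: String): SchemeCounter =
    byScheme.getOrElseUpdate(scheme, new SchemeCounter)

  // Callback built over whatever counters exist at this moment.
  def bytesWrittenCallback(): () => Long = {
    val captured = byScheme.values.toList
    val baseline = captured.map(_.bytesWritten).sum
    () => captured.map(_.bytesWritten).sum - baseline
  }
}

object LocalModeSketch {
  // A task using the pre-patch order: callback first, then the writer.
  def taskWithOldOrder(jvm: JvmStatsTable, payload: Long): Long = {
    val callback = jvm.bytesWrittenCallback()
    val stats = jvm.statsFor("file") // writer initialization
    stats.bytesWritten += payload
    callback()
  }

  // Driver and task share one JVM: the driver resolves the output path first.
  def localMode(payload: Long): Long = {
    val sharedJvm = new JvmStatsTable
    sharedJvm.statsFor("file") // driver-side path resolution registers the counter
    taskWithOldOrder(sharedJvm, payload)
  }

  // The task runs in a separate JVM whose table is still empty.
  def clusterMode(payload: Long): Long =
    taskWithOldOrder(new JvmStatsTable, payload)

  def main(args: Array[String]): Unit = {
    println(s"local mode reports:   ${localMode(64L)}")
    println(s"cluster mode reports: ${clusterMode(64L)}")
  }
}
```

In this model the shared-JVM case reports the full 64 bytes even with the old ordering, while the separate-JVM case reports 0. If the assumptions hold, a test would need the task to run in a JVM (or against a filesystem scheme) that the driver has not already initialized.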
[GitHub] spark issue #19118: [SPARK-21882][CORE] OutputMetrics doesn't count written ...
Github user awarrior commented on the issue: https://github.com/apache/spark/pull/19118 @jiangxb1987 ok, I will add one later.
[GitHub] spark issue #19115: [SPARK-21882][CORE] OutputMetrics doesn't count written ...
Github user awarrior commented on the issue: https://github.com/apache/spark/pull/19115 @markhamstra sorry for the trouble; I have opened a new PR, #19118.
[GitHub] spark pull request #19118: [SPARK-21882][CORE] OutputMetrics doesn't count w...
GitHub user awarrior opened a pull request: https://github.com/apache/spark/pull/19118

[SPARK-21882][CORE] OutputMetrics doesn't count written bytes correctly in the saveAsHadoopDataset function

spark-21882

## What changes were proposed in this pull request?

Switch the initialization order of HadoopOutputMetrics and SparkHadoopWriter

## How was this patch tested?

Existing tests

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/awarrior/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19118.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #19118

commit 0f0c3b1c91b4f06c7e48874b8f6329c5c1c1b3ce
Author: Jarvis
Date: 2017-09-04T06:21:13Z

    Update SparkHadoopWriter.scala

    spark-21882

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
[GitHub] spark issue #19115: [SPARK-21882][CORE] OutputMetrics doesn't count written ...
Github user awarrior commented on the issue: https://github.com/apache/spark/pull/19115 ok, thx
[GitHub] spark pull request #19115: [SPARK-21882][CORE] OutputMetrics doesn't count w...
Github user awarrior closed the pull request at: https://github.com/apache/spark/pull/19115
[GitHub] spark issue #19115: [SPARK-21882][CORE] OutputMetrics doesn't count written ...
Github user awarrior commented on the issue: https://github.com/apache/spark/pull/19115 @jerryshao hi~ I have modified this PR, but this patch only works on 2.2.0 (some changes apply now). I want to confirm whether I need to create a new PR. Thanks!
[GitHub] spark pull request #19114: Update PairRDDFunctions.scala
Github user awarrior closed the pull request at: https://github.com/apache/spark/pull/19114
[GitHub] spark pull request #19115: Update PairRDDFunctions.scala
GitHub user awarrior opened a pull request: https://github.com/apache/spark/pull/19115

Update PairRDDFunctions.scala

https://issues.apache.org/jira/browse/SPARK-21882

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/awarrior/spark branch-2.2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19115.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #19115

commit a096970b2f2cfa497a96870ebd26f83a106b4e07
Author: Jarvis
Date: 2017-09-04T02:48:35Z

    Update PairRDDFunctions.scala

    spark-21882
[GitHub] spark pull request #19114: Update PairRDDFunctions.scala
GitHub user awarrior opened a pull request: https://github.com/apache/spark/pull/19114

Update PairRDDFunctions.scala

https://issues.apache.org/jira/browse/SPARK-21882

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/awarrior/spark branch-1.6

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19114.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #19114

commit e7e42802b07c5148ba02761af1edd2ee81d6ef95
Author: Jarvis
Date: 2017-09-04T02:52:01Z

    Update PairRDDFunctions.scala

    spark-21882