[GitHub] spark pull request: [SPARK-8379][SQL]avoid speculative tasks write...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/6833 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/6833#issuecomment-113870538 LGTM, thanks for fixing this! Merging to master and branch-1.4.
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/6833#issuecomment-113870521 @andrewor14 They are not the same. #6864 affects the dynamic partitioning feature of external data sources, while this one is about Hive dynamic partitions.
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/6833#issuecomment-113351488 I think this PR is for HiveQL and #6864 is for common SQL.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6833#issuecomment-113312380 Merged build finished. Test PASSed.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6833#issuecomment-113312346 [Test build #35171 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35171/console) for PR 6833 at commit [`64bbfab`](https://github.com/apache/spark/commit/64bbfab33d748cce3cb1dbad55a86c3991d99899). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6833#issuecomment-113289891 [Test build #35171 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35171/consoleFull) for PR 6833 at commit [`64bbfab`](https://github.com/apache/spark/commit/64bbfab33d748cce3cb1dbad55a86c3991d99899).
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6833#issuecomment-113289226 Merged build started.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6833#issuecomment-113289197 Merged build triggered.
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/6833#issuecomment-113287867 ok to test. Is this issue the same as the one reported in #6864? @liancheng
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6833#issuecomment-112946466 Merged build finished. Test PASSed.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6833#issuecomment-112946390 [Test build #35053 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35053/console) for PR 6833 at commit [`64bbfab`](https://github.com/apache/spark/commit/64bbfab33d748cce3cb1dbad55a86c3991d99899). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6833#issuecomment-112914955 [Test build #35053 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35053/consoleFull) for PR 6833 at commit [`64bbfab`](https://github.com/apache/spark/commit/64bbfab33d748cce3cb1dbad55a86c3991d99899).
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6833#issuecomment-112914749 Merged build started.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6833#issuecomment-112914725 Merged build triggered.
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/6833#issuecomment-112914495 ok to test
Github user jeanlyn commented on the pull request: https://github.com/apache/spark/pull/6833#issuecomment-112636409 @chenghao-intel, I think it only affects dynamic partitions, because `SparkHadoopWriter` gets the writer via `OutputFormat.getRecordWriter`, and most implementations use `FileOutputFormat.getTaskOutputPath` to get the path.
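To make the distinction concrete, here is a minimal, path-only sketch. It assumes, following the comment above, that a static-partition writer's file lives under an attempt-specific directory (modeled with the `_temporary/<attemptId>` layout described later in this thread), while the dynamic-partition container appears to build its path from the job-level output path. The object and helper names (`PathContrast`, `partitionDir`, `staticFilePath`, `dynamicFilePath`) are hypothetical illustrations, not Spark or Hadoop APIs.

```scala
object PathContrast {
  // Hive-style partition directory: Seq("ds" -> "2015-06-15", "type" -> "2")
  // becomes "ds=2015-06-15/type=2". Column order is kept as given.
  def partitionDir(spec: Seq[(String, String)]): String =
    spec.map { case (col, value) => s"$col=$value" }.mkString("/")

  // Static partition (assumed layout): the record writer's file sits under an
  // attempt-specific directory, modeled as <output>/_temporary/<attemptId>/<file>.
  def staticFilePath(outputPath: String, attemptId: String, fileName: String): String =
    s"$outputPath/_temporary/$attemptId/$fileName"

  // Dynamic partition before the fix (assumed): the path is derived from the
  // job-level output path only, so it takes no attempt ID at all.
  def dynamicFilePath(outputPath: String, spec: Seq[(String, String)], fileName: String): String =
    s"$outputPath/${partitionDir(spec)}/$fileName"
}
```

For two attempts `a0` and `a1` of the same task (an original and a speculative copy), `staticFilePath` yields two different paths, while `dynamicFilePath` yields the same path for both, which is the collision this PR addresses.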
Github user jeanlyn commented on a diff in the pull request: https://github.com/apache/spark/pull/6833#discussion_r32592438 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveWriterContainers.scala --- @@ -230,7 +230,15 @@ private[spark] class SparkHiveDynamicPartitionWriterContainer( val path = { val outputPath = FileOutputFormat.getOutputPath(conf.value) --- End diff -- Oh, I'll try it later.
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/6833#issuecomment-112627047 I also hit this issue when using dynamic partitions in HiveContext.
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/6833#issuecomment-112620496 This seems to affect only dynamic partitions in HiveContext. @jeanlyn, can you confirm that?
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/6833#discussion_r32589270 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveWriterContainers.scala --- @@ -230,7 +230,15 @@ private[spark] class SparkHiveDynamicPartitionWriterContainer( val path = { val outputPath = FileOutputFormat.getOutputPath(conf.value) --- End diff -- +1
Github user lianhuiwang commented on a diff in the pull request: https://github.com/apache/spark/pull/6833#discussion_r32523129 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveWriterContainers.scala --- @@ -230,7 +230,15 @@ private[spark] class SparkHiveDynamicPartitionWriterContainer( val path = { val outputPath = FileOutputFormat.getOutputPath(conf.value) --- End diff -- I think we just need to replace `FileOutputFormat.getOutputPath` with `FileOutputFormat.getTaskOutputPath`, because `FileOutputFormat.getTaskOutputPath` returns `$outputPath/_temporary/${attemptId}/`.
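Under the layout quoted in this comment (`$outputPath/_temporary/${attemptId}/`), each attempt gets a private prefix, and committing an attempt amounts to moving its files from that prefix back under `$outputPath`. The sketch below only models that path arithmetic, assuming the quoted layout; `AttemptPaths`, `taskBase`, `attemptFile`, and `committedPath` are hypothetical names, and in practice the move is typically handled by Hadoop's `OutputCommitter` rather than by string manipulation.

```scala
object AttemptPaths {
  // Attempt-private base directory, following the layout quoted in the
  // review comment: $outputPath/_temporary/${attemptId}
  def taskBase(outputPath: String, attemptId: String): String =
    s"$outputPath/_temporary/$attemptId"

  // Where a dynamic-partition file would be written if the container used the
  // attempt-private base instead of the job-level output path.
  def attemptFile(outputPath: String, attemptId: String, partDir: String, fileName: String): String =
    s"${taskBase(outputPath, attemptId)}/$partDir/$fileName"

  // Where the file should end up once its attempt is committed: strip the
  // attempt-private prefix and re-root the remainder under outputPath.
  def committedPath(outputPath: String, attemptId: String, tempFile: String): String = {
    val prefix = taskBase(outputPath, attemptId) + "/"
    require(tempFile.startsWith(prefix), s"$tempFile is not under $prefix")
    s"$outputPath/${tempFile.stripPrefix(prefix)}"
  }
}
```

Two attempts therefore never share a temporary file, yet whichever one is committed lands at the same final `ds=.../type=.../part-NNNNN` location.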
Github user lianhuiwang commented on a diff in the pull request: https://github.com/apache/spark/pull/6833#discussion_r32516707 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala --- @@ -197,7 +197,6 @@ case class InsertIntoHiveTable( table.hiveQlTable.getPartCols().foreach { entry => orderedPartitionSpec.put(entry.getName, partitionSpec.get(entry.getName).getOrElse("")) } - val partVals = MetaStoreUtils.getPvals(table.hiveQlTable.getPartCols, partitionSpec) --- End diff -- Yes, I think you are right.
Github user jeanlyn commented on a diff in the pull request: https://github.com/apache/spark/pull/6833#discussion_r32492419 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala --- @@ -197,7 +197,6 @@ case class InsertIntoHiveTable( table.hiveQlTable.getPartCols().foreach { entry => orderedPartitionSpec.put(entry.getName, partitionSpec.get(entry.getName).getOrElse("")) } - val partVals = MetaStoreUtils.getPvals(table.hiveQlTable.getPartCols, partitionSpec) --- End diff -- I think https://github.com/apache/spark/pull/5876/files#diff-d579db9a8f27e0bbef37720ab14ec3f6L203 should have removed this code. @marmbrus, right?
Github user jeanlyn commented on a diff in the pull request: https://github.com/apache/spark/pull/6833#discussion_r32491951 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala --- @@ -197,7 +197,6 @@ case class InsertIntoHiveTable( table.hiveQlTable.getPartCols().foreach { entry => orderedPartitionSpec.put(entry.getName, partitionSpec.get(entry.getName).getOrElse("")) } - val partVals = MetaStoreUtils.getPvals(table.hiveQlTable.getPartCols, partitionSpec) --- End diff -- This code never seems to be used, so I removed it.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6833#issuecomment-112259097 Can one of the admins verify this patch?
GitHub user jeanlyn opened a pull request: https://github.com/apache/spark/pull/6833 [SPARK-8379][SQL]avoid speculative tasks write to the same file

The issue link: [SPARK-8379](https://issues.apache.org/jira/browse/SPARK-8379)

Currently, when we insert data into a dynamic partition with speculative tasks enabled, we get the following exception:
```
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): Lease mismatch on /tmp/hive-jeanlyn/hive_2015-06-15_15-20-44_734_8801220787219172413-1/-ext-1/ds=2015-06-15/type=2/part-00301.lzo owned by DFSClient_attempt_201506031520_0011_m_000189_0_-1513487243_53 but is accessed by DFSClient_attempt_201506031520_0011_m_42_0_-1275047721_57
```
This PR writes the data to a temporary directory when using dynamic partitions, so that speculative tasks no longer write to the same file.

You can merge this pull request into a Git repository by running: $ git pull https://github.com/jeanlyn/spark speculation Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/6833.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #6833 commit e19a3bd77b6b9f44479e51659e244e9809b2963d Author: jeanlyn Date: 2015-06-15T16:38:16Z avoid speculative tasks write same file
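The failure mode and the fix can be modeled without HDFS. The sketch below is a toy, assuming an in-memory file system (`ToyFs`, hypothetical) that rejects a second writer on a path another attempt already owns, loosely standing in for the HDFS lease check in the stack trace above; it is not how Spark or HDFS implement this. It contrasts two attempts writing the same final path with each attempt writing under its own `_temporary/<attemptId>/` directory and only the chosen attempt being renamed into place. `SpeculationDemo`, the attempt IDs, and the output directory are likewise made up for illustration.

```scala
import scala.collection.mutable

// Hypothetical in-memory file system. Each file remembers which attempt
// created it; a different attempt writing the same path is rejected, which
// loosely mimics the HDFS lease conflict quoted in the PR description.
final class ToyFs {
  private val files = mutable.Map.empty[String, (String, String)] // path -> (owner, data)

  def create(path: String, owner: String, data: String): Unit =
    files.get(path) match {
      case Some((other, _)) if other != owner =>
        throw new IllegalStateException(
          s"Lease mismatch on $path owned by $other but is accessed by $owner")
      case _ => files(path) = (owner, data)
    }

  // Move a file to a new path (used here as the "commit" step).
  def rename(from: String, to: String): Unit = {
    val entry = files.remove(from).getOrElse(throw new NoSuchElementException(from))
    files(to) = entry
  }

  def read(path: String): Option[String] = files.get(path).map(_._2)
}

object SpeculationDemo {
  val out = "/tmp/out"                        // hypothetical output directory
  val rel = "ds=2015-06-15/type=2/part-00301" // partition directory + file name

  // Before: both attempts write straight to the shared final path.
  // Returns true if the second attempt's write was rejected.
  def sharedPathConflicts(): Boolean = {
    val fs = new ToyFs
    fs.create(s"$out/$rel", "attempt_0", "rows from attempt_0")
    try { fs.create(s"$out/$rel", "attempt_1", "rows from attempt_1"); false }
    catch { case _: IllegalStateException => true }
  }

  // After (sketch): each attempt writes under its own temporary directory,
  // and only the winning attempt's file is moved to the final path.
  def perAttemptCommit(winner: String): Option[String] = {
    val fs = new ToyFs
    for (a <- Seq("attempt_0", "attempt_1"))
      fs.create(s"$out/_temporary/$a/$rel", a, s"rows from $a")
    fs.rename(s"$out/_temporary/$winner/$rel", s"$out/$rel")
    fs.read(s"$out/$rel")
  }
}
```

The design point is that isolation comes from the attempt-specific prefix, so neither attempt needs to know the other exists; choosing which attempt wins is left to the commit step.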