[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...
Github user koeninger closed the pull request at: https://github.com/apache/spark/pull/3543 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/3543#issuecomment-137335223 OK, makes sense. Can you close this PR for now then? If there's interest we can always reopen it against the latest master.
[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3543#issuecomment-137253615 @andrewor14 master has diverged sufficiently from this PR that I don't think it's useful to keep it mergeable. If we think someone's willing to accept the changes to core and SQL, those subtasks should be revisited with this general approach as a basis.
[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/3543#issuecomment-136891685 @koeninger would you mind updating this patch per @tdas' suggestion?
[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3543#issuecomment-126342142 Added subtasks, and changed the title of https://github.com/apache/spark/pull/7772 to refer to the streaming subtask JIRA ID. Let me know if you see anything there that needs tweaking before the 1.5 freeze date.
[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/3543#issuecomment-126531913 Okay, #7772 has been merged. Mind removing the streaming changes from this PR to make it cleaner?
[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/3543#issuecomment-126209033 Fair point. How about making subtasks of the JIRA for the different components, and then using those JIRA IDs?
[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/3543#issuecomment-126131982 Yeah, maybe that is a good idea for now. In fact, SparkHadoopUtil.get.conf calls newConfiguration only once, and then that configuration is used everywhere. So newConfiguration() will be called only once in the lifetime of the application, and the likelihood of the race condition causing a problem here is really small. So I think it's fine for now to just address this. The way I would do this is to make the JIRA specific to streaming only (set the component and title accordingly), and file a separate JIRA (if not already present) for a possible problem in newConfiguration(), linking it to the Hadoop JIRA. Does that make sense? @JoshRosen Any thoughts?
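The "construct once, reuse everywhere" pattern tdas describes can be sketched as below. This is a hypothetical, dependency-free illustration, not Spark's code: `FakeHadoopConf` stands in for Hadoop's `Configuration`, and `SharedHadoopConf` / `newConfiguration()` are illustrative names. It relies on the fact that a Scala `lazy val` is initialized at most once, with synchronized initialization.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical stand-in for org.apache.hadoop.conf.Configuration, so the
// sketch runs without Hadoop on the classpath.
final class FakeHadoopConf(val settings: Map[String, String])

// Hypothetical holder mirroring the pattern described above: the factory
// runs once and the resulting object is reused afterwards.
object SharedHadoopConf {
  // Counts how often the factory actually runs.
  val instantiations = new AtomicInteger(0)

  // Plays the role of newConfiguration(); name and body are illustrative.
  private def newConfiguration(): FakeHadoopConf = {
    instantiations.incrementAndGet()
    new FakeHadoopConf(Map("fs.defaultFS" -> "file:///"))
  }

  // A lazy val is initialized at most once, and that initialization is
  // synchronized, so racing first readers still share a single instance.
  lazy val conf: FakeHadoopConf = newConfiguration()
}
```

This bounds how often the shared factory runs, but, as the thread notes, any other code path that calls `new Configuration` directly is not covered by it.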
[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3543#issuecomment-126161394 Changing this JIRA to be streaming-only and making another for the thread-safety issues still leaves all the inconsistent calls to new Configuration in SQL, and probably other places (at a quick grep, external/flume, external/twitter, and maybe core). I'll put up a PR with changes only to streaming/; let me know what you want to do as far as JIRAs.
[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3543#issuecomment-126166935 The streaming-only PR is at https://github.com/apache/spark/pull/7772
[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3543#issuecomment-125977592 If we're talking about this issue, https://issues.apache.org/jira/browse/HADOOP-11209, then unless there's something arcane about Hadoop's JIRA, it looks like it was only resolved in April, for 2.7. @tdas if you think we're better off (or at least not worse off) having the streaming-only changes in for Spark 1.5, I can put in a narrower PR for that and we can punt on the thread-safety issues for now.
[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3543#issuecomment-125337754 Build triggered.
[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3543#issuecomment-125337773 Build started.
[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3543#issuecomment-125338962 [Test build #38584 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/38584/consoleFull) for PR 3543 at commit [`bfc550e`](https://github.com/apache/spark/commit/bfc550ef0b7b535adb0aa019f30dd4771c24aece). * This patch **does not merge cleanly**.
[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3543#issuecomment-125340072 [Test build #38584 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/38584/consoleFull) for PR 3543 at commit [`bfc550e`](https://github.com/apache/spark/commit/bfc550ef0b7b535adb0aa019f30dd4771c24aece). * This patch **fails to build**. * This patch **does not merge cleanly**. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3543#issuecomment-125340090 Build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3543#issuecomment-122707496 Haven't dug into this in detail yet, but it's possible that the bug that motivated the `CONFIGURATION_INSTANTIATION_LOCK` is no longer relevant to us because we no longer support the affected Hadoop versions. It would be great if someone more familiar with Hadoop version numbering / JIRA conventions could look at the Hadoop JIRA ticket to figure this out. If it turns out that it only affects Hadoop versions before 1.2.1, then we might be able to just remove that lock entirely.
[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/3543#issuecomment-122468822 Ah, right, makes sense. That definitely complicates things, because that is the hard question: whether to have that lock or not. @JoshRosen is the best person to answer that. Unfortunately he is swamped :(
[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3543#issuecomment-122010820 Just to be clear, are we talking about removing just the one-line changes to SQLContext and JavaSQLContext? Everything else in the PR I think is necessary in order to make the changes in streaming.
[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/3543#issuecomment-122073728 The other way round. Just keep the changes in StreamingContext, DStream, and PairDStreamFunctions.
[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3543#issuecomment-122076899 Except that those streaming changes call into SparkHadoopUtil, which was changed in that PR for thread-safety reasons. HadoopRDD was changed so that only one lock was being used. At that point the only things left are the doc changes and the SQL changes.
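Why "only one lock" matters can be shown with a small sketch. It assumes, as the discussion around `CONFIGURATION_INSTANTIATION_LOCK` suggests, that configuration construction is not thread-safe; `UnsafeConf`, `ConfFactory`, `RddSide` and `StreamingSide` are hypothetical names, not Spark classes. If each call site guarded construction with its own private lock, two constructors could still run concurrently; routing every call site through one shared lock serializes them.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical stand-in for a configuration whose constructor is assumed
// not to be thread-safe.
final class UnsafeConf(val props: Map[String, String])

object ConfFactory {
  // One lock object shared by every call site that builds a configuration.
  private val instantiationLock = new Object

  private val active = new AtomicInteger(0)
  // Only written while holding instantiationLock.
  @volatile private var maxActive = 0

  // Peak number of constructions observed running at the same time.
  def peakConcurrentConstructions: Int = maxActive

  def newConf(props: Map[String, String]): UnsafeConf =
    instantiationLock.synchronized {
      val n = active.incrementAndGet()
      if (n > maxActive) maxActive = n
      try new UnsafeConf(props)
      finally active.decrementAndGet()
    }
}

// Two hypothetical call sites (think HadoopRDD and streaming code) that both
// go through the same factory, and therefore the same lock.
object RddSide {
  def conf(): UnsafeConf = ConfFactory.newConf(Map("caller" -> "rdd"))
}
object StreamingSide {
  def conf(): UnsafeConf = ConfFactory.newConf(Map("caller" -> "streaming"))
}
```

The design choice is simply that the lock lives next to the factory rather than in each caller, so a new call site cannot accidentally bypass it.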
[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/3543#issuecomment-121794031 @koeninger Ping.
[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/3543#issuecomment-121398012 Hey @koeninger, looking at this patch again, I would like to absorb the streaming changes at the very least. Those issues still exist in streaming, and it would be a good fix to have. So would you mind closing this PR and opening a new PR with only the fixes to the streaming API?
[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3543#issuecomment-113335277 As far as I know, it's still an issue: by default, any checkpoint that relies on HDFS config (e.g. an S3 password) won't recover. On Jun 18, 2015 6:55 PM, andrewor14 notificati...@github.com wrote: Another ping. @koeninger @tdas @JoshRosen should we move forward with this patch, or close it since it's mostly gone stale at this point?
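The checkpoint problem comes down to where the Hadoop settings come from. The sketch below uses plain `Map`s as hypothetical stand-ins for `SparkConf` and Hadoop's `Configuration`; the prefix rule follows Spark's documented behaviour of copying `spark.hadoop.*` properties (prefix stripped) into the Hadoop configuration it builds, and the S3 key name is only an example. A bare `new Configuration` sees only the built-in defaults, so a credential supplied through Spark's config never reaches it.

```scala
// Hypothetical, dependency-free sketch: Maps stand in for SparkConf and
// Hadoop's Configuration.
object HadoopConfFromSpark {
  val Prefix = "spark.hadoop."

  // What a bare `new Configuration` amounts to here: defaults only.
  def bareConf(defaults: Map[String, String]): Map[String, String] = defaults

  // Consistent path: defaults plus every spark.hadoop.* entry, with the
  // prefix stripped, the way Spark derives its Hadoop configuration.
  def derivedConf(sparkConf: Map[String, String],
                  defaults: Map[String, String]): Map[String, String] =
    defaults ++ sparkConf.collect {
      case (k, v) if k.startsWith(Prefix) => k.stripPrefix(Prefix) -> v
    }
}
```

Code that recovers a checkpoint through the bare path would therefore not find the credential, while code using the derived configuration would.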
[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/3543#issuecomment-113321519 Another ping. @koeninger @tdas @JoshRosen should we move forward with this patch, or close it since it's mostly gone stale at this point?
[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3543#issuecomment-96770045 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3543#issuecomment-82923505 I think it's mostly a question of whether committers are comfortable with a PR that changes all of the uses of new Configuration. At this point it would probably need another audit of the code to see if there are more uses, but that's mostly mechanical. On Tue, Mar 17, 2015 at 10:41 PM, Michael Armbrust notificati...@github.com wrote: ping. What's the status here?
[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/3543#issuecomment-82725434 ping. What's the status here?
[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/3543#issuecomment-68758185 Just for posterity: I think a couple of changes sneaked in between my original change being sent as a PR and it being committed, making my code miss some `Configuration` instantiations. At that time, I explicitly avoided changing default arguments to a few methods (my thinking was that since it's an argument, the user should know what he's doing). But I don't really have an opinion about what's the right approach there, and changing it is fine with me. I also don't have enough background to comment on the thread-safety issues (others have looked at them in much more depth than I have)...
[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/3543#discussion_r22438855 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaPairDStream.scala --- @@ -789,7 +790,7 @@ class JavaPairDStream[K, V](val dstream: DStream[(K, V)])( keyClass: Class[_], valueClass: Class[_], outputFormatClass: Class[_ <: NewOutputFormat[_, _]], - conf: Configuration = new Configuration) { --- End diff -- The scope of this PR is pretty wide in terms of the number of classes it touches, which causes issues because different places need to be handled differently. If you moved this sort of change (`new Configuration` to `sparkContext.hadoopConfiguration`) into a separate PR, it might be easier to get in.
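The kind of signature change in this diff can be sketched without Spark. `Conf`, `Ctx`, `PairStreamLike`, `saveBefore` and `saveAfter` are hypothetical stand-ins, not Spark's API; the point is only where the default for `conf` comes from. In Scala a default argument is re-evaluated on each call that omits it, so a default like `new Configuration` yields a fresh object that knows nothing about the context's settings, while a default that reads the context returns the shared, already-populated one.

```scala
// Hypothetical stand-ins; not Spark's actual classes or method names.
final case class Conf(props: Map[String, String])

// Plays the role of a context that owns a shared Hadoop configuration.
final class Ctx(val hadoopConfiguration: Conf)

final class PairStreamLike(ctx: Ctx) {
  // Before: every call that omits `conf` gets a fresh, empty configuration.
  def saveBefore(prefix: String, conf: Conf = Conf(Map.empty)): Conf = conf

  // After: an omitted `conf` falls back to the context's configuration,
  // while an explicitly passed one still wins.
  def saveAfter(prefix: String, conf: Conf = ctx.hadoopConfiguration): Conf = conf
}
```

Keeping `conf` as a parameter preserves the caller's ability to pass an explicit configuration, which is the concern vanzin raises about changing default arguments.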
[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3543#discussion_r22446683 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaPairDStream.scala --- @@ -789,7 +790,7 @@ class JavaPairDStream[K, V](val dstream: DStream[(K, V)])( keyClass: Class[_], valueClass: Class[_], outputFormatClass: Class[_ <: NewOutputFormat[_, _]], - conf: Configuration = new Configuration) { --- End diff -- Based on what Marcelo Vanzin said on the dev list when I brought this issue up, the only reason the problem was still around for me to run into is that he changed some of the uses of new Configuration but not all of them. I agree it's used in a lot of different places, but I'm not sure how piecemeal fixes to only some of those places are helpful to users. Were there still specific concerns about particular classes? On Sun, Jan 4, 2015 at 6:28 AM, Tathagata Das notificati...@github.com wrote: In streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaPairDStream.scala https://github.com/apache/spark/pull/3543#discussion-diff-22438855: @@ -789,7 +790,7 @@ class JavaPairDStream[K, V](val dstream: DStream[(K, V)])( keyClass: Class[_], valueClass: Class[_], outputFormatClass: Class[_ <: NewOutputFormat[_, _]], - conf: Configuration = new Configuration) { The scope of this PR is pretty wide in terms of the number of classes it touches, causing issues as different places needs to be handled differently. If you considered moving this sort of changes (new Configuration to sparkContext.hadoopConfiguration) into a different PR that might be easier to get in.
[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/3543#issuecomment-68076300 Jenkins, retest this please.
[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/3543#issuecomment-68076317 @JoshRosen I leave it to you to figure out changes related to the `SparkHadoopUtil`.
[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3543#issuecomment-68079378 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24789/ Test PASSed.
[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3543#issuecomment-68079375 [Test build #24789 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24789/consoleFull) for PR 3543 at commit [`bfc550e`](https://github.com/apache/spark/commit/bfc550ef0b7b535adb0aa019f30dd4771c24aece). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3543#issuecomment-67337516 Jenkins is failing `org.apache.spark.scheduler.SparkListenerSuite.local metrics` and `org.apache.spark.streaming.flume.FlumeStreamSuite.flume input compressed stream`. I can't reproduce those test failures locally.
[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3543#issuecomment-67379497 I'll look at those tests' code in a little bit to see if I can figure out whether they're prone to random flakiness. I don't recall seeing flakiness from these tests before, so this seems like it's worth investigating. FYI, I have an open PR that tries to address some of the causes of streaming test flakiness: #3687
[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3543#issuecomment-67379507 Jenkins, retest this please.
[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3543#issuecomment-67379543 (Might as well have Jenkins run this again just to see whether the failure is nondeterministic.)
[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3543#issuecomment-67379826 [Test build #24551 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24551/consoleFull) for PR 3543 at commit [`bfc550e`](https://github.com/apache/spark/commit/bfc550ef0b7b535adb0aa019f30dd4771c24aece). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3543#issuecomment-67395790 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24551/ Test FAILed.
[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3543#issuecomment-67395784

[Test build #24551 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24551/consoleFull) for PR 3543 at commit [`bfc550e`](https://github.com/apache/spark/commit/bfc550ef0b7b535adb0aa019f30dd4771c24aece).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3543#issuecomment-67249414 [Test build #24512 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24512/consoleFull) for PR 3543 at commit [`bfc550e`](https://github.com/apache/spark/commit/bfc550ef0b7b535adb0aa019f30dd4771c24aece). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3543#issuecomment-67258780

[Test build #24512 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24512/consoleFull) for PR 3543 at commit [`bfc550e`](https://github.com/apache/spark/commit/bfc550ef0b7b535adb0aa019f30dd4771c24aece).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3543#issuecomment-67258785 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24512/ Test FAILed.
[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3543#discussion_r21610025

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala ---
```diff
@@ -262,7 +263,7 @@ class SQLContext(@transient val sparkContext: SparkContext)
   def createParquetFile[A <: Product : TypeTag](
       path: String,
       allowExisting: Boolean = true,
-      conf: Configuration = new Configuration()): SchemaRDD = {
```
--- End diff --

I seem to recall there being potential thread-safety issues related to Hadoop configuration objects, resulting in the need to create / clone them. A quick search turned up e.g. https://issues.apache.org/jira/browse/SPARK-2546

I'm not sure how relevant that is to all of these existing situations where `new Configuration()` is being called.

On Tue, Dec 9, 2014 at 5:07 PM, Tathagata Das notificati...@github.com wrote:

> In sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala (https://github.com/apache/spark/pull/3543#discussion-diff-21571141):
>
> I think this should be using the hadoopConfiguration object in the SparkContext. That has all the Hadoop-related configuration already set up and should be what is automatically used. @marmbrus should have a better idea.
[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3543#discussion_r21622115

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala ---
```diff
@@ -262,7 +263,7 @@ class SQLContext(@transient val sparkContext: SparkContext)
   def createParquetFile[A <: Product : TypeTag](
       path: String,
       allowExisting: Boolean = true,
-      conf: Configuration = new Configuration()): SchemaRDD = {
```
--- End diff --

@koeninger The issue that you linked is concerned with thread-safety issues when multiple threads concurrently modify the same `Configuration` instance.

It turns out that there's another, older thread-safety issue: `Configuration`'s constructor is not thread-safe because it touches non-thread-safe static state: https://issues.apache.org/jira/browse/HADOOP-10456. This has been fixed in some newer Hadoop releases, but since it was only reported in April I don't think we can ignore it. As a result, https://issues.apache.org/jira/browse/SPARK-1097 implements a workaround which synchronizes on an object before calling `new Configuration`.

Currently, I think the extra synchronization logic is only implemented in `HadoopRDD`, but it should probably be used everywhere just to be safe. I think that `HadoopRDD` was the highest-risk place where we might have many threads creating Configurations at the same time, which is probably why that patch's author didn't add the synchronization everywhere.
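The SPARK-1097 workaround described above (synchronize on a shared object before constructing) can be sketched as follows. This is a minimal, self-contained model, not Spark's or Hadoop's code: `UnsafeToConstruct` is a hypothetical stand-in for a class whose constructor mutates static state without synchronization, and `ConfFactory.newInstance` is a hypothetical helper playing the role of a lock-guarded factory.

```scala
// Hypothetical stand-in for a class whose constructor mutates shared static
// state without synchronization (the hazard HADOOP-10456 describes for
// org.apache.hadoop.conf.Configuration). This is NOT the real Hadoop class.
final class UnsafeToConstruct {
  UnsafeToConstruct.created += 1 // unsynchronized read-modify-write on static state
  val serial: Int = UnsafeToConstruct.created
}

object UnsafeToConstruct {
  var created: Int = 0
}

// Hypothetical factory: every construction goes through one JVM-wide lock,
// in the spirit of HadoopRDD's CONFIGURATION_INSTANTIATION_LOCK, so two
// constructors never run at the same time.
object ConfFactory {
  private val InstantiationLock = new Object

  def newInstance(): UnsafeToConstruct =
    InstantiationLock.synchronized {
      new UnsafeToConstruct
    }
}
```

Call sites would use `ConfFactory.newInstance()` instead of calling the constructor directly; the guarantee holds only if no code path bypasses the factory, which is why the lock needs to be applied everywhere a configuration is created, not just in one class.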
[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3543#discussion_r21638810

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala ---
```diff
@@ -262,7 +263,7 @@ class SQLContext(@transient val sparkContext: SparkContext)
   def createParquetFile[A <: Product : TypeTag](
       path: String,
       allowExisting: Boolean = true,
-      conf: Configuration = new Configuration()): SchemaRDD = {
```
--- End diff --

So let me see if I have things straight:

- Currently, the code uses `new Configuration()` as a default, which may have some thread-safety issues due to the constructor.
- My original patch uses `SparkHadoopUtil.get.conf`, which is a singleton, so it should decrease the constructor thread-safety problem but increase the problems if the Hadoop configuration is modified. It also won't do the right thing for people who have altered the SparkConf, which makes it no good. (I haven't run into this in personal usage of the patched version, because I always pass in a complete SparkConf via properties rather than setting it in code.)
- @tdas suggested using `this.sparkContext.hadoopConfiguration`. This will use the right Spark config, but may have thread-safety issues both at construction time (when the SparkContext is created) and if the configuration is modified.

So: use tdas' suggestion, and add a `HadoopRDD.CONFIGURATION_INSTANTIATION_LOCK.synchronized` block to `SparkHadoopUtil.newConfiguration`? And people are out of luck if they have code that used to work because they were modifying new blank instances of `Configuration`, rather than the now-shared one?
[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3543#discussion_r21658128

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala ---
```diff
@@ -262,7 +263,7 @@ class SQLContext(@transient val sparkContext: SparkContext)
   def createParquetFile[A <: Product : TypeTag](
       path: String,
       allowExisting: Boolean = true,
-      conf: Configuration = new Configuration()): SchemaRDD = {
```
--- End diff --

If we're going to use `CONFIGURATION_INSTANTIATION_LOCK` in multiple places, then I think it makes sense to move `CONFIGURATION_INSTANTIATION_LOCK` into `SparkHadoopUtil`, since that seems like a more logical place for it to live than `HadoopRDD`. I like the idea of hiding the synchronization logic behind a method like `SparkHadoopUtil.newConfiguration`.

Regarding whether `SparkContext.hadoopConfiguration` will lead to thread-safety issues: I did a bit of research on this while developing a workaround for the other configuration thread-safety issues and wrote [a series of comments](https://issues.apache.org/jira/browse/SPARK-2546?focusedCommentId=14160790&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14160790) citing cases of code in the wild that depend on mutating `SparkContext.hadoopConfiguration`. For example, there are a lot of snippets of code that look like this:

```scala
sc.hadoopConfiguration.set("es.resource", "syslog/entry")
output.saveAsHadoopFile[ESOutputFormat]("-")
```

In Spark 1.x, I don't think we'll be able to safely transition away from using the shared `SparkContext.hadoopConfiguration` instance, since there's so much existing code that relies on the current behavior. However, I think that there's much less risk of running into thread-safety issues as a result of this. It seems fairly unlikely that you'll have multiple threads mutating the shared configuration in the driver JVM. In executor JVMs, most Hadoop `InputFormats` (and other classes) don't mutate configurations, so we shouldn't run into issues; for those that do mutate, users can always enable the `cloneConf` setting.

In a nutshell, I don't think that the shared `sc.hadoopConfiguration` is a good design that we would choose if we were redesigning it, but using it here seems consistent with the behavior that we have elsewhere in Spark as long as we're stuck with this for 1.x.
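The trade-off behind the `cloneConf` setting mentioned above (share one mutable configuration, or hand each user a private copy) can be sketched as below. This is a minimal model under stated assumptions, not Spark's `HadoopRDD` code: `SimpleConf` is a hypothetical stand-in for Hadoop's `Configuration`, and `TaskConf.confFor` is a hypothetical helper whose `cloneConf` flag models the setting's effect.

```scala
import scala.collection.mutable

// Hypothetical, minimal stand-in for org.apache.hadoop.conf.Configuration:
// a mutable key/value store that can be copied.
final class SimpleConf(initial: Map[String, String] = Map.empty) {
  private val props = mutable.Map[String, String]() ++= initial

  def get(key: String): Option[String] = props.synchronized(props.get(key))
  def set(key: String, value: String): Unit = props.synchronized { props(key) = value }
  def copy(): SimpleConf = props.synchronized(new SimpleConf(props.toMap))
}

object TaskConf {
  // cloneConf = false: every caller shares one object, so a mutation made by
  //   one caller is visible to all of them (the sc.hadoopConfiguration model).
  // cloneConf = true: each caller gets a private copy that it can mutate
  //   without affecting the shared instance.
  def confFor(shared: SimpleConf, cloneConf: Boolean): SimpleConf =
    if (cloneConf) shared.copy() else shared
}
```

Sharing is cheap and lets driver code such as `sc.hadoopConfiguration.set(...)` reach everything that reads the configuration afterwards; cloning costs a copy per use but isolates components that mutate the configuration they are given.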
[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3543#discussion_r21658152

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala ---
```diff
@@ -262,7 +263,7 @@ class SQLContext(@transient val sparkContext: SparkContext)
   def createParquetFile[A <: Product : TypeTag](
       path: String,
       allowExisting: Boolean = true,
-      conf: Configuration = new Configuration()): SchemaRDD = {
```
--- End diff --

> And people are out of luck if they have code that used to work because they were modifying new blank instances of Configuration, rather than the now-shared one?

I don't think that users were able to access the old `new Configuration()` instance; I think that the only code that could have modified this would be the Parquet code.
[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/3543#discussion_r21571141

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala ---
```diff
@@ -262,7 +263,7 @@ class SQLContext(@transient val sparkContext: SparkContext)
   def createParquetFile[A <: Product : TypeTag](
       path: String,
       allowExisting: Boolean = true,
-      conf: Configuration = new Configuration()): SchemaRDD = {
```
--- End diff --

I think this should be using the `hadoopConfiguration` object in the SparkContext. That has all the Hadoop-related configuration already set up and should be what is automatically used. @marmbrus should have a better idea.
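The suggestion above amounts to taking the parameter default from the owning context instead of a blank `new Configuration()`. Scala allows a default argument to refer to a member of the enclosing class, so the pattern can be sketched with hypothetical stand-ins: `Conf` for Hadoop's `Configuration`, `Context` for `SparkContext`, and `SqlLikeContext` for `SQLContext`. None of these are the real classes, and `defaultFs` is a made-up method used only to show which configuration was picked.

```scala
// Hypothetical stand-ins: Conf plays org.apache.hadoop.conf.Configuration,
// Context plays SparkContext, and SqlLikeContext plays SQLContext.
final class Conf(val props: Map[String, String])

final class Context(val hadoopConfiguration: Conf)

final class SqlLikeContext(val context: Context) {
  // The default comes from the owning context, mirroring the suggested
  // `conf: Configuration = sparkContext.hadoopConfiguration`, rather than a
  // blank configuration that ignores settings made on the context.
  def defaultFs(conf: Conf = context.hadoopConfiguration): String =
    conf.props.getOrElse("fs.defaultFS", "<unset>")
}
```

For example, with a context whose configuration sets `fs.defaultFS`, `defaultFs()` sees that value, while passing a blank `Conf` (the analogue of the old `new Configuration()` default) does not. Callers that need an isolated configuration can still pass one explicitly.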
[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/3543#discussion_r21571170

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/api/java/JavaSQLContext.scala ---
```diff
@@ -84,7 +85,7 @@ class JavaSQLContext(val sqlContext: SQLContext) extends UDFRegistration {
       beanClass: Class[_],
       path: String,
       allowExisting: Boolean = true,
-      conf: Configuration = new Configuration()): JavaSchemaRDD = {
```
--- End diff --

Same comment as I made in SQLContext.
[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/3543#discussion_r21571364

--- Diff: streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala ---
```diff
@@ -545,7 +546,7 @@ object StreamingContext extends Logging {
   def getOrCreate(
       checkpointPath: String,
       creatingFunc: () => StreamingContext,
-      hadoopConf: Configuration = new Configuration(),
```
--- End diff --

I approve this change.
[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/3543#discussion_r21571338

--- Diff: streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaPairDStream.scala ---
```diff
@@ -789,7 +790,7 @@ class JavaPairDStream[K, V](val dstream: DStream[(K, V)])(
       keyClass: Class[_],
       valueClass: Class[_],
       outputFormatClass: Class[_ <: NewOutputFormat[_, _]],
-      conf: Configuration = new Configuration) {
```
--- End diff --

This should also be the configuration from `sparkContext.hadoopConfiguration`.
[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3543#discussion_r21192361

--- Diff: docs/configuration.md ---
```diff
@@ -664,6 +665,24 @@ Apart from these, the following properties are also available, and may be useful
   </td>
 </tr>
 <tr>
+  <td><code>spark.executor.heartbeatInterval</code></td>
```
--- End diff --

Pretty sure that's just the diff getting confused by where the Hadoop doc changes were inserted; the same lines are marked as removed lower in the diff.
[GitHub] spark pull request: Spark 4229 Create hadoop configuration in a co...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/3102#issuecomment-65164900

Sorry for the delay here. A few comments:

- Can you open the PR against master instead of a specific branch, and also merge with master?
- The new Hadoop config documentation: this was already there and you are just documenting it?

/cc @pwendell
[GitHub] spark pull request: Spark 4229 Create hadoop configuration in a co...
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3102#issuecomment-65176731

Yes, the new Hadoop config documentation is just documenting the behavior of SparkHadoopUtil.scala lines 95-100.

Sorry about the branch situation; I was unclear on what the plan for 1.2 merges was. I opened a new PR that should merge cleanly into master: https://github.com/apache/spark/pull/3543
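The behavior being documented (per the PR's first commit message, Hadoop properties are copied from the Spark config) can be sketched on plain Maps. This assumes the rule is "a Spark key `spark.hadoop.foo` becomes Hadoop key `foo`, and other Spark keys are not copied"; `HadoopPropsSketch.hadoopPropsFrom` is a hypothetical name, not the code at those lines of SparkHadoopUtil.scala.

```scala
object HadoopPropsSketch {
  private val Prefix = "spark.hadoop."

  // Keeps only entries whose key starts with "spark.hadoop." and strips that
  // prefix, so "spark.hadoop.foo" -> v becomes "foo" -> v. All other Spark
  // settings are left out of the Hadoop-side properties.
  def hadoopPropsFrom(sparkConf: Map[String, String]): Map[String, String] =
    sparkConf.collect {
      case (key, value) if key.startsWith(Prefix) => key.stripPrefix(Prefix) -> value
    }
}
```

Under this assumption a setting such as `spark.hadoop.io.file.buffer.size=65536` in the Spark properties would surface as `io.file.buffer.size=65536` on the Hadoop side, which is why creating every Hadoop configuration through one helper keeps these overrides consistent.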
[GitHub] spark pull request: Spark 4229 Create hadoop configuration in a co...
Github user koeninger closed the pull request at: https://github.com/apache/spark/pull/3102
[GitHub] spark pull request: Spark 4229 Create hadoop configuration in a co...
GitHub user koeninger opened a pull request: https://github.com/apache/spark/pull/3102

Spark 4229 Create hadoop configuration in a consistent way

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/koeninger/spark-1 SPARK-4229

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/3102.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #3102

commit 3cd384f77ba9505fe7c94c82980e07044f6b128c
Author: cody koeninger c...@koeninger.org
Date: 2014-11-04T22:40:17Z

    SPARK-4229 use SparkHadoopUtil.get.conf so that hadoop properties are copied from spark config

commit f2ee4f9f1ed717d54fb7916ff2cf3ae85468eab0
Author: cody koeninger c...@koeninger.org
Date: 2014-11-04T22:41:07Z

    SPARK-4229 document handling of spark.hadoop.* properties

commit eebbdcc53caa214079612732d3a4a13e57cecffe
Author: cody koeninger c...@koeninger.org
Date: 2014-11-05T03:26:26Z

    SPARK-4229 fix broken table in documentation, make hadoop doc formatting match that of runtime env
[GitHub] spark pull request: Spark 4229 Create hadoop configuration in a co...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3102#issuecomment-61755719 Can one of the admins verify this patch?
[GitHub] spark pull request: Spark 4229 Create hadoop configuration in a co...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/3102#issuecomment-61770464 Looks pretty reasonable to me.