[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...

2015-09-03 Thread koeninger
Github user koeninger closed the pull request at:

https://github.com/apache/spark/pull/3543


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...

2015-09-02 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/3543#issuecomment-137335223
  
OK, makes sense. Can you close this PR for now then? If there's interest we 
can always reopen it against the latest master.





[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...

2015-09-02 Thread koeninger
Github user koeninger commented on the pull request:

https://github.com/apache/spark/pull/3543#issuecomment-137253615
  
@andrewor14 master has diverged sufficiently from this PR that I don't 
think it's useful to keep it merge-able.  If we think someone's willing to 
accept the changes to core and sql, those subtasks should be revisited with this 
general approach as a basis.





[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...

2015-09-01 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/3543#issuecomment-136891685
  
@koeninger would you mind updating this patch per @tdas' suggestion?





[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...

2015-07-30 Thread koeninger
Github user koeninger commented on the pull request:

https://github.com/apache/spark/pull/3543#issuecomment-126342142
  
Added subtasks, changed the title of 
https://github.com/apache/spark/pull/7772 to refer to the streaming subtask 
jira ID.  Let me know if you see anything on that PR that needs tweaking before 
the 1.5 freeze date.





[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...

2015-07-30 Thread tdas
Github user tdas commented on the pull request:

https://github.com/apache/spark/pull/3543#issuecomment-126531913
  
Okay #7772 has been merged. Mind removing the streaming changes from this 
PR to make this cleaner?





[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...

2015-07-30 Thread tdas
Github user tdas commented on the pull request:

https://github.com/apache/spark/pull/3543#issuecomment-126209033
  
Fair point. How about making subtasks of the JIRA for different components, 
and then using those JIRA ids?





[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...

2015-07-29 Thread tdas
Github user tdas commented on the pull request:

https://github.com/apache/spark/pull/3543#issuecomment-126131982
  
Yeah, maybe that is a good idea for now. In fact, SparkHadoopUtil.get.conf 
calls newConfiguration only once, and then the configuration is used 
everywhere. So newConfiguration() will be called only once in the 
lifetime of the application, and the likelihood of the race condition causing a 
problem here is really small. So I think it's fine for now to just address this. 

The way I would do this is to make the JIRA specific to streaming only (set 
component and title accordingly). And file a separate JIRA (if not already 
present) for a possible problem in newConfiguration() linking it to the Hadoop 
JIRA. Does that make sense? @JoshRosen Any thoughts?
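
To illustrate the "built once, then used everywhere" point above, here is a minimal sketch. It assumes nothing about Spark's actual implementation: `CountingConf` and `FakeHadoopUtil` are hypothetical stand-ins, and the Scala `lazy val` is just one way to get a single construction even when several threads ask for the value at the same time.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical stand-in for a configuration object; it only counts constructions.
final class CountingConf {
  CountingConf.constructed.incrementAndGet()
}

object CountingConf {
  val constructed = new AtomicInteger(0)
}

// Hypothetical stand-in for a utility object that builds the configuration once.
object FakeHadoopUtil {
  def newConfiguration(): CountingConf = new CountingConf

  // A lazy val is initialized at most once, with synchronization provided by Scala.
  lazy val conf: CountingConf = newConfiguration()
}

object ConfOnceDemo {
  def main(args: Array[String]): Unit = {
    // Eight threads all read the shared configuration concurrently.
    val threads = (1 to 8).map { _ =>
      new Thread(new Runnable {
        def run(): Unit = { FakeHadoopUtil.conf; () }
      })
    }
    threads.foreach(_.start())
    threads.foreach(_.join())
    println(s"constructions: ${CountingConf.constructed.get}")
  }
}
```

Callers that bypass the cached value and call the constructor directly are not covered by this pattern, which is the remaining concern raised in the thread.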





[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...

2015-07-29 Thread koeninger
Github user koeninger commented on the pull request:

https://github.com/apache/spark/pull/3543#issuecomment-126161394
  
Changing this jira to be streaming only and making another for thread 
safety issues still leaves all the inconsistent calls to new Configuration in 
SQL, and probably other places (at a quick grep, external/flume, 
external/twitter, and maybe core).

I'll get a PR with changes only to streaming/, let me know what you guys 
want to do as far as jiras.





[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...

2015-07-29 Thread koeninger
Github user koeninger commented on the pull request:

https://github.com/apache/spark/pull/3543#issuecomment-126166935
  
streaming only pr is at https://github.com/apache/spark/pull/7772





[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...

2015-07-29 Thread koeninger
Github user koeninger commented on the pull request:

https://github.com/apache/spark/pull/3543#issuecomment-125977592
  
If we're talking about this issue 
https://issues.apache.org/jira/browse/HADOOP-11209, unless there's something 
arcane about hadoop's jira, it looks like that was only resolved in April for 
2.7

@tdas if you think we're better off / not worse off with at least having 
the streaming-only changes in for spark 1.5, I can put in a narrower PR for 
that and we can punt on the thread safety issues for now





[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...

2015-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3543#issuecomment-125337754
  
 Build triggered.





[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...

2015-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3543#issuecomment-125337773
  
Build started.





[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...

2015-07-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3543#issuecomment-125338962
  
  [Test build #38584 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/38584/consoleFull)
 for   PR 3543 at commit 
[`bfc550e`](https://github.com/apache/spark/commit/bfc550ef0b7b535adb0aa019f30dd4771c24aece).
 * This patch **does not merge cleanly**.





[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...

2015-07-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3543#issuecomment-125340072
  
  [Test build #38584 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/38584/consoleFull)
 for   PR 3543 at commit 
[`bfc550e`](https://github.com/apache/spark/commit/bfc550ef0b7b535adb0aa019f30dd4771c24aece).
 * This patch **fails to build**.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...

2015-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3543#issuecomment-125340090
  
Build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...

2015-07-19 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/3543#issuecomment-122707496
  
Haven't dug in to this in detail yet, but it's possible that the bug that 
motivated the `CONFIGURATION_INSTANTIATION_LOCK` is no longer relevant to us 
because we no longer support the affected Hadoop versions.  It would be great 
if someone more familiar with Hadoop version numbering / JIRA conventions could 
look at the Hadoop JIRA ticket to figure this out.  If it turns out that it 
only affects pre-Hadoop 1.2.1 versions, then we might be able to just remove 
that lock entirely. 
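
As a rough illustration of what an instantiation lock does, the sketch below serializes constructor calls through one process-wide lock object. Everything here is hypothetical: `UnsafeRegistry` and `RegisteringConf` merely imitate a class whose constructor touches unsynchronized shared state, and `ConfFactory.instantiationLock` is an illustrative name, not the lock discussed above.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical shared state that every constructor call mutates without synchronization.
object UnsafeRegistry {
  private val resources = ArrayBuffer.empty[String]
  def register(name: String): Unit = { resources += name; () }
  def size: Int = resources.size
}

// Hypothetical stand-in for a class whose constructor is not thread-safe.
final class RegisteringConf(name: String) {
  UnsafeRegistry.register(name)
}

object ConfFactory {
  // One lock shared by all callers, so constructor calls never overlap.
  private val instantiationLock = new Object

  def newConf(name: String): RegisteringConf =
    instantiationLock.synchronized { new RegisteringConf(name) }
}

object LockDemo {
  def main(args: Array[String]): Unit = {
    // Four threads each construct 1000 instances through the locked factory.
    val threads = (1 to 4).map { t =>
      new Thread(new Runnable {
        def run(): Unit = (1 to 1000).foreach(i => ConfFactory.newConf(s"$t-$i"))
      })
    }
    threads.foreach(_.start())
    threads.foreach(_.join())
    println(s"registered: ${UnsafeRegistry.size}")
  }
}
```

The lock only helps if every construction goes through it, which is why having more than one lock, or call sites that skip the factory, weakens the guarantee.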





[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...

2015-07-17 Thread tdas
Github user tdas commented on the pull request:

https://github.com/apache/spark/pull/3543#issuecomment-122468822
  
Aah right, makes sense. That definitely complicates things, because that is 
the hard question: whether to put that lock or not. @JoshRosen is the best 
person to answer that. Unfortunately he is swamped :(





[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...

2015-07-16 Thread koeninger
Github user koeninger commented on the pull request:

https://github.com/apache/spark/pull/3543#issuecomment-122010820
  
Just to be clear, are we talking about removing just the one-line changes 
to SQLContext and JavaSQLContext?

Everything else in the PR I think is necessary in order to make the changes 
in streaming.





[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...

2015-07-16 Thread tdas
Github user tdas commented on the pull request:

https://github.com/apache/spark/pull/3543#issuecomment-122073728
  
Other way round. Just keep the changes in StreamingContext, DStream, and 
PairDStreamFunctions.





[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...

2015-07-16 Thread koeninger
Github user koeninger commented on the pull request:

https://github.com/apache/spark/pull/3543#issuecomment-122076899
  
Except that those streaming changes call into SparkHadoopUtil, which was 
changed in that PR for thread safety reasons.  HadoopRDD was changed so there 
was only 1 lock being used.  At that point the only thing left is doc changes 
and the sql changes.





[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...

2015-07-15 Thread tdas
Github user tdas commented on the pull request:

https://github.com/apache/spark/pull/3543#issuecomment-121794031
  
@koeninger Ping.





[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...

2015-07-14 Thread tdas
Github user tdas commented on the pull request:

https://github.com/apache/spark/pull/3543#issuecomment-121398012
  
Hey @koeninger, looking at this patch again, I would like to absorb the 
streaming changes at the very least. Those issues still exist in streaming, and 
would be a good fix to have. So mind closing this PR and issuing a new PR with 
only the fixes to the streaming API?





[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...

2015-06-18 Thread koeninger
Github user koeninger commented on the pull request:

https://github.com/apache/spark/pull/3543#issuecomment-113335277
  
As far as I know, it's still an issue - by default, any checkpoint that
relies on hdfs config (e.g. s3 password) won't recover.
On Jun 18, 2015 6:55 PM, andrewor14 notificati...@github.com wrote:

 Another ping. @koeninger https://github.com/koeninger @tdas
 https://github.com/tdas @JoshRosen https://github.com/JoshRosen
 should we move forward with this patch, or close it since it's mostly gone
 stale at this point?








[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...

2015-06-18 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/3543#issuecomment-113321519
  
Another ping. @koeninger @tdas @JoshRosen should we move forward with this 
patch, or close it since it's mostly gone stale at this point?





[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...

2015-04-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3543#issuecomment-96770045
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...

2015-03-18 Thread koeninger
Github user koeninger commented on the pull request:

https://github.com/apache/spark/pull/3543#issuecomment-82923505
  
I think it's mostly a question of whether committers are comfortable with a
PR that changes all of the uses of new Configuration.

At this point it'd probably need another audit of the code to see if there
are more uses, but that's mostly mechanical.

On Tue, Mar 17, 2015 at 10:41 PM, Michael Armbrust notificati...@github.com
 wrote:

 ping. Whats the status here?








[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...

2015-03-17 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/3543#issuecomment-82725434
  
ping.  What's the status here?





[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...

2015-01-05 Thread vanzin
Github user vanzin commented on the pull request:

https://github.com/apache/spark/pull/3543#issuecomment-68758185
  
Just for posterity: I think a couple of changes sneaked in between my 
original change being sent as a PR and it being committed, making my code miss 
some `Configuration` instantiations.

At that time, I explicitly avoided changing default arguments to a few 
methods (since my thinking was that since it's an argument, the user should 
know what he's doing). But I don't really have an opinion about what's the 
right approach there, and changing it is fine with me.

I also don't have enough background to comment on the thread-safety issues 
(others have looked at it in much more depth than I have)...





[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...

2015-01-04 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/3543#discussion_r22438855
  
--- Diff: 
streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaPairDStream.scala
 ---
@@ -789,7 +790,7 @@ class JavaPairDStream[K, V](val dstream: DStream[(K, V)])(
   keyClass: Class[_],
   valueClass: Class[_],
   outputFormatClass: Class[_ <: NewOutputFormat[_, _]],
-  conf: Configuration = new Configuration) {
--- End diff --

The scope of this PR is pretty wide in terms of the number of classes it 
touches, which causes issues because different places need to be handled 
differently. If you moved this sort of change (`new Configuration` to 
`sparkContext.hadoopConfiguration`) into a different PR, it might be easier to 
get in. 
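
A minimal sketch of why the default argument matters, using hypothetical stand-ins rather than the real Spark or Hadoop classes: `SettingsConf`, `FakeContext`, `FakeStream`, and the key `fs.secret` are all made up for illustration. A default that constructs a fresh configuration silently drops whatever the user set on the context, while a default taken from the context keeps it.

```scala
// Hypothetical stand-in for a configuration holding key/value settings.
final case class SettingsConf(settings: Map[String, String] = Map.empty) {
  def get(key: String): Option[String] = settings.get(key)
}

// Hypothetical stand-in for a context that carries the user's configuration.
final class FakeContext(val hadoopConfiguration: SettingsConf)

final class FakeStream(context: FakeContext) {
  // Default is a fresh, empty configuration: settings made on the context are lost.
  def secretWithFreshDefault(conf: SettingsConf = SettingsConf()): Option[String] =
    conf.get("fs.secret")

  // Default comes from the context; an explicitly passed configuration still wins.
  def secretWithContextDefault(conf: SettingsConf = context.hadoopConfiguration): Option[String] =
    conf.get("fs.secret")
}

object DefaultArgDemo {
  def main(args: Array[String]): Unit = {
    val ctx = new FakeContext(SettingsConf(Map("fs.secret" -> "hunter2")))
    val stream = new FakeStream(ctx)
    println(stream.secretWithFreshDefault())   // prints None
    println(stream.secretWithContextDefault()) // prints Some(hunter2)
  }
}
```

Both variants behave identically when the caller passes a configuration explicitly, so the change only affects callers relying on the default.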





[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...

2015-01-04 Thread koeninger
Github user koeninger commented on a diff in the pull request:

https://github.com/apache/spark/pull/3543#discussion_r22446683
  
--- Diff: 
streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaPairDStream.scala
 ---
@@ -789,7 +790,7 @@ class JavaPairDStream[K, V](val dstream: DStream[(K, V)])(
   keyClass: Class[_],
   valueClass: Class[_],
   outputFormatClass: Class[_ <: NewOutputFormat[_, _]],
-  conf: Configuration = new Configuration) {
--- End diff --

Based on what Marcelo Vanzin said on the dev list when I brought this issue
up, the only reason the problem was still around for me to run into is
because he changed some of the uses of new Configuration but not all of
them.

I agree it's used in a lot of different places, but I'm not sure how
piecemeal fixes to only some of the places are helpful to users. Were there
still specific concerns about particular classes?

On Sun, Jan 4, 2015 at 6:28 AM, Tathagata Das notificati...@github.com
wrote:

 In
 
streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaPairDStream.scala
 https://github.com/apache/spark/pull/3543#discussion-diff-22438855:

  @@ -789,7 +790,7 @@ class JavaPairDStream[K, V](val dstream: DStream[(K, V)])(
 keyClass: Class[_],
 valueClass: Class[_],
 outputFormatClass: Class[_ <: NewOutputFormat[_, _]],
  -  conf: Configuration = new Configuration) {

 The scope of this PR is pretty wide in terms of the number of classes it
 touches, causing issues as different places needs to be handled
 differently. If you considered moving this sort of changes (new
 Configuration to sparkContext.hadoopConfiguration) into a different PR
 that might be easier to get in.







[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...

2014-12-24 Thread tdas
Github user tdas commented on the pull request:

https://github.com/apache/spark/pull/3543#issuecomment-68076300
  
Jenkins, retest this please.





[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...

2014-12-24 Thread tdas
Github user tdas commented on the pull request:

https://github.com/apache/spark/pull/3543#issuecomment-68076317
  
@JoshRosen I leave it to you to figure out changes related to the 
`SparkHadoopUtil`.





[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...

2014-12-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3543#issuecomment-68079378
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24789/





[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...

2014-12-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3543#issuecomment-68079375
  
  [Test build #24789 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24789/consoleFull)
 for   PR 3543 at commit 
[`bfc550e`](https://github.com/apache/spark/commit/bfc550ef0b7b535adb0aa019f30dd4771c24aece).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...

2014-12-17 Thread koeninger
Github user koeninger commented on the pull request:

https://github.com/apache/spark/pull/3543#issuecomment-67337516
  
Jenkins is failing

org.apache.spark.scheduler.SparkListenerSuite.local metrics
org.apache.spark.streaming.flume.FlumeStreamSuite.flume input compressed 
stream

I can't reproduce those test failures locally.





[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...

2014-12-17 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/3543#issuecomment-67379497
  
I'll look at those tests' code in a little bit to see if I can figure out 
whether they're prone to random flakiness.  I don't recall seeing flakiness 
from these tests before, so this seems like it's worth investigating.  FYI, I 
have an open PR that tries to address some of the causes of streaming test 
flakiness: #3687





[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...

2014-12-17 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/3543#issuecomment-67379507
  
Jenkins, retest this please.





[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...

2014-12-17 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/3543#issuecomment-67379543
  
(Might as well have Jenkins run this again just to see whether the failure 
is nondeterministic)





[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...

2014-12-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3543#issuecomment-67379826
  
  [Test build #24551 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24551/consoleFull)
 for   PR 3543 at commit 
[`bfc550e`](https://github.com/apache/spark/commit/bfc550ef0b7b535adb0aa019f30dd4771c24aece).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...

2014-12-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3543#issuecomment-67395790
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24551/
Test FAILed.





[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...

2014-12-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3543#issuecomment-67395784
  
  [Test build #24551 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24551/consoleFull)
 for   PR 3543 at commit 
[`bfc550e`](https://github.com/apache/spark/commit/bfc550ef0b7b535adb0aa019f30dd4771c24aece).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...

2014-12-16 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3543#issuecomment-67249414
  
  [Test build #24512 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24512/consoleFull)
 for   PR 3543 at commit 
[`bfc550e`](https://github.com/apache/spark/commit/bfc550ef0b7b535adb0aa019f30dd4771c24aece).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...

2014-12-16 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3543#issuecomment-67258780
  
  [Test build #24512 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24512/consoleFull)
 for   PR 3543 at commit 
[`bfc550e`](https://github.com/apache/spark/commit/bfc550ef0b7b535adb0aa019f30dd4771c24aece).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...

2014-12-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3543#issuecomment-67258785
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24512/
Test FAILed.





[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...

2014-12-10 Thread koeninger
Github user koeninger commented on a diff in the pull request:

https://github.com/apache/spark/pull/3543#discussion_r21610025
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala ---
@@ -262,7 +263,7 @@ class SQLContext(@transient val sparkContext: 
SparkContext)
   def createParquetFile[A : Product : TypeTag](
   path: String,
   allowExisting: Boolean = true,
-  conf: Configuration = new Configuration()): SchemaRDD = {
--- End diff --

I seem to recall there being potential thread safety issues related to
hadoop configuration objects, resulting in the need to create / clone them.

Quick search turned up e.g.

https://issues.apache.org/jira/browse/SPARK-2546

I'm not sure how relevant that is to all of these existing situations where
new Configuration() is being called.

On Tue, Dec 9, 2014 at 5:07 PM, Tathagata Das notificati...@github.com
wrote:

 In sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala
 https://github.com/apache/spark/pull/3543#discussion-diff-21571141:

  @@ -262,7 +263,7 @@ class SQLContext(@transient val sparkContext: 
SparkContext)
 def createParquetFile[A : Product : TypeTag](
 path: String,
 allowExisting: Boolean = true,
  -  conf: Configuration = new Configuration()): SchemaRDD = {

 I think this should be using the hadoopConfiguration object in the
 SparkContext. That has all the hadoop related configuration already setup
 and should be what is automatically used. @marmbrus
 https://github.com/marmbrus should have a better idea.







[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...

2014-12-10 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3543#discussion_r21622115
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala ---
@@ -262,7 +263,7 @@ class SQLContext(@transient val sparkContext: 
SparkContext)
   def createParquetFile[A : Product : TypeTag](
   path: String,
   allowExisting: Boolean = true,
-  conf: Configuration = new Configuration()): SchemaRDD = {
--- End diff --

@koeninger The issue that you linked is concerned with thread-safety issues 
when multiple threads concurrently modify the same `Configuration` instance.

It turns out that there's another, older thread-safety issue related to 
`Configuration`'s constructor not being thread-safe due to non-thread-safe 
static state: https://issues.apache.org/jira/browse/HADOOP-10456.  This has 
been fixed in some newer Hadoop releases, but since it was only reported in 
April I don't think we can ignore it.  As a result, 
https://issues.apache.org/jira/browse/SPARK-1097 implements a workaround which 
synchronizes on an object before calling `new Configuration`.  Currently, I 
think the extra synchronization logic is only implemented in `HadoopRDD`, but 
it should probably be used everywhere just to be safe.  I think that 
`HadoopRDD` was the highest-risk place where we might have many threads 
creating Configurations at the same time, which is probably why that patch's 
author didn't add the synchronization everywhere.
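
The workaround described above, synchronizing on an object before calling the constructor, can be sketched roughly as follows. This is a minimal illustration, not the code from SPARK-1097: `StubConfiguration` is a hypothetical stand-in for Hadoop's `Configuration` so the snippet runs without Hadoop on the classpath, and `ConfigurationFactory` and `instantiationLock` are invented names.

```scala
// Hypothetical stand-in for org.apache.hadoop.conf.Configuration.
final class StubConfiguration {
  private val entries = scala.collection.mutable.Map.empty[String, String]
  def set(key: String, value: String): Unit = entries.update(key, value)
  def get(key: String): Option[String] = entries.get(key)
}

object ConfigurationFactory {
  // One JVM-wide lock object (assumed name). Every construction goes through
  // it, so two threads never run the constructor at the same time.
  private val instantiationLock = new Object

  def newConfiguration(): StubConfiguration = instantiationLock.synchronized {
    new StubConfiguration
  }
}
```

The lock only serializes construction. Each caller still gets its own instance, so it does nothing for concurrent mutation of a single shared instance, which is the separate issue tracked in SPARK-2546.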





[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...

2014-12-10 Thread koeninger
Github user koeninger commented on a diff in the pull request:

https://github.com/apache/spark/pull/3543#discussion_r21638810
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala ---
@@ -262,7 +263,7 @@ class SQLContext(@transient val sparkContext: 
SparkContext)
   def createParquetFile[A : Product : TypeTag](
   path: String,
   allowExisting: Boolean = true,
-  conf: Configuration = new Configuration()): SchemaRDD = {
--- End diff --

So let me see if I have things straight

- Currently, the code is using new Configuration() as a default, which may 
have some thread safety issues due to the constructor

- my original patch uses SparkHadoopUtil.get.conf, which is a singleton, so 
should decrease the constructor thread safety problem, but increase the 
problems if the hadoop configuration is modified.  It also won't do the right 
thing for people who have altered the sparkConf, which makes it no good (I 
haven't run into this in personal usage of the patched version, because I 
always pass in a complete sparkConf via properties rather than setting it in 
code)

- @tdas suggested using this.sparkContext.hadoopConfiguration.  This will 
use the right spark config, but may have thread safety issues both at 
construction, when the spark context is created, and if the configuration is 
modified.

So

Use tdas' suggestion, add a 
HadoopRDD.CONFIGURATION_INSTANTIATION_LOCK.synchronized block to 
SparkHadoopUtil.newConfiguration?  And people are out of luck if they have code 
that used to work because they were modifying new blank instances of 
Configuration, rather than the now-shared one? 






[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...

2014-12-10 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3543#discussion_r21658128
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala ---
@@ -262,7 +263,7 @@ class SQLContext(@transient val sparkContext: 
SparkContext)
   def createParquetFile[A : Product : TypeTag](
   path: String,
   allowExisting: Boolean = true,
-  conf: Configuration = new Configuration()): SchemaRDD = {
--- End diff --

If we're going to use `CONFIGURATION_INSTANTIATION_LOCK` in multiple 
places, then I think it makes sense to move `CONFIGURATION_INSTANTIATION_LOCK` 
into `SparkHadoopUtil`, since that seems like a more logical place for it to 
live than `HadoopRDD`.  I like the idea of hiding the synchronization logic 
behind a method like `SparkHadoopUtil.newConfiguration`.

Regarding whether `SparkContext.hadoopConfiguration` will lead to 
thread-safety issues: I did a bit of research on this while developing a 
workaround for the other configuration thread-safety issues and wrote [a series of 
comments](https://issues.apache.org/jira/browse/SPARK-2546?focusedCommentId=14160790&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14160790) 
citing cases of code in the wild that depend on mutating 
`SparkContext.hadoopConfiguration`.  For example, there are a lot of snippets 
of code that look like this:

```scala
sc.hadoopConfiguration.set("es.resource", "syslog/entry")
output.saveAsHadoopFile[ESOutputFormat]("-")
```

In Spark 1.x, I don't think we'll be able to safely transition away from 
using the shared `SparkContext.hadoopConfiguration` instance since there's so 
much existing code that relies on the current behavior.

However, I think that there's much less risk of running into thread-safety 
issues as a result of this.  It seems fairly unlikely that you'll have multiple 
threads mutating the shared configuration in the driver JVM.  In executor JVMs, 
most Hadoop `InputFormats` (and other classes) don't mutate configurations, so 
we shouldn't run into issues; for those that do mutate, users can always enable 
the `cloneConf` setting.

In a nutshell, I don't think that the shared `sc.hadoopConfiguration` is a 
good design that we would choose if we were redesigning it, but using it here 
seems consistent with the behavior that we have elsewhere in Spark as long as 
we're stuck with this for 1.x.
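
The trade-off behind the `cloneConf` setting mentioned above can be illustrated with a small sketch. It assumes a hypothetical `CloneableConf` stand-in rather than Hadoop's real `Configuration`, and `confForTask` is an invented helper name. The point is only that a task mutating a clone leaves the shared instance untouched, while a task mutating the shared instance is visible to everyone.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a Hadoop configuration object.
final class CloneableConf(initial: Map[String, String] = Map.empty) {
  private val entries = mutable.Map(initial.toSeq: _*)
  def set(key: String, value: String): Unit = entries.update(key, value)
  def get(key: String): Option[String] = entries.get(key)
  // Copies the current entries into an independent instance.
  def copy(): CloneableConf = new CloneableConf(entries.toMap)
}

object TaskConf {
  // With cloneConf enabled, each task works on a private copy;
  // otherwise every task sees (and may mutate) the shared instance.
  def confForTask(shared: CloneableConf, cloneConf: Boolean): CloneableConf =
    if (cloneConf) shared.copy() else shared
}
```

Cloning costs an extra copy per task, which is presumably why it would be opt-in rather than the default behavior.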





[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...

2014-12-10 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3543#discussion_r21658152
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala ---
@@ -262,7 +263,7 @@ class SQLContext(@transient val sparkContext: 
SparkContext)
   def createParquetFile[A : Product : TypeTag](
   path: String,
   allowExisting: Boolean = true,
-  conf: Configuration = new Configuration()): SchemaRDD = {
--- End diff --

> And people are out of luck if they have code that used to work because 
> they were modifying new blank instances of Configuration, rather than the 
> now-shared one?

I don't think that users were able to access the old `new Configuration()` 
instance; I think that the only code that could have modified this would be the 
Parquet code.





[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...

2014-12-09 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/3543#discussion_r21571141
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala ---
@@ -262,7 +263,7 @@ class SQLContext(@transient val sparkContext: 
SparkContext)
   def createParquetFile[A : Product : TypeTag](
   path: String,
   allowExisting: Boolean = true,
-  conf: Configuration = new Configuration()): SchemaRDD = {
--- End diff --

I think this should be using the hadoopConfiguration object in the 
SparkContext. That has all the hadoop related configuration already setup and 
should be what is automatically used. @marmbrus should have a better idea.
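
A rough sketch of what this suggestion amounts to: the default argument refers to the configuration held by the context instead of constructing a blank one. All `Stub*` types below are hypothetical stand-ins, not the real Spark or Hadoop classes, and the method returns the configuration it would use purely so that the choice is observable.

```scala
// Hypothetical stand-in for Hadoop's Configuration.
final class StubHadoopConf {
  private val entries = scala.collection.mutable.Map.empty[String, String]
  def set(key: String, value: String): Unit = entries.update(key, value)
  def get(key: String): Option[String] = entries.get(key)
}

// Hypothetical stand-in for SparkContext, holding one shared configuration.
final class StubSparkContext {
  val hadoopConfiguration = new StubHadoopConf
}

final class StubSQLContext(val sparkContext: StubSparkContext) {
  // A Scala default argument may refer to instance members, so callers who
  // omit `conf` get the context's shared configuration.
  def createParquetFile(
      path: String,
      allowExisting: Boolean = true,
      conf: StubHadoopConf = sparkContext.hadoopConfiguration): StubHadoopConf =
    conf
}
```

Callers who pass their own `conf` are unaffected; only the implicit default changes from a fresh instance to the shared one.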





[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...

2014-12-09 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/3543#discussion_r21571170
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/api/java/JavaSQLContext.scala ---
@@ -84,7 +85,7 @@ class JavaSQLContext(val sqlContext: SQLContext) extends 
UDFRegistration {
   beanClass: Class[_],
   path: String,
   allowExisting: Boolean = true,
-  conf: Configuration = new Configuration()): JavaSchemaRDD = {
--- End diff --

Same comment as I made in SQLContext





[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...

2014-12-09 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/3543#discussion_r21571364
  
--- Diff: 
streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala ---
@@ -545,7 +546,7 @@ object StreamingContext extends Logging {
   def getOrCreate(
   checkpointPath: String,
   creatingFunc: () => StreamingContext,
-  hadoopConf: Configuration = new Configuration(),
--- End diff --

I approve this change.





[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...

2014-12-09 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/3543#discussion_r21571338
  
--- Diff: 
streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaPairDStream.scala
 ---
@@ -789,7 +790,7 @@ class JavaPairDStream[K, V](val dstream: DStream[(K, 
V)])(
   keyClass: Class[_],
   valueClass: Class[_],
   outputFormatClass: Class[_ <: NewOutputFormat[_, _]],
-  conf: Configuration = new Configuration) {
--- End diff --

This should also be the configuration from the 
`sparkContext.hadoopConfiguration`





[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...

2014-12-02 Thread koeninger
Github user koeninger commented on a diff in the pull request:

https://github.com/apache/spark/pull/3543#discussion_r21192361
  
--- Diff: docs/configuration.md ---
@@ -664,6 +665,24 @@ Apart from these, the following properties are also 
available, and may be useful
   </td>
 </tr>
 <tr>
+<td><code>spark.executor.heartbeatInterval</code></td>
--- End diff --

Pretty sure that's just diff getting confused based on where the hadoop doc 
changes were inserted, same lines are marked as removed lower in the diff





[GitHub] spark pull request: Spark 4229 Create hadoop configuration in a co...

2014-12-01 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/3102#issuecomment-65164900
  
Sorry for the delay here.  A few comments: can you open the PR against 
master instead of a specific branch and also merge with master?

The new hadoop config documentation: this was already there and you are 
just documenting it? /cc @pwendell 





[GitHub] spark pull request: Spark 4229 Create hadoop configuration in a co...

2014-12-01 Thread koeninger
Github user koeninger commented on the pull request:

https://github.com/apache/spark/pull/3102#issuecomment-65176731
  
Yes, the new hadoop config documentation is just documenting the behavior 
of SparkHadoopUtil.scala lines 95-100

Sorry about the branch situation, I was unclear on what the plan for 1.2 
merges was.
Opened a new PR that should merge cleanly into master

https://github.com/apache/spark/pull/3543





[GitHub] spark pull request: Spark 4229 Create hadoop configuration in a co...

2014-12-01 Thread koeninger
Github user koeninger closed the pull request at:

https://github.com/apache/spark/pull/3102





[GitHub] spark pull request: Spark 4229 Create hadoop configuration in a co...

2014-11-04 Thread koeninger
GitHub user koeninger opened a pull request:

https://github.com/apache/spark/pull/3102

Spark 4229 Create hadoop configuration in a consistent way



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/koeninger/spark-1 SPARK-4229

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3102.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3102


commit 3cd384f77ba9505fe7c94c82980e07044f6b128c
Author: cody koeninger c...@koeninger.org
Date:   2014-11-04T22:40:17Z

SPARK-4229 use SparkHadoopUtil.get.conf so that hadoop properties are 
copied from spark config

commit f2ee4f9f1ed717d54fb7916ff2cf3ae85468eab0
Author: cody koeninger c...@koeninger.org
Date:   2014-11-04T22:41:07Z

SPARK-4229 document handling of spark.hadoop.* properties

commit eebbdcc53caa214079612732d3a4a13e57cecffe
Author: cody koeninger c...@koeninger.org
Date:   2014-11-05T03:26:26Z

SPARK-4229 fix broken table in documentation, make hadoop doc formatting 
match that of runtime env
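
For readers unfamiliar with the `spark.hadoop.*` handling that these commits rely on and document, the idea is that properties carrying that prefix in the Spark config are copied into the Hadoop configuration with the prefix removed. A minimal sketch over plain maps, assuming the invented names `HadoopProps` and `extract` (this is not the `SparkHadoopUtil` code itself):

```scala
object HadoopProps {
  private val Prefix = "spark.hadoop."

  // Keeps only the keys under the prefix and strips it, yielding the
  // key/value pairs that would be set on the Hadoop configuration.
  def extract(sparkProps: Map[String, String]): Map[String, String] =
    sparkProps.collect {
      case (key, value) if key.startsWith(Prefix) =>
        key.substring(Prefix.length) -> value
    }
}
```

For example, a Spark property named `spark.hadoop.fs.defaultFS` would surface as `fs.defaultFS`, while unrelated keys such as `spark.app.name` are ignored.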







[GitHub] spark pull request: Spark 4229 Create hadoop configuration in a co...

2014-11-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3102#issuecomment-61755719
  
Can one of the admins verify this patch?





[GitHub] spark pull request: Spark 4229 Create hadoop configuration in a co...

2014-11-04 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/3102#issuecomment-61770464
  
Looks pretty reasonable to me.

