[ https://issues.apache.org/jira/browse/SPARK-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14012301#comment-14012301 ]

Sean Owen commented on SPARK-1518:
----------------------------------

Yes Matei, that's what I'm getting at. Spark is a client of Hadoop, so if I use 
Spark, and Spark uses Hadoop, then I have to match the Hadoop version that 
Spark uses to the one on the cluster. That's true even if my app never touches 
HDFS directly. I can manually override hadoop-client, although I'd have to 
reproduce a lot of the dependency-graph manipulation in Spark's build to make 
it work.
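
For illustration, a minimal sketch of that override in an sbt build (the 
version numbers are only illustrative, and in practice more exclusions from 
Spark's dependency graph would likely be needed):

    // Hypothetical build.sbt fragment: keep Spark's transitive
    // hadoop-client out of the way, then pin hadoop-client to the
    // version actually running on the cluster.
    libraryDependencies ++= Seq(
      ("org.apache.spark" %% "spark-core" % "0.9.1")
        .exclude("org.apache.hadoop", "hadoop-client"),
      "org.apache.hadoop" % "hadoop-client" % "2.3.0"
    )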

In Sandy's blog post example, he's just running the code on the cluster and 
pointing at the matched Spark/Hadoop jars already there. That's also a solution 
that will work for a lot of use cases. I accept that the use case I have in 
mind, embedding Spark in a larger stand-alone app, is not everyone's use case, 
although it's not crazy. It doesn't work if the Spark/Hadoop jars are instead 
packaged together into an assembly and run that way.

I agree overriding the Hadoop dependency is a solution, and I accept that Spark 
shouldn't necessarily bend over backwards for these Hadoop issues, but this 
does go back to your point about accessibility. Right now I think anyone who 
wants to do what I'm doing for any Hadoop 2 app, and doesn't want to make a 
custom build or manually override dependencies, will just point at Cloudera's 
"0.9.0-cdh5.0.1" even when not using CDH. That feels odd.

Apologies if I have somehow totally missed something. I've talked too much; 
thanks for hearing out the use case. Maybe it's best to see whether this is 
actually an issue anyone else shares.

> Spark master doesn't compile against hadoop-common trunk
> --------------------------------------------------------
>
>                 Key: SPARK-1518
>                 URL: https://issues.apache.org/jira/browse/SPARK-1518
>             Project: Spark
>          Issue Type: Bug
>            Reporter: Marcelo Vanzin
>            Assignee: Colin Patrick McCabe
>            Priority: Critical
>
> FSDataOutputStream::sync() has disappeared from trunk in Hadoop; 
> FileLogger.scala is calling it.
> I've changed it locally to hsync() so I can compile the code, but haven't 
> checked yet whether those are equivalent. hsync() seems to have been there 
> forever, so it hopefully works with all versions Spark cares about.
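
For reference, the change described above amounts to roughly this in Scala 
(the helper shown here is hypothetical; only the sync()-to-hsync() 
substitution comes from the report):

    import org.apache.hadoop.fs.FSDataOutputStream

    // Hypothetical sketch: the sync() call removed from Hadoop trunk
    // is replaced by hsync(), which FSDataOutputStream exposes via
    // the Syncable interface and which also forces data to disk.
    def flush(out: FSDataOutputStream): Unit =
      out.hsync()  // previously: out.sync()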


