[ https://issues.apache.org/jira/browse/SPARK-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14012301#comment-14012301 ]
Sean Owen commented on SPARK-1518:
----------------------------------

Yes Matei, that's what I'm getting at. Spark is a client of Hadoop, so if my app uses Spark, and Spark uses Hadoop, then I have to match the Hadoop version that Spark uses to the cluster's; it's not just a concern when my app uses HDFS directly. I can manually override hadoop-client, although I'd have to reproduce a lot of the dependency-graph manipulation in Spark's build to make it work.

In Sandy's blog post example, he's just running the code on the cluster and pointing at the matched Spark/Hadoop jars already there. That's also a solution that will work for a lot of use cases. I accept that the use case I have in mind, adding Spark to a larger stand-alone app, is not everyone's use case, although it's not crazy. It doesn't work if the Spark/Hadoop jars are instead packaged together into an assembly and run that way.

I agree that overriding the Hadoop dependency is a solution, and accept that Spark shouldn't necessarily bend over backwards for these Hadoop issues, but this does go back to your point about accessibility. Right now I think anyone who wants to do what I'm doing for any Hadoop 2 app, and doesn't want to make a custom build or manually override dependencies, will just point at Cloudera's "0.9.0-cdh5.0.1" even if not using CDH. That felt funny.

Apologies if I have somehow totally missed something. I've talked too much; thanks for hearing out the use case. Maybe it's best to see whether this is actually an issue anyone else shares.

> Spark master doesn't compile against hadoop-common trunk
> --------------------------------------------------------
>
>         Key: SPARK-1518
>         URL: https://issues.apache.org/jira/browse/SPARK-1518
>     Project: Spark
>  Issue Type: Bug
>    Reporter: Marcelo Vanzin
>    Assignee: Colin Patrick McCabe
>    Priority: Critical
>
> FSDataOutputStream::sync() has disappeared from trunk in Hadoop;
> FileLogger.scala is calling it.
> I've changed it locally to hsync() so I can compile the code, but haven't
> checked yet whether those are equivalent. hsync() seems to have been there
> forever, so it hopefully works with all versions Spark cares about.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
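The sync()-to-hsync() change described in the issue amounts to a one-line substitution at the call site. A minimal sketch of what such a call looks like (the method and `FileLogger.scala` context are illustrative, not Spark's actual source; `hsync()` is part of Hadoop's `Syncable` interface on `FSDataOutputStream`):

```scala
import org.apache.hadoop.fs.FSDataOutputStream

// Before (removed from Hadoop trunk):
//   hadoopDataStream.foreach(_.sync())
// After: hsync() flushes client-side buffers and asks the datanodes
// to persist the data, which is what the old sync() was used for here.
def flushToHadoop(hadoopDataStream: Option[FSDataOutputStream]): Unit = {
  hadoopDataStream.foreach(_.hsync())
}
```

Whether hsync() is a drop-in equivalent of the removed sync() is exactly the open question noted above; this sketch only shows the shape of the local change.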
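The manual hadoop-client override discussed in the comment can be sketched in a user application's sbt build. This is a hypothetical fragment (the version numbers are illustrative), and, as noted, it does not reproduce the exclusions and dependency-graph manipulation that Spark's own build performs:

```scala
// build.sbt sketch: depend on Spark, but pin hadoop-client to the
// version actually deployed on the cluster (versions are examples).
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "0.9.1",
  // Overrides the hadoop-client version Spark's POM would pull in.
  "org.apache.hadoop" % "hadoop-client" % "2.3.0"
)
```

The same idea applies in Maven via an explicit hadoop-client dependency in the app's POM, relying on "nearest wins" dependency mediation.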