Hmm, perhaps I'm the one who's confused. 🤔

I thought the person in the linked discussion expected Hadoop itself (i.e.
the full application, not just the jars) to somehow be included, but
rereading the discussion I may have just misinterpreted them.

The Hadoop jars packaged with Spark just allow Spark to interact with
Hadoop, or to use the Hadoop client API for talking to systems like S3,
right? If you want HDFS, MapReduce, etc., you're obviously not getting
those from the Spark packages.
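
For instance (a rough spark-shell sketch, assuming the hadoop-aws and AWS
SDK jars are on the classpath; the bucket and path here are made up), this
goes through the bundled Hadoop FileSystem API to resolve the s3a:// scheme
without any Hadoop cluster running anywhere:

    // The s3a:// scheme is provided by the Hadoop client libraries shipped
    // in the Spark tarball (plus hadoop-aws / AWS SDK); no HDFS or YARN here.
    val df = spark.read.text("s3a://example-bucket/some/path/")
    df.show(5)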

Maybe this was already clear to those users and I just injected some
confusion into the discussion.

Nick

On Fri, Jul 29, 2016 at 4:06 PM Marcelo Vanzin <van...@cloudera.com> wrote:

> Why do you say Hadoop is not included?
>
> The Hadoop jars are there in the tarball, and match the advertised
> version. There is (or at least there was in 1.x) a version called
> "without-hadoop" which did not include any Hadoop jars.
>
> On Fri, Jul 29, 2016 at 12:56 PM, Nicholas Chammas
> <nicholas.cham...@gmail.com> wrote:
> > I had an interaction on my project today that suggested some people
> > may be confused about what the packages available on the downloads
> > page are actually for.
> >
> > Specifically, the various -hadoopx.x.tgz packages suggest that Hadoop
> > itself is actually included in the package. I’m not 100% sure myself
> > honestly, but as I explained in my comment linked above, I believe the
> > -hadoopx.x.tgz just indicates the version of Hadoop that Spark was
> > built against.
> >
> > Does it make sense to add a brief note to the downloads page explaining
> > this?
> >
> > I am assuming it would be too disruptive to change the package names to
> > something more descriptive like -built-against-hadoopx.x.tgz.
> >
> > Nick
>
>
>
> --
> Marcelo
>