Hi Utku,

Apache Hadoop 0.20 cannot support Sqoop as-is. Sqoop makes use of
DataDrivenDBInputFormat (among other APIs), which is not shipped with
Apache's 0.20 release. In order to get Sqoop working on 0.20, you'd need to
apply a lengthy list of patches from the project source repository to your
copy of Hadoop and recompile. Or you could just download it all from
Cloudera, where we've done that work for you :)
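
If you want a quick sanity check of whether a given Hadoop build carries the
API Sqoop needs, a rough sketch like the one below may help. The class name
HadoopApiCheck is made up for illustration, and the package
org.apache.hadoop.mapreduce.lib.db is an assumption based on where
DataDrivenDBInputFormat lives on trunk; adjust it if your build differs.

```java
// Hypothetical helper (not part of Hadoop or Sqoop): reports whether a class
// can be found on the current classpath.
class HadoopApiCheck {

    // Assumed fully qualified name, based on the trunk package layout.
    static final String DATA_DRIVEN_INPUT_FORMAT =
            "org.apache.hadoop.mapreduce.lib.db.DataDrivenDBInputFormat";

    // Returns true if the named class can be located, without initializing it.
    static boolean isClassAvailable(String className) {
        try {
            Class.forName(className, false, HadoopApiCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        boolean found = isClassAvailable(DATA_DRIVEN_INPUT_FORMAT);
        System.out.println(DATA_DRIVEN_INPUT_FORMAT
                + (found ? " is available" : " is NOT available"));
    }
}
```

Run it with your Hadoop jars on the classpath. A "NOT available" result on a
stock 0.20 install would be consistent with the above, though it only checks
this one class and not the other APIs Sqoop depends on.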

So as it stands, Sqoop won't run on a stock Apache 0.20 release; you'd have
to patch Hadoop yourself or use Cloudera's distribution. Do note that your
use of the term "fork" is a bit strong here. With the exception of (minor)
modifications that make it interact more compatibly with the external Linux
environment, our distribution only includes code that's available to the
project at large; some of that code just has not been rolled into a binary
release from Apache yet. If you choose to go with Cloudera's distribution,
you get publicly available features (like Sqoop, MRUnit, etc.) a year or so
ahead of what Apache has formally released. Our codebase isn't radically
diverging: CDH is somewhere ahead of the Apache 0.20 release, but behind
Apache's svn trunk. (Sqoop, MRUnit, etc. are all available in the Hadoop
source repository on the trunk branch.)

If you install our distribution, then Sqoop will be installed in
/usr/lib/hadoop-0.20/contrib/sqoop and /usr/bin/sqoop for you. There isn't a
separate package to install Sqoop independent of the rest of CDH; thus no
extra download link on our site.

I hope this helps!

Good luck,
- Aaron


On Wed, Mar 17, 2010 at 4:30 AM, Reik Schatz <reik.sch...@bwin.org> wrote:

> At least for MRUnit, I was not able to find it outside of the Cloudera
> distribution (CDH). What I did: installed CDH locally using apt (Ubuntu),
> searched for and copied the mrunit library into my local Maven repository,
> and removed CDH afterwards. I guess the same is somehow possible for Sqoop.
>
> /Reik
>
>
> Utku Can Topçu wrote:
>
>> Dear All,
>>
>> I'm trying to run tests using MySQL as a data source, so I
>> thought Cloudera's Sqoop would be a nice project to have in
>> production.
>> However, I'm not using Cloudera's Hadoop distribution right now, and
>> I'm not planning to switch from the main project to a fork.
>>
>> I read the documentation on Sqoop at
>> http://www.cloudera.com/developers/downloads/sqoop/ but there are
>> actually no links for downloading Sqoop itself.
>>
>> Has anyone here tried to use Sqoop with the latest Apache Hadoop?
>> If so, can you give me some tips and tricks on it?
>>
>> Best Regards,
>> Utku
>>
>>
>
> --
>
> *Reik Schatz*
> Technical Lead, Platform
> P: +46 8 562 470 00
> M: +46 76 25 29 872
> F: +46 8 562 470 01
> E: reik.sch...@bwin.org <mailto:reik.sch...@bwin.org>
> */bwin/* Games AB
> Klarabergsviadukten 82,
> 111 64 Stockholm, Sweden
>
