Re: Hadoop jars

2012-11-19 Thread Mohit Anchlia
Generally, you need to install it separately.



On Nov 19, 2012, at 2:07 PM, Rahul Ravindran rahu...@yahoo.com wrote:

 Hi,
 
  I was exploring how to deploy Flume into our cluster. The binary package 
 available at http://flume.apache.org/download.html does not appear to 
 include the Hadoop jar files that the HDFS sink needs. I would have 
 expected them to be packaged in the lib folder. Is this deliberate? Is 
 there a different binary I could download that includes the Hadoop jars?
 Thanks,
 ~Rahul.


Re: Hadoop jars

2012-11-19 Thread Roshan Naik
Unfortunately, I don't think there is any such documentation at the moment.
As a very general answer, the list depends on the source, sink, and channel
you are using.
I think it would be nice if the user manual listed these external
dependencies for each component.
I am not an expert on the HDFS sink, but I don't see why it would depend on
anything more than HDFS itself.
-roshan

On Mon, Nov 19, 2012 at 2:18 PM, Rahul Ravindran rahu...@yahoo.com wrote:

 Are there other such libraries which will need to be downloaded? Is there
 a well-defined location for the hadoop jar and any other jars that flume
 may depend on?




Re: Hadoop jars

2012-11-19 Thread Mohit Anchlia
The easiest way is to install the CDH binaries and point Flume's classpath at them.
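
As a rough illustration of pointing Flume at an installed Hadoop (a sketch under assumptions, not something confirmed in this thread): the `hadoop classpath` subcommand prints the classpath of a local Hadoop install, and `flume-ng` accepts a `--classpath` option that appends to its own. The helper name `hadoop_cp`, the agent name, and the file paths in the usage comment are made up for illustration.

```shell
# Print the classpath of the locally installed Hadoop, or nothing if the
# `hadoop` command is not available. hadoop_cp is a hypothetical helper name.
hadoop_cp() {
  if command -v hadoop >/dev/null 2>&1; then
    hadoop classpath
  fi
}

# Example use (agent name and file paths are placeholders):
#   flume-ng agent --conf conf --conf-file conf/agent.conf --name a1 \
#     --classpath "$(hadoop_cp)"
```

If `hadoop` is not on the PATH, the helper prints nothing, so the agent would start with only its bundled jars.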



Re: Hadoop jars

2012-11-19 Thread Hari Shreedharan
Flume ships with all the binaries it requires, except for Hadoop (and the 
dependencies it would pull in) and HBase. This is because Flume, like most 
other Hadoop ecosystem components, is meant to work against binary-incompatible 
versions of Hadoop (Hadoop 1/Hadoop 2). So instead of packaging the Hadoop jars 
with Flume, we expect Hadoop to be available on the machines you are running 
Flume on. Once you install Hadoop, you should not have any dependency issues. 
The same is true for HBase. 


Hari 

-- 
Hari Shreedharan





Re: Hadoop jars

2012-11-19 Thread Rahul Ravindran
Thanks for the responses.

Good to know that the only external dependencies are Hadoop and HBase. We will 
deploy those components only on the boxes that are going to have those sinks 
set up.




Re: Hadoop jars

2012-11-19 Thread Hari Shreedharan
Unfortunately, the FileChannel also has a Hadoop dependency, even though the 
classes are never used. So you need the Hadoop jars on machines that will use 
the FileChannel: either add them to FLUME_CLASSPATH in flume-env.sh, or set 
HADOOP_HOME/HADOOP_PREFIX. The channel no longer depends on Hadoop directly, 
but it still needs the jars on the classpath because we support migration from 
the older format to the new format. 


Thanks,
Hari

-- 
Hari Shreedharan
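
For reference, a minimal sketch of what the flume-env.sh change could look like. The jar path /opt/hadoop/hadoop-core.jar and the HADOOP_HOME value are assumed example locations, not ones given in this thread; substitute wherever the Hadoop jars live on the machine.

```shell
# conf/flume-env.sh (sketch; the path below is an assumed example)
HADOOP_JARS="/opt/hadoop/hadoop-core.jar"

# Append to FLUME_CLASSPATH, keeping anything that was already set.
FLUME_CLASSPATH="${FLUME_CLASSPATH:+$FLUME_CLASSPATH:}$HADOOP_JARS"
export FLUME_CLASSPATH

# Alternative mentioned above: leave FLUME_CLASSPATH alone and instead
# point Flume at a Hadoop install (assumed path).
# export HADOOP_HOME=/opt/hadoop
```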




Re: Hadoop jars

2012-11-19 Thread Rahul Ravindran
That is unfortunate. Is it sufficient to package just hadoop-common.jar, or is 
the recommended way essentially to run apt-get install flume-ng, which will 
install the packages below?

# apt-cache depends flume-ng

flume-ng
  Depends: adduser
  Depends: hadoop-hdfs
  Depends: bigtop-utils

My concern is that hadoop-hdfs brings in a ton of other stuff that will not be 
used on any box except the one running the HDFS sink.

Thanks,
~Rahul.



Re: Hadoop jars

2012-11-19 Thread Rahul Ravindran
Thanks. We will use that. 

Sent from my phone. Excuse the terseness.

On Nov 19, 2012, at 4:53 PM, Hari Shreedharan hshreedha...@cloudera.com wrote:

 No, you don't need HDFS. Hadoop common/Hadoop core should be enough. But 
 make sure you add it to the classpath, as I mentioned before.
 
 Hari
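
A small sketch of collecting only the Hadoop common/core jars for the classpath, rather than relying on a full HDFS install. The jar name patterns (hadoop-common*.jar, hadoop-core*.jar) and the helper name pick_hadoop_jars are assumptions for illustration; actual file names vary by Hadoop version and distribution.

```shell
# Join the hadoop-common / hadoop-core jars found in a directory into a
# colon-separated classpath string. pick_hadoop_jars is a hypothetical helper.
pick_hadoop_jars() {
  dir="$1"
  result=""
  for jar in "$dir"/hadoop-common*.jar "$dir"/hadoop-core*.jar; do
    [ -e "$jar" ] || continue        # skip patterns that matched nothing
    result="${result:+$result:}$jar"
  done
  printf '%s\n' "$result"
}
```

The printed string could then be assigned to FLUME_CLASSPATH in flume-env.sh.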
 