How to get JobTracker metrics from Hadoop via JMX

2010-04-14 Thread stan lee
Currently I am trying to use JMX to collect metrics information. I can get
information successfully from the namenode (NameNodeActivityMBean).

Also, for the JobTracker or TaskTracker (although it is common for them to
run on the same nodes as the datanodes), are there any MBeans that can be
used to get the corresponding metrics (I know Hadoop has
DataNodeActivityMBean.java for the datanode)? For example, the percentage of
storage used and the free space left?



It would also help if there is any document or API reference that describes
this (all the JMX interfaces that are used to get metrics for every kind of
node).
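
As a minimal sketch of how such MBeans could be explored over JMX (this
assumes the daemon was started with remote JMX enabled, e.g. via
com.sun.management.jmxremote options in HADOOP_OPTS; the host, port and
ObjectNames here are placeholders, not confirmed values), the program below
simply lists every registered MBean and its attributes so the exact names can
be discovered rather than guessed:

import java.util.Set;
import javax.management.MBeanAttributeInfo;
import javax.management.MBeanInfo;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class ListHadoopMBeans {
  public static void main(String[] args) throws Exception {
    // Placeholder host/port; the daemon must expose a remote JMX port.
    JMXServiceURL url = new JMXServiceURL(
        "service:jmx:rmi:///jndi/rmi://jobtracker-host:8010/jmxrmi");
    JMXConnector connector = JMXConnectorFactory.connect(url);
    try {
      MBeanServerConnection mbsc = connector.getMBeanServerConnection();
      // Dump every MBean and its readable attributes.
      Set<ObjectName> names = mbsc.queryNames(null, null);
      for (ObjectName name : names) {
        System.out.println(name);
        MBeanInfo info = mbsc.getMBeanInfo(name);
        for (MBeanAttributeInfo attr : info.getAttributes()) {
          try {
            System.out.println("  " + attr.getName() + " = "
                + mbsc.getAttribute(name, attr.getName()));
          } catch (Exception e) {
            // Some attributes cannot be read remotely; skip them.
          }
        }
      }
    } finally {
      connector.close();
    }
  }
}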

Thanks!

StanLee


Questions on dfs.block.size

2010-05-03 Thread stan lee
Hi Experts:

Is there any way to make a new dfs.block.size take effect on files written
before the setting was changed? Or is that even meaningful?

If I run a job A, would it copy the input files into HDFS again even if
those files are already in HDFS?

If so, then changing dfs.block.size might be meaningful: if I find that job
A runs slowly and the subsequent job B has the same input file as job A,
then perhaps I can lower dfs.block.size so that the number of map tasks for
job B increases, which may make the job run faster.

If the input file always needs to be copied every time a job runs (even
when that file is already in HDFS), then making dfs.block.size take effect
on old files would not be meaningful.
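
For what it is worth, a block-size change normally applies only to files
written after the change; one way to apply it to an existing file is to
rewrite that file. Below is a minimal sketch using the FileSystem API, where
the paths and the 64 MB block size are made-up values for illustration:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class RewriteWithNewBlockSize {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    Path src = new Path("/data/input/old-file");      // hypothetical path
    Path dst = new Path("/data/input/old-file.copy"); // hypothetical path
    long newBlockSize = 64L * 1024 * 1024;            // 64 MB, for illustration

    FSDataInputStream in = fs.open(src);
    // create(path, overwrite, bufferSize, replication, blockSize)
    FSDataOutputStream out = fs.create(dst, true, 4096,
        fs.getDefaultReplication(), newBlockSize);
    // Copy all bytes and close both streams.
    IOUtils.copyBytes(in, out, conf, true);
  }
}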

Thanks!
Stan Lee


what's the mechanism to determine the reducer number and reduce progress

2010-05-16 Thread stan lee
When I run the sort job, I found that with 70 reduce tasks running and none
of them completed, the progress bar showed about 80% finished. How does the
MapReduce mechanism calculate this?

Also, when I run a job, we can set the total number of reduce tasks through
the setNumReduceTasks() function, but how is the number of reducers actually
used determined (I mean the number of tasktrackers that run the reduce
tasks)?
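
As a point of reference, a hedged sketch of the driver-side piece (old mapred
API, as the bundled sort example uses): the job only fixes the total number
of reduce tasks, while which tasktrackers run them is the scheduler's
decision, bounded per node by the reduce-slot setting
mapred.tasktracker.reduce.tasks.maximum.

import org.apache.hadoop.mapred.JobConf;

public class ReduceCountSketch {
  // Only the piece relevant to the question; the rest of the job setup
  // (formats, mapper, reducer, paths) is omitted.
  static void configure(JobConf conf) {
    // Total reduce tasks for the job; how many tasktrackers end up running
    // them depends on the scheduler and on the number of reduce slots each
    // tasktracker advertises (mapred.tasktracker.reduce.tasks.maximum).
    conf.setNumReduceTasks(70);
  }
}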

Thanks!
Stan. Lee


Re: what's the mechanism to determine the reducer number and reduce progress

2010-05-18 Thread stan lee
Thanks Panfeng, do you have a more detailed explanation of this? Is it
calculated from how many reduce tasks have completed each phase?

Also, what's the answer for my second question? Thanks!
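
A rough model of the weighting described in the quoted reply below (a
simplified illustration, not the actual Hadoop source): each reduce task's
score averages its three phases, and the job-level reduce progress averages
the per-task scores, so 70 tasks that have all finished copying and sorting
can already show around 80% before any of them completes.

public class ReduceProgressSketch {
  // Fractions in [0,1] of each phase a single reduce task has finished.
  static float taskScore(float copyDone, float sortDone, float reduceDone) {
    return (copyDone + sortDone + reduceDone) / 3.0f;
  }

  // Job-level reduce progress as the average over all reduce tasks.
  static float jobReduceProgress(float[] perTaskScores) {
    if (perTaskScores.length == 0) {
      return 0.0f;
    }
    float sum = 0.0f;
    for (float s : perTaskScores) {
      sum += s;
    }
    return sum / perTaskScores.length;
  }
}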

On Mon, May 17, 2010 at 12:44 PM, 原攀峰  wrote:

> For a reduce task, the execution is divided into three phases, each of
> which accounts for 1/3 of the score:
> • The copy phase, when the task fetches map outputs.
> • The sort phase, when map outputs are sorted by key.
> • The reduce phase, when a user-defined function is applied to the list of
> map outputs with each key.
> --
>
> Yuan Panfeng(原攀峰) | BeiHang University
>
> TEL: +86-13426166934
>
> MSN: ypf...@hotmail.com
>
> EMAIL: ypf...@gmail.com
>
> QQ: 362889262
>
>
>
>
> On 2010-05-17 09:44:38, "stan lee"  wrote:
>  >When I run the sort job, I found when there are 70 reduce tasks running
> and
> >no one completed, the progress bar shows that it has finished about 80%,
> so
> >how the mapreduce mechnism to caculate this?
> >
> >Also,  when I run a job, as we know, we can determine the number of total
> >reduce tasks through setNumReduceTasks() function, but how to determine
> the
> >reducer number(I mean the tasktracker number which run the reduce task)
> >being used?
> >
> >Thanks!
> >Stan. Lee
>


Do we need to install both 32 and 64 bit lzo2 to enable lzo compression and how can we use gzip compression codec in hadoop

2010-05-18 Thread stan lee
Hi Guys,

I am trying to use compression to reduce the I/O workload when running a
job, but it failed. I have several questions that need your help.

For LZO compression, I found the guide
http://code.google.com/p/hadoop-gpl-compression/wiki/FAQ. Why does it say
"Note that you must have both 32-bit and 64-bit liblzo2 installed"? I am not
sure whether that means we also need the 32-bit liblzo2 installed even when
we are on a 64-bit system. If so, why?

Also, if I don't use LZO compression and instead try to use gzip to compress
the final reduce output file, I just set the value below in mapred-site.xml,
but it doesn't seem to work (how can I find the compressed final .gz file? I
used "hadoop dfs -l " and didn't find it). My question: can we use gzip to
compress the final result when it's not a streaming job? How can we make
sure that compression has been enabled during a job's execution?

<property>
   <name>mapred.output.compress</name>
   <value>true</value>
</property>

Thanks!
Stan Lee


Re: Do we need to install both 32 and 64 bit lzo2 to enable lzo compression and how can we use gzip compression codec in hadoop

2010-05-19 Thread stan lee
Thanks all. So if we don't call the setCompressOutput() and
setOutputCompressorClass() functions in the sort program, and just set
mapred.output.compress to true and mapred.output.compression.codec to
org.apache.hadoop.io.compress.GzipCodec, we wouldn't get a compressed
output file like part-.gz?
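
For reference, a minimal sketch of setting those same two properties
programmatically per job instead of in mapred-site.xml (property names follow
the old, pre-0.21 mapred naming; whether they are honoured without the
FileOutputFormat calls depends on the output format, which is what this
thread is about):

import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapred.JobConf;

public class CompressOutputSketch {
  // Equivalent of the mapred-site.xml properties, set on the job itself.
  static void enableGzipOutput(JobConf conf) {
    conf.setBoolean("mapred.output.compress", true);
    conf.setClass("mapred.output.compression.codec",
        GzipCodec.class, CompressionCodec.class);
  }
}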

On Wed, May 19, 2010 at 1:31 AM, Harsh J  wrote:

> Hi stan,
>
> You can do something of this sort if you use FileOutputFormat, from
> within your Job Driver:
>
>FileOutputFormat.setCompressOutput(job, true);
>FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);
>// GzipCodec from org.apache.hadoop.io.compress.
>// and where 'job' is either JobConf or Job object.
>
> This will write the simple file output in Gzip format. You also have
> BZip2Codec.
>
> On Tue, May 18, 2010 at 9:14 PM, stan lee  wrote:
> > Hi Guys,
> >
> > I am trying to use compression to reduce the IO workload when trying to
> run
> > a job but failed. I have several questions which needs your help.
> >
> > For lzo compression, I found a guide
> > http://code.google.com/p/hadoop-gpl-compression/wiki/FAQ, why it said
> "Note
> > that you must have both 32-bit and 64-bit liblzo2 installed" ? I am not
> sure
> > whether it means that we also need 32bit liblzo2 installed even when we
> are
> > on 64bit system. If so, why?
> >
> > Also if I don't use lzo compression and tried to use gzip to compress the
> > final reduce output file, I just set below value in mapred-site.xml, but
> > seems it doesn't work(how can I find the final .gz file compressed? I
> used
> > "hadoop dfs -l " and didn't find that.). My question: can we use
> gzip
> > to compress the final result when it's not streaming job? How can we
> ensure
> > that the compression has been enabled during a job execution?
> >
> > <property>
> >   <name>mapred.output.compress</name>
> >   <value>true</value>
> > </property>
> >
> > Thanks!
> > Stan Lee
> >
>
>
>
>  --
> Harsh J
> www.harshj.com
>


Re: Do we need to install both 32 and 64 bit lzo2 to enable lzo compression and how can we use gzip compression codec in hadoop

2010-05-19 Thread stan lee
Got the meaning now. Since sort uses SequenceFileOutputFormat to write its
output file, to use gzip as the compression type we need the native
library... I will try that.
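
A rough sketch of the driver-side settings that usually go with this,
assuming the old mapred API that the bundled sort example uses (the BLOCK
compression type is a choice for illustration, not a requirement):

import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SequenceFileOutputFormat;

public class SequenceFileGzipSketch {
  // Configure a job so its SequenceFile output is gzip-compressed.
  static void configure(JobConf conf) {
    conf.setOutputFormat(SequenceFileOutputFormat.class);
    FileOutputFormat.setCompressOutput(conf, true);
    FileOutputFormat.setOutputCompressorClass(conf, GzipCodec.class);
    // Compress whole blocks of records rather than individual records.
    SequenceFileOutputFormat.setOutputCompressionType(
        conf, SequenceFile.CompressionType.BLOCK);
  }
}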


On Wed, May 19, 2010 at 3:17 PM, stan lee  wrote:

> Thanks All. So if we don't call setCompressOutput() and
> setOutputCompressorClass() funciton in the sort programe,we  just set
> mapred.output.compress to true and set mapred.output.compression.codec to
> org.apache.hadoop.io.compress.GzipCodec, that wouldn't have compressed
> output file like part-.gz?
>
> On Wed, May 19, 2010 at 1:31 AM, Harsh J  wrote:
>
>> Hi stan,
>>
>> You can do something of this sort if you use FileOutputFormat, from
>> within your Job Driver:
>>
>>FileOutputFormat.setCompressOutput(job, true);
>>FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);
>>// GzipCodec from org.apache.hadoop.io.compress.
>>// and where 'job' is either JobConf or Job object.
>>
>> This will write the simple file output in Gzip format. You also have
>> BZip2Codec.
>>
>> On Tue, May 18, 2010 at 9:14 PM, stan lee  wrote:
>> > Hi Guys,
>> >
>> > I am trying to use compression to reduce the IO workload when trying to
>> run
>> > a job but failed. I have several questions which needs your help.
>> >
>> > For lzo compression, I found a guide
>> > http://code.google.com/p/hadoop-gpl-compression/wiki/FAQ, why it said
>> "Note
>> > that you must have both 32-bit and 64-bit liblzo2 installed" ? I am not
>> sure
>> > whether it means that we also need 32bit liblzo2 installed even when we
>> are
>> > on 64bit system. If so, why?
>> >
>> > Also if I don't use lzo compression and tried to use gzip to compress
>> the
>> > final reduce output file, I just set below value in mapred-site.xml, but
>> > seems it doesn't work(how can I find the final .gz file compressed? I
>> used
>> > "hadoop dfs -l " and didn't find that.). My question: can we use
>> gzip
>> > to compress the final result when it's not streaming job? How can we
>> ensure
>> > that the compression has been enabled during a job execution?
>> >
>> > <property>
>> >   <name>mapred.output.compress</name>
>> >   <value>true</value>
>> > </property>
>> >
>> > Thanks!
>> > Stan Lee
>> >
>>
>>
>>
>>  --
>> Harsh J
>> www.harshj.com
>>
>
>


Re: Do we need to install both 32 and 64 bit lzo2 to enable lzo compression and how can we use gzip compression codec in hadoop

2010-05-19 Thread stan lee
Thanks Harsh, Hong and Ted. I still have a question about how to build the
native library for the gzip compression type. I found the information below
on the wiki:
**
In particular the various packages you would need on the target platform
are:

   - C compiler (e.g. GNU C Compiler <http://gcc.gnu.org/>)
   - GNU Autotools chain: autoconf <http://www.gnu.org/software/autoconf/>,
     automake <http://www.gnu.org/software/automake/>,
     libtool <http://www.gnu.org/software/libtool/>
   - zlib-development package (stable version >= 1.2.0)
   - lzo-development package (stable version >= 2.0)

Once you have the pre-requisites use the standard build.xml and pass along
the compile.native flag (set to true) to build the native hadoop library:

$ ant -Dcompile.native=true 
 ***
So what does "development package" mean here? I know that for LZO there is
the hadoop-lzo package, but what is the equivalent for gzip? I think gzip is
written as a C program and shouldn't be built using ant/ivy? Sorry, I am
just a beginner knocking on the door of Hadoop. Thanks in advance for your
answers!

I have "make install" GNU gzip source code on my cluster node, would that
work if I directly copy the libraries generated to the dir
$HADOOP_HOME/lib/native/Linux_amd64-64?

Stan.Lee
On Wed, May 19, 2010 at 4:31 PM, stan lee  wrote:

> Get the meaning now. As sort would use SequenceFileFormat to write the
> output file and to use gzip as the compression type, we need to use native
> library...would try that.
>
>
> On Wed, May 19, 2010 at 3:17 PM, stan lee  wrote:
>
>> Thanks All. So if we don't call setCompressOutput() and
>> setOutputCompressorClass() funciton in the sort programe,we  just set
>> mapred.output.compress to true and set mapred.output.compression.codec to
>> org.apache.hadoop.io.compress.GzipCodec, that wouldn't have compressed
>> output file like part-.gz?
>>
>> On Wed, May 19, 2010 at 1:31 AM, Harsh J  wrote:
>>
>>> Hi stan,
>>>
>>> You can do something of this sort if you use FileOutputFormat, from
>>> within your Job Driver:
>>>
>>>FileOutputFormat.setCompressOutput(job, true);
>>>FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);
>>>// GzipCodec from org.apache.hadoop.io.compress.
>>>// and where 'job' is either JobConf or Job object.
>>>
>>> This will write the simple file output in Gzip format. You also have
>>> BZip2Codec.
>>>
>>> On Tue, May 18, 2010 at 9:14 PM, stan lee  wrote:
>>> > Hi Guys,
>>> >
>>> > I am trying to use compression to reduce the IO workload when trying to
>>> run
>>> > a job but failed. I have several questions which needs your help.
>>> >
>>> > For lzo compression, I found a guide
>>> > http://code.google.com/p/hadoop-gpl-compression/wiki/FAQ, why it said
>>> "Note
>>> > that you must have both 32-bit and 64-bit liblzo2 installed" ? I am not
>>> sure
>>> > whether it means that we also need 32bit liblzo2 installed even when we
>>> are
>>> > on 64bit system. If so, why?
>>> >
>>> > Also if I don't use lzo compression and tried to use gzip to compress
>>> the
>>> > final reduce output file, I just set below value in mapred-site.xml,
>>> but
>>> > seems it doesn't work(how can I find the final .gz file compressed? I
>>> used
>>> > "hadoop dfs -l " and didn't find that.). My question: can we use
>>> gzip
>>> > to compress the final result when it's not streaming job? How can we
>>> ensure
>>> > that the compression has been enabled during a job execution?
>>> >
>>> > <property>
>>> >   <name>mapred.output.compress</name>
>>> >   <value>true</value>
>>> > </property>
>>> >
>>> > Thanks!
>>> > Stan Lee
>>> >
>>>
>>>
>>>
>>>  --
>>> Harsh J
>>> www.harshj.com
>>>
>>
>>
>


No rule to make target `impl/lzo/.deps/LzoDecompressor.Plo' error when trying to build lzo native library

2010-05-24 Thread stan lee
I tried to build the lzo native library following the steps below:

1. Install
lzo2-2.02-3.el5.rf.x86_64.rpm and lzo2-devel-2.02-3.el5.rf.x86_64.rpm

2. Download hadoop-lzo and compile it using "ant compile-native tar"; I met
the error below:

 [ exec] checking lzo/lzo2a.h usability... which: no otool in
(/home/xapuser/apache-ant-1.8.1/bin:/opt/sunjdk1.6.0_05/bin/:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/home/xapuser/bin)
[exec] yes
[exec] checking lzo/lzo2a.h presence... yes
[exec] checking for lzo/lzo2a.h... yes
[exec] checking Checking for the 'actual' dynamic-library for '-llzo2'...
"liblzo2.so.2"
[exec] checking for special C compiler options needed for large files... no
[exec] checking for _FILE_OFFSET_BITS value needed for large files... no
[exec] checking for stdbool.h that conforms to C99... yes
[exec] checking for _Bool... yes
[exec] checking for an ANSI C-conforming const... yes
[exec] checking for off_t... yes
[exec] checking for size_t... yes
[exec] checking whether strerror_r is declared... yes
[exec] checking for strerror_r... yes
[exec] checking whether strerror_r returns char *... yes
[exec] checking for mkdir... yes
[exec] checking for uname... yes
[exec] checking for memset... yes
[exec] checking for JNI_GetCreatedJavaVMs in -ljvm... no
[exec] checking jni.h usability... yes
[exec] checking jni.h presence... yes
[exec] checking for jni.h... yes
[exec] configure: creating ./config.status
[exec] config.status: creating Makefile
[exec] config.status: creating impl/config.h
[exec] config.status: impl/config.h is unchanged
[exec] config.status: executing depfiles commands
[exec] config.status: executing libtool commands
[exec] Makefile:325: impl/lzo/.deps/LzoCompressor.Plo: No such file or
directory
[exec] Makefile:326: impl/lzo/.deps/LzoDecompressor.Plo: No such file or
directory
[exec] make: *** No rule to make target
`impl/lzo/.deps/LzoDecompressor.Plo'. Stop.

BUILD FAILED
/home/xapuser/hadoop-lzo/build.xml:248: exec returned: 2

So has anyone met this problem before? Is there anything I missed to build
it successfully?


Re: No rule to make target `impl/lzo/.deps/LzoDecompressor.Plo' error when trying to build lzo native library

2010-05-24 Thread stan lee
This problem has been solved. Thanks!

On Mon, May 24, 2010 at 6:15 PM, stan lee  wrote:

> I tried to build lzo native library following below steps:
>
> 1. Install
> lzo2-2.02-3.el5.rf.x86_64.rpm and lzo2-devel-2.02-3.el5.rf.x86_64.rpm
>
> 2. download hadoop-lzo and compile it using ant compile-native tar, I met
> below error:
>
>  [ exec] checking lzo/lzo2a.h usability... which: no otool in
> (/home/xapuser/apache-ant-1.8.1/bin:/opt/sunjdk1.6.0_05/bin/:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/home/xapuser/bin)
> [exec] yes
> [exec] checking lzo/lzo2a.h presence... yes
> [exec] checking for lzo/lzo2a.h... yes
> [exec] checking Checking for the 'actual' dynamic-library for '-llzo2'...
> "liblzo2.so.2"
> [exec] checking for special C compiler options needed for large files... no
> [exec] checking for _FILE_OFFSET_BITS value needed for large files... no
> [exec] checking for stdbool.h that conforms to C99... yes
> [exec] checking for _Bool... yes
> [exec] checking for an ANSI C-conforming const... yes
> [exec] checking for off_t... yes
> [exec] checking for size_t... yes
> [exec] checking whether strerror_r is declared... yes
> [exec] checking for strerror_r... yes
> [exec] checking whether strerror_r returns char *... yes
> [exec] checking for mkdir... yes
> [exec] checking for uname... yes
> [exec] checking for memset... yes
> [exec] checking for JNI_GetCreatedJavaVMs in -ljvm... no
> [exec] checking jni.h usability... yes
> [exec] checking jni.h presence... yes
> [exec] checking for jni.h... yes
> [exec] configure: creating ./config.status
> [exec] config.status: creating Makefile
> [exec] config.status: creating impl/config.h
> [exec] config.status: impl/config.h is unchanged
> [exec] config.status: executing depfiles commands
> [exec] config.status: executing libtool commands
> [exec] Makefile:325: impl/lzo/.deps/LzoCompressor.Plo: No such file or
> directory
> [exec] Makefile:326: impl/lzo/.deps/LzoDecompressor.Plo: No such file or
> directory
> [exec] make: *** No rule to make target
> `impl/lzo/.deps/LzoDecompressor.Plo'. Stop.
>
> BUILD FAILED
> /home/xapuser/hadoop-lzo/build.xml:248: exec returned: 2
>
> So have any one met this problem before? Is there anything I missed to
> build it successfully?
>


How to create a temporary HDFS file system using Java?

2010-06-09 Thread stan lee
Hi Experts,

Although the HDFS file system exposes some APIs that can be used to create
directories and files and to read and write data, can you tell me whether
there is any API exposed for creating an HDFS file system itself from a Java
program?

For example, I have two machines, one a namenode and the other a datanode.
Before using the HDFS API to create/delete directories and read/write data
in the distributed system, how can I create a temporary HDFS file system
under /home/stan/hdfs using Java? Is there any API or example for doing
this?
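
One possible route, sketched here under the assumption that the Hadoop test
jar is on the classpath, is the MiniDFSCluster test-support class, which
starts an in-process namenode and datanodes backed by a local directory. The
constructor shown is the 0.20-era one and differs between versions, and the
test.build.data path is only an illustrative choice:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.MiniDFSCluster;

public class TempHdfsSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Base directory under which the mini cluster keeps its name/data dirs.
    System.setProperty("test.build.data", "/home/stan/hdfs");

    // Start an in-process HDFS with one namenode and two datanodes.
    MiniDFSCluster cluster = new MiniDFSCluster(conf, 2, true, null);
    try {
      FileSystem fs = cluster.getFileSystem();
      Path dir = new Path("/tmp/demo");
      fs.mkdirs(dir);
      System.out.println("Created " + dir + " on " + fs.getUri());
    } finally {
      cluster.shutdown();
    }
  }
}

This is meant for testing on a single machine; a real two-node cluster is
still set up with the normal namenode/datanode daemons rather than through a
Java API.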

Thanks!
Stan