Re: Problems with HOD and HDFS

2010-06-14 Thread Vinod KV

On Monday 14 June 2010 09:51 AM, David Milne wrote:

Ok, thanks Jeff.

This is pretty surprising though. I would have thought many people
would be in my position, where they have to use Hadoop on a general
purpose cluster, and need it to play nice with a resource manager?
What do other people do in this position, if they don't use HOD?
Deprecated normally means there is a better alternative.

- Dave
   



It isn't formally deprecated though. Maybe we'll need to do that 
explicitly; that would also help with putting up proper documentation about 
what to use instead.


A quick reply is that you start a static cluster on a set of nodes. A 
static cluster means bringing up the Hadoop daemons on a set of nodes using 
the startup scripts distributed in the bin/ directory.


That said, there are no changes to HOD in 0.21 and beyond. Deploying 
0.21 clusters should mostly work out of the box. But beyond 0.21 it may 
not work, because HOD needs to be updated with respect to removed or updated 
Hadoop-specific configuration parameters and the environment variables it 
generates itself.


HTH,
+vinod


On Mon, Jun 14, 2010 at 2:39 PM, Jeff Hammerbacher ham...@cloudera.com  wrote:
   

Hey Dave,

I can't speak for the folks at Yahoo!, but from watching the JIRA, I don't
think HOD is actively used or developed anywhere these days. You're
attempting to use a mostly deprecated project, and hence not receiving any
support on the mailing list.

Thanks,
Jeff

On Sun, Jun 13, 2010 at 7:33 PM, David Milne d.n.mi...@gmail.com  wrote:

 

Anybody? I am completely stuck here. I have no idea who else I can ask
or where I can go for more information. Is there somewhere specific
where I should be asking about HOD?

Thank you,
Dave

On Thu, Jun 10, 2010 at 2:56 PM, David Milne d.n.mi...@gmail.com  wrote:
   

Hi there,

I am trying to get Hadoop on Demand up and running, but am having
problems with the ringmaster not being able to communicate with HDFS.

The output from the hod allocate command ends with this, with full verbosity:

[2010-06-10 14:40:22,650] CRITICAL/50 hadoop:298 - Failed to retrieve
'hdfs' service address.
[2010-06-10 14:40:22,654] DEBUG/10 hadoop:631 - Cleaning up cluster id
34029.symphony.cs.waikato.ac.nz, as cluster could not be allocated.
[2010-06-10 14:40:22,655] DEBUG/10 hadoop:635 - Calling rm.stop()
[2010-06-10 14:40:22,665] DEBUG/10 hadoop:637 - Returning from rm.stop()
[2010-06-10 14:40:22,666] CRITICAL/50 hod:401 - Cannot allocate
cluster /home/dmilne/hadoop/cluster
[2010-06-10 14:40:23,090] DEBUG/10 hod:597 - return code: 7


I've attached the hodrc file below, but briefly HOD is supposed to
provision an HDFS cluster as well as a Map/Reduce cluster, and seems
to be failing to do so. The ringmaster log looks like this:

[2010-06-10 14:36:05,144] DEBUG/10 ringMaster:479 - getServiceAddr name: hdfs
[2010-06-10 14:36:05,145] DEBUG/10 ringMaster:487 - getServiceAddr
service: hodlib.GridServices.hdfs.Hdfs instance at 0x8f97e8
[2010-06-10 14:36:05,147] DEBUG/10 ringMaster:504 - getServiceAddr
addr hdfs: not found
[2010-06-10 14:36:06,195] DEBUG/10 ringMaster:479 - getServiceAddr name: hdfs
[2010-06-10 14:36:06,197] DEBUG/10 ringMaster:487 - getServiceAddr
service: hodlib.GridServices.hdfs.Hdfs instance at 0x8f97e8
[2010-06-10 14:36:06,198] DEBUG/10 ringMaster:504 - getServiceAddr
addr hdfs: not found

... and so on, until it gives up

Any ideas why? One red flag is that when running the allocate command,
some of the variables echo-ed back look dodgy:

--gridservice-hdfs.fs_port 0
--gridservice-hdfs.host localhost
--gridservice-hdfs.info_port 0

These are not what I specified in the hodrc. Are the port numbers just
set to 0 because I am not using an external HDFS, or is this a
problem?


The software versions involved are:
  - Hadoop 0.20.2
  - Python 2.5.2 (no Twisted)
  - Java 1.6.0_20
  - Torque 2.4.5


The hodrc file looks like this:

[hod]
stream  = True
java-home   = /opt/jdk1.6.0_20
cluster = debian5
cluster-factor  = 1.8
xrs-port-range  = 32768-65536
debug   = 3
allocate-wait-time  = 3600
temp-dir= /scratch/local/dmilne/hod

[ringmaster]
register= True
stream  = False
temp-dir= /scratch/local/dmilne/hod
log-dir = /scratch/local/dmilne/hod/log
http-port-range = 8000-9000
idleness-limit  = 864000
work-dirs   =
/scratch/local/dmilne/hod/1,/scratch/local/dmilne/hod/2
xrs-port-range  = 32768-65536
debug   = 4

[hodring]
stream  = False
temp-dir= /scratch/local/dmilne/hod
log-dir = /scratch/local/dmilne/hod/log
register= True
java-home  

changing my hadoop log level is not getting reflected in logs

2010-06-14 Thread Gokulakannan M
Hi,

 

I changed the default log level of Hadoop from INFO to ERROR by
setting the property hadoop.root.logger to ERROR in
hadoop/conf/log4j.properties.

 

But when I start the namenode, INFO logs still appear in the log
file. While working around this I found that HADOOP_ROOT_LOGGER is hard-coded to
INFO in the hadoop-daemon.sh and hadoop scripts in hadoop/bin. Is
there anything to be done about that, or are those settings there for a purpose?

 

PS: I am using hadoop 0.20.1

 

 Thanks,

  Gokul

 

  

 



 



Re: Appending and seeking files while writing

2010-06-14 Thread Stas Oskin
Hi.

Thanks for clarification.

Append will be supported fully in 0.21.


Any ETA for this version?
Will it work both with Fuse and HDFS API?


 Also, append does *not* add random write. It simply adds the ability to
 re-open a file and add more data to the end.


Just to clarify, even with append it won't be possible to:
1) Pause writing a new file, skip to any position, and update the data.
2) Open an existing file, skip to any position, and update the data.

This applies even with FUSE.

Is this correct?

Regards.


Re: Appending and seeking files while writing

2010-06-14 Thread Stas Oskin
By the way, what about the ability for a node to read a file that is being
written by another node?
Or must the file be written and closed completely before it becomes
available to other nodes?

(AFAIK in 0.18.3 the file appeared as 0 size until it was closed).

Regards.


Re: Problems with HOD and HDFS

2010-06-14 Thread Amr Awadallah
Dave,

  Yes, many others have the same situation. The recommended solution is
either to use the Fair Share Scheduler or the Capacity Scheduler. These
schedulers are much better than HOD since they take data locality into
consideration (they don't just spin up 20 TT nodes on machines that have
nothing to do with your data). They also don't lock down the nodes just for
you, so as TTs are freed other jobs can use them immediately (as opposed to
nobody being able to use them until your entire job is done).

  Also, if you are brave and want to try something spanking new, then I
recommend you reach out to the Mesos guys; they have a scheduler layer under
Hadoop that is data-locality aware:

http://mesos.berkeley.edu/

-- amr

On Sun, Jun 13, 2010 at 9:21 PM, David Milne d.n.mi...@gmail.com wrote:

 Ok, thanks Jeff.

 This is pretty surprising though. I would have thought many people
 would be in my position, where they have to use Hadoop on a general
 purpose cluster, and need it to play nice with a resource manager?
 What do other people do in this position, if they don't use HOD?
 Deprecated normally means there is a better alternative.

 - Dave

 On Mon, Jun 14, 2010 at 2:39 PM, Jeff Hammerbacher ham...@cloudera.com
 wrote:
  Hey Dave,
 
  I can't speak for the folks at Yahoo!, but from watching the JIRA, I
 don't
  think HOD is actively used or developed anywhere these days. You're
  attempting to use a mostly deprecated project, and hence not receiving
 any
  support on the mailing list.
 
  Thanks,
  Jeff
 
  On Sun, Jun 13, 2010 at 7:33 PM, David Milne d.n.mi...@gmail.com
 wrote:
 
  Anybody? I am completely stuck here. I have no idea who else I can ask
  or where I can go for more information. Is there somewhere specific
  where I should be asking about HOD?
 
  Thank you,
  Dave
 
  On Thu, Jun 10, 2010 at 2:56 PM, David Milne d.n.mi...@gmail.com
 wrote:
   Hi there,
  
   I am trying to get Hadoop on Demand up and running, but am having
   problems with the ringmaster not being able to communicate with HDFS.
  
   The output from the hod allocate command ends with this, with full
  verbosity:
  
   [2010-06-10 14:40:22,650] CRITICAL/50 hadoop:298 - Failed to retrieve
   'hdfs' service address.
   [2010-06-10 14:40:22,654] DEBUG/10 hadoop:631 - Cleaning up cluster id
   34029.symphony.cs.waikato.ac.nz, as cluster could not be allocated.
   [2010-06-10 14:40:22,655] DEBUG/10 hadoop:635 - Calling rm.stop()
   [2010-06-10 14:40:22,665] DEBUG/10 hadoop:637 - Returning from
 rm.stop()
   [2010-06-10 14:40:22,666] CRITICAL/50 hod:401 - Cannot allocate
   cluster /home/dmilne/hadoop/cluster
   [2010-06-10 14:40:23,090] DEBUG/10 hod:597 - return code: 7
  
  
   I've attached the hodrc file below, but briefly HOD is supposed to
   provision an HDFS cluster as well as a Map/Reduce cluster, and seems
   to be failing to do so. The ringmaster log looks like this:
  
   [2010-06-10 14:36:05,144] DEBUG/10 ringMaster:479 - getServiceAddr
 name:
  hdfs
   [2010-06-10 14:36:05,145] DEBUG/10 ringMaster:487 - getServiceAddr
   service: hodlib.GridServices.hdfs.Hdfs instance at 0x8f97e8
   [2010-06-10 14:36:05,147] DEBUG/10 ringMaster:504 - getServiceAddr
   addr hdfs: not found
   [2010-06-10 14:36:06,195] DEBUG/10 ringMaster:479 - getServiceAddr
 name:
  hdfs
   [2010-06-10 14:36:06,197] DEBUG/10 ringMaster:487 - getServiceAddr
   service: hodlib.GridServices.hdfs.Hdfs instance at 0x8f97e8
   [2010-06-10 14:36:06,198] DEBUG/10 ringMaster:504 - getServiceAddr
   addr hdfs: not found
  
   ... and so on, until it gives up
  
   Any ideas why? One red flag is that when running the allocate command,
   some of the variables echo-ed back look dodgy:
  
   --gridservice-hdfs.fs_port 0
   --gridservice-hdfs.host localhost
   --gridservice-hdfs.info_port 0
  
   These are not what I specified in the hodrc. Are the port numbers just
   set to 0 because I am not using an external HDFS, or is this a
   problem?
  
  
   The software versions involved are:
- Hadoop 0.20.2
- Python 2.5.2 (no Twisted)
- Java 1.6.0_20
- Torque 2.4.5
  
  
   The hodrc file looks like this:
  
   [hod]
   stream  = True
   java-home   = /opt/jdk1.6.0_20
   cluster = debian5
   cluster-factor  = 1.8
   xrs-port-range  = 32768-65536
   debug   = 3
   allocate-wait-time  = 3600
   temp-dir= /scratch/local/dmilne/hod
  
   [ringmaster]
   register= True
   stream  = False
   temp-dir= /scratch/local/dmilne/hod
   log-dir = /scratch/local/dmilne/hod/log
   http-port-range = 8000-9000
   idleness-limit  = 864000
   work-dirs   =
   /scratch/local/dmilne/hod/1,/scratch/local/dmilne/hod/2
   xrs-port-range  = 32768-65536
   debug  

No KeyValueTextInputFormat in hadoop-0.20.2?

2010-06-14 Thread Kevin Tse
Hi,
I am upgrading my code from hadoop-0.19.2 to hadoop-0.20.2, and during the
process I found that there is no KeyValueTextInputFormat class, which exists
in hadoop-0.19.2. It seems strange that this version of Hadoop does not come
with such a commonly used InputFormat. I have taken a look at the
SecondarySort.java example code; it uses TextInputFormat and a
StringTokenizer to split each line, which works but feels awkward to me.

Do I have to implement a new InputFormat myself, or is there
a KeyValueTextInputFormat somewhere that I didn't notice?

Thank you.
Kevin Tse


Re: No KeyValueTextInputFormat in hadoop-0.20.2?

2010-06-14 Thread Ted Yu
Have you checked
src/mapred/org/apache/hadoop/mapred/KeyValueTextInputFormat.java ?

On Mon, Jun 14, 2010 at 6:51 AM, Kevin Tse kevintse.on...@gmail.com wrote:

 Hi,
 I am upgrading my code from hadoop-0.19.2 to hadoop-0.20.2, during the
 process I found that there was no KeyValueTextInputFormat class which
 exists
 in hadoop-0.19.2. It's so strange that this version of hadoop does not come
 with this commonly used InputFormat. I have taken a look at the
 SecondarySort.java example code, it uses TextInputFormat and
 StringTokenizer to split each line, it is ok but kinda awkward to me.

 Do I have to implement a new InputFormat myself or there's
 a KeyValueTextInputFormat that exists somewhere I didn't notice?

 Thank you.
 Kevin Tse



Re: No KeyValueTextInputFormat in hadoop-0.20.2?

2010-06-14 Thread Kevin Tse
Hi Ted,
I mean the new API:
org.apache.hadoop.mapreduce.Job.setInputFormatClass(Class<? extends org.apache.hadoop.mapreduce.InputFormat>)

Job.setInputFormatClass() only accepts subclasses of
org.apache.hadoop.mapreduce.InputFormat (of which there are several,
but KeyValueTextInputFormat is not one of them) as its
parameter.
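
For 0.20.2 specifically, one workaround is to keep the whole job on the old
(org.apache.hadoop.mapred) API, where KeyValueTextInputFormat still ships.
Below is a minimal, hedged sketch of that route; the separator property is the
old-API one, and the mapper/reducer classes are left as placeholders to fill in:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.KeyValueTextInputFormat;

public class KvDriver {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(KvDriver.class);
    conf.setJobName("kv-example");

    // Old-API input format: splits each line into key and value at the
    // first separator (tab by default; the property below overrides it).
    conf.setInputFormat(KeyValueTextInputFormat.class);
    conf.set("key.value.separator.in.input.line", "\t");

    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(Text.class);
    // conf.setMapperClass(...); conf.setReducerClass(...);  // your classes here

    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    JobClient.runJob(conf);
  }
}

(In 0.20.2 the two APIs cannot be mixed within a single job, so the driver
stays entirely on the old API here.)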

On Mon, Jun 14, 2010 at 10:03 PM, Ted Yu yuzhih...@gmail.com wrote:

 Have you checked
 src/mapred/org/apache/hadoop/mapred/KeyValueTextInputFormat.java ?

 On Mon, Jun 14, 2010 at 6:51 AM, Kevin Tse kevintse.on...@gmail.com
 wrote:

  Hi,
  I am upgrading my code from hadoop-0.19.2 to hadoop-0.20.2, during the
  process I found that there was no KeyValueTextInputFormat class which
  exists
  in hadoop-0.19.2. It's so strange that this version of hadoop does not
 come
  with this commonly used InputFormat. I have taken a look at the
  SecondarySort.java example code, it uses TextInputFormat and
  StringTokenizer to split each line, it is ok but kinda awkward to me.
 
  Do I have to implement a new InputFormat myself or there's
  a KeyValueTextInputFormat that exists somewhere I didn't notice?
 
  Thank you.
  Kevin Tse
 



Re: Caching in HDFS C API Client

2010-06-14 Thread Owen O'Malley
Indeed. On the terasort benchmark, I had to run intermediate jobs that
were larger than RAM on the cluster to ensure that the data was not
coming from the file cache.

-- Owen


Re: Caching in HDFS C API Client

2010-06-14 Thread Brian Bockelman
Hey Owen, all,

I find this one handy if you have root access:

http://linux-mm.org/Drop_Caches

echo 3 > /proc/sys/vm/drop_caches

Drops the pagecache, dentries, and inodes.  Without this, you can still get 
caching effects from normal reads and writes of large files if the Linux 
pagecache outsmarts you (and I don't know about you, but it often outsmarts 
me...).

Brian

On Jun 14, 2010, at 9:35 AM, Owen O'Malley wrote:

 Indeed. On the terasort benchmark, I had to run intermediate jobs that
 were larger than ram on the cluster to ensure that the data was not
 coming from the file cache.
 
 -- Owen



smime.p7s
Description: S/MIME cryptographic signature


Re: Problems with HOD and HDFS

2010-06-14 Thread Edward Capriolo
On Mon, Jun 14, 2010 at 8:37 AM, Amr Awadallah a...@cloudera.com wrote:

 Dave,

  Yes, many others have the same situation, the recommended solution is
 either to use the Fair Share Scheduler or the Capacity Scheduler. These
 schedulers are much better than HOD since they take data locality into
 consideration (they don't just spin up 20 TT nodes on machines that have
 nothing to do with your data). They also don't lock down the nodes just for
 you, so as TT are freed other jobs can use them immediately (as opposed to
 no body can use them till your entire job is done).

  Also, if you are brave and want to try something spanking new, then I
 recommend you reach out to the Mesos guys, they have a scheduler layer
 under
 Hadoop that is data locality aware:

 http://mesos.berkeley.edu/

 -- amr

 On Sun, Jun 13, 2010 at 9:21 PM, David Milne d.n.mi...@gmail.com wrote:

  Ok, thanks Jeff.
 
  This is pretty surprising though. I would have thought many people
  would be in my position, where they have to use Hadoop on a general
  purpose cluster, and need it to play nice with a resource manager?
  What do other people do in this position, if they don't use HOD?
  Deprecated normally means there is a better alternative.
 
  - Dave
 
  On Mon, Jun 14, 2010 at 2:39 PM, Jeff Hammerbacher ham...@cloudera.com
  wrote:
   Hey Dave,
  
   I can't speak for the folks at Yahoo!, but from watching the JIRA, I
  don't
   think HOD is actively used or developed anywhere these days. You're
   attempting to use a mostly deprecated project, and hence not receiving
  any
   support on the mailing list.
  
   Thanks,
   Jeff
  
   On Sun, Jun 13, 2010 at 7:33 PM, David Milne d.n.mi...@gmail.com
  wrote:
  
   Anybody? I am completely stuck here. I have no idea who else I can ask
   or where I can go for more information. Is there somewhere specific
   where I should be asking about HOD?
  
   Thank you,
   Dave
  
   On Thu, Jun 10, 2010 at 2:56 PM, David Milne d.n.mi...@gmail.com
  wrote:
Hi there,
   
I am trying to get Hadoop on Demand up and running, but am having
problems with the ringmaster not being able to communicate with
 HDFS.
   
The output from the hod allocate command ends with this, with full
   verbosity:
   
[2010-06-10 14:40:22,650] CRITICAL/50 hadoop:298 - Failed to
 retrieve
'hdfs' service address.
[2010-06-10 14:40:22,654] DEBUG/10 hadoop:631 - Cleaning up cluster
 id
34029.symphony.cs.waikato.ac.nz, as cluster could not be allocated.
[2010-06-10 14:40:22,655] DEBUG/10 hadoop:635 - Calling rm.stop()
[2010-06-10 14:40:22,665] DEBUG/10 hadoop:637 - Returning from
  rm.stop()
[2010-06-10 14:40:22,666] CRITICAL/50 hod:401 - Cannot allocate
cluster /home/dmilne/hadoop/cluster
[2010-06-10 14:40:23,090] DEBUG/10 hod:597 - return code: 7
   
   
I've attached the hodrc file below, but briefly HOD is supposed to
provision an HDFS cluster as well as a Map/Reduce cluster, and seems
to be failing to do so. The ringmaster log looks like this:
   
[2010-06-10 14:36:05,144] DEBUG/10 ringMaster:479 - getServiceAddr
  name:
   hdfs
[2010-06-10 14:36:05,145] DEBUG/10 ringMaster:487 - getServiceAddr
service: hodlib.GridServices.hdfs.Hdfs instance at 0x8f97e8
[2010-06-10 14:36:05,147] DEBUG/10 ringMaster:504 - getServiceAddr
addr hdfs: not found
[2010-06-10 14:36:06,195] DEBUG/10 ringMaster:479 - getServiceAddr
  name:
   hdfs
[2010-06-10 14:36:06,197] DEBUG/10 ringMaster:487 - getServiceAddr
service: hodlib.GridServices.hdfs.Hdfs instance at 0x8f97e8
[2010-06-10 14:36:06,198] DEBUG/10 ringMaster:504 - getServiceAddr
addr hdfs: not found
   
... and so on, until it gives up
   
Any ideas why? One red flag is that when running the allocate
 command,
some of the variables echo-ed back look dodgy:
   
--gridservice-hdfs.fs_port 0
--gridservice-hdfs.host localhost
--gridservice-hdfs.info_port 0
   
These are not what I specified in the hodrc. Are the port numbers
 just
set to 0 because I am not using an external HDFS, or is this a
problem?
   
   
The software versions involved are:
 - Hadoop 0.20.2
 - Python 2.5.2 (no Twisted)
 - Java 1.6.0_20
 - Torque 2.4.5
   
   
The hodrc file looks like this:
   
[hod]
stream  = True
java-home   = /opt/jdk1.6.0_20
cluster = debian5
cluster-factor  = 1.8
xrs-port-range  = 32768-65536
debug   = 3
allocate-wait-time  = 3600
temp-dir= /scratch/local/dmilne/hod
   
[ringmaster]
register= True
stream  = False
temp-dir= /scratch/local/dmilne/hod
log-dir = /scratch/local/dmilne/hod/log
http-port-range = 

Re: Appending and seeking files while writing

2010-06-14 Thread Todd Lipcon
On Mon, Jun 14, 2010 at 4:00 AM, Stas Oskin stas.os...@gmail.com wrote:

 Hi.

 Thanks for clarification.

 Append will be supported fully in 0.21.
 
 
 Any ETA for this version?


Should be out soon - Tom White is working hard on the release. Note that the
first release, 0.21.0, will be somewhat of a development-quality release,
not recommended for production use. Of course, the way it will become
production-worthy is by less risk-averse people trying it and finding the
bugs :)


 Will it work both with Fuse and HDFS API?

 I don't know that the Fuse code has been updated to call append. My guess
is that a small patch would be required.



  Also, append does *not* add random write. It simply adds the ability to
  re-open a file and add more data to the end.
 
 
 Just to clarify, even with append it won't be possible to:
 1) Pause writing of new file, skip to any position, and update the data.
 2) Open existing file, skip to any position and update the data.

 Correct, neither of those are allowed.


 This will be even with FUSE.

 Is this correct?

 Regards.




-- 
Todd Lipcon
Software Engineer, Cloudera
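
To make the append-only semantics above concrete, here is a minimal sketch
against the FileSystem API, assuming a cluster where append is enabled
(dfs.support.append on 0.20-era builds; built in from 0.21 per the discussion
above); the path is illustrative:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AppendSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path log = new Path("/user/stas/events.log");   // illustrative path

    // Re-open the existing file and write at the end -- the only position
    // append can write to.
    FSDataOutputStream out = fs.append(log);
    out.write("another record\n".getBytes("UTF-8"));
    out.close();

    // There is no call to seek the output stream backwards and overwrite
    // earlier bytes, which is why cases 1) and 2) above are not possible.
  }
}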


Re: Appending and seeking files while writing

2010-06-14 Thread Todd Lipcon
On Mon, Jun 14, 2010 at 4:28 AM, Stas Oskin stas.os...@gmail.com wrote:

 By the way, what about an ability for node to read file which is being
 written by another node?


This is allowed, though there are some remaining bugs to be ironed out here.
See https://issues.apache.org/jira/browse/HDFS-1057 for example.


 Or the file must be written and closed completely, before it becomes
 available for other nodes?

 (AFAIK in 0.18.3 the file appeared as 0 size until it was closed).

 Regards.




-- 
Todd Lipcon
Software Engineer, Cloudera


Re: Problems with HOD and HDFS

2010-06-14 Thread Steve Loughran

Edward Capriolo wrote:



I have not used it much, but I think HOD is pretty cool. I guess most people
who are looking to (spin up, run job ,transfer off, spin down) are using
EC2. HOD does something like make private hadoop clouds on your hardware and
many probably do not have that use case. As schedulers advance and get
better HOD becomes less attractive, but I can always see a place for it.


I don't know who is using it, or maintaining it; we've been bringing up 
short-lived Hadoop clusters differently.


I think I should write a little article on the topic; I presented on 
it at Berlin Buzzwords last week.


Short-lived Hadoop clusters on VMs are fine if you don't have enough 
data or CPU load to justify a set of dedicated physical machines, and they are 
a good way of experimenting with Hadoop at scale. You can maybe lock 
down the network better too, though that depends on your VM infrastructure.


Where VMs are weak is in disk IO performance, but there's no reason why 
the VM infrastructure can't take a list of filenames/directories as a 
hint for VM placement (placement is the new scheduling, incidentally), 
and virtualized IO can only improve. If you can run Hadoop MapReduce 
directly against SAN-mounted storage then you can stop worrying about 
locality of data and still gain from parallelisation of the operations.



-steve




Re: Appending and seeking files while writing

2010-06-14 Thread Stas Oskin
Hi.

Should be out soon - Tom White is working hard on the release. Note that the
 first release, 0.21.0, will be somewhat of a development quality release
 not recommended for production use. Of course, the way it will become
 production-worthy is by less risk-averse people trying it and finding the
 bugs :)


  Will it work both with Fuse and HDFS API?
 
  I don't know that the Fuse code has been updated to call append. My guess
 is that a small patch would be required.


 
   Also, append does *not* add random write. It simply adds the ability to
   re-open a file and add more data to the end.
  
  
  Just to clarify, even with append it won't be possible to:
  1) Pause writing of new file, skip to any position, and update the data.
  2) Open existing file, skip to any position and update the data.
 
  Correct, neither of those are allowed.


Thanks for clarification.


job execution

2010-06-14 Thread Gang Luo
Hi,
According to the doc, JobControl can maintain the dependencies among different 
jobs, and only jobs whose dependencies have completed can execute. How does JobControl 
maintain the dependencies, and how can we declare them?

Thanks,
-Gang






Re: job execution

2010-06-14 Thread Akash Deep Shakya
Use the ControlledJob class from Hadoop trunk and run it through JobControl.

Regards
Akash Deep Shakya OpenAK
FOSS Nepal Community
akashakya at gmail dot com

~ Failure to prepare is preparing to fail ~



On Mon, Jun 14, 2010 at 10:40 PM, Gang Luo lgpub...@yahoo.com.cn wrote:

 Hi,
 According to the doc, JobControl can maintain the dependency among
 different jobs and only jobs without dependency can execute. How does
 JobControl maintain the dependency and how can we indicate the dependency?

 Thanks,
 -Gang







Task process exit with nonzero status of 1 - deleting userlogs helps

2010-06-14 Thread Johannes Zillmann
Hi,

I am running a 4-node cluster with hadoop-0.20.2. I suddenly ran into a 
situation where every task scheduled on 2 of the 4 nodes failed. 
It seems the child JVM crashes. There are no child logs under logs/userlogs. 
The tasktracker gives this:

2010-06-14 09:34:12,714 INFO org.apache.hadoop.mapred.JvmManager: In JvmRunner 
constructed JVM ID: jvm_201006091425_0049_m_-946174604
2010-06-14 09:34:12,714 INFO org.apache.hadoop.mapred.JvmManager: JVM Runner 
jvm_201006091425_0049_m_-946174604 spawned.
2010-06-14 09:34:12,727 INFO org.apache.hadoop.mapred.JvmManager: JVM : 
jvm_201006091425_0049_m_-946174604 exited. Number of tasks it ran: 0
2010-06-14 09:34:12,727 WARN org.apache.hadoop.mapred.TaskRunner: 
attempt_201006091425_0049_m_003179_0 Child Error
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)


At some point I simply renamed logs/userlogs to logs/userlogsOLD. A new job 
created logs/userlogs again and no error occurred anymore on this host.
The permissions of userlogs and userlogsOLD are exactly the same. userlogsOLD 
contains about 378M in 132747 files. When the content of userlogsOLD is copied 
back into userlogs, the tasks on that node start failing again.

Some questions:
- this seems to me like a problem with too many files in one folder - any 
thoughts on this?
- is the content of logs/userlogs cleaned up by hadoop regularly?
- the logs/stdout files of the tasks do not exist, and the tasktracker's .out 
files have no specific message (other than the message posted above) - is 
there any log file left where an error message could be found?


best regards
Johannes

Re: Task process exit with nonzero status of 1 - deleting userlogs helps

2010-06-14 Thread Edward Capriolo
On Mon, Jun 14, 2010 at 1:15 PM, Johannes Zillmann jzillm...@googlemail.com
 wrote:

 Hi,

 i have running a 4-node cluster with hadoop-0.20.2. Now i suddenly run into
 a situation where every task scheduled on 2 of the 4 nodes failed.
 Seems like the child jvm crashes. There are no child logs under
 logs/userlogs. Tasktracker gives this:

 2010-06-14 09:34:12,714 INFO org.apache.hadoop.mapred.JvmManager: In
 JvmRunner constructed JVM ID: jvm_201006091425_0049_m_-946174604
 2010-06-14 09:34:12,714 INFO org.apache.hadoop.mapred.JvmManager: JVM
 Runner jvm_201006091425_0049_m_-946174604 spawned.
 2010-06-14 09:34:12,727 INFO org.apache.hadoop.mapred.JvmManager: JVM :
 jvm_201006091425_0049_m_-946174604 exited. Number of tasks it ran: 0
 2010-06-14 09:34:12,727 WARN org.apache.hadoop.mapred.TaskRunner:
 attempt_201006091425_0049_m_003179_0 Child Error
 java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)


 At some point i simply renamed logs/userlogs to logs/userlogsOLD. A new job
 created the logs/userlogs again and no error ocuured anymore on this host.
 The permissions of userlogs and userlogsOLD are exactly the same.
 userlogsOLD contains about 378M in 132747 files. When copying the content of
 userlogsOLD into userlogs, the tasks of the belonging node starts failing
 again.

 Some questions:
 - this seems to me like a problem with too many files in one folder - any
 thoughts on this ?
 - is the content of logs/userlogs cleaned up by hadoop regularly ?
 - the logs/stdout file of the tasks are not existent, the logs/out fiels of
 the tasktracker hasn't any specific message (other then message posted
 above) - is there any log file left where an error message could be found ?


 best regards
 Johannes


Most file systems have an upper limit on the number of subfiles/folders in a
folder. You have probably hit the EXT3 limit. If you launch lots and lots of
jobs you can hit the limit before any cleanup happens.

You can experiment with cleanup and other filesystems. The following
log-related issue might be relevant.

https://issues.apache.org/jira/browse/MAPREDUCE-323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12877614#action_12877614

Regards,
Edward


Hadoop and IP on InfiniBand (IPoIB)

2010-06-14 Thread Russell Brown
I'm a new user of Hadoop.  I have a Linux cluster with both gigabit 
ethernet and InfiniBand communications interfaces.  Could someone please 
tell me how to switch IP communication from ethernet (the default) to 
InfiniBand?  Thanks.


--


Russell A. Brown|  Oracle
russ.br...@oracle.com   |  UMPK14-260
(650) 786-3011 (office) |  14 Network Circle
(650) 786-3453 (fax)|  Menlo Park, CA 94025





Re: Caching in HDFS C API Client

2010-06-14 Thread Arun C Murthy

Nice, thanks Brian!

On Jun 14, 2010, at 7:39 AM, Brian Bockelman wrote:


Hey Owen, all,

I find this one handy if you have root access:

http://linux-mm.org/Drop_Caches

echo 3 > /proc/sys/vm/drop_caches

Drops the pagecache, dentries, and inodes.  Without this, you can  
still get caching effects doing the normal read and write large  
files if the linux pagecache outsmarts you (and I don't know about  
you, but it often outsmarts me...).


Brian

On Jun 14, 2010, at 9:35 AM, Owen O'Malley wrote:

Indeed. On the terasort benchmark, I had to run intermediate jobs  
that

were larger than ram on the cluster to ensure that the data was not
coming from the file cache.

-- Owen






Re: Problems with HOD and HDFS

2010-06-14 Thread David Milne
Thanks everyone for your replies.

Even though HOD looks like a dead-end I would prefer to use it. I am
just one user of the cluster among many, and currently the only one
using Hadoop. The jobs I need to run are pretty much one-off: they are
big jobs that I can't do without Hadoop, but I might need to run them
once a month or less. The ability to provision MapReduce and HDFS when
I need it sounds ideal.

Following Vinod's advice, I have rolled back to Hadoop 0.20.1 (the
last version that HOD kept up with) and taken a closer look at the
ringmaster logs. However, I am still getting the same problems as
before, and I can't find anything in the logs to help me identify the
NameNode.

The full ringmaster log is below. It's a pretty repetitive song, so
I've identified the chorus.

[2010-06-15 10:07:40,236] DEBUG/10 ringMaster:569 - Getting service ID.
[2010-06-15 10:07:40,237] DEBUG/10 ringMaster:573 - Got service ID:
34350.symphony.cs.waikato.ac.nz
[2010-06-15 10:07:40,239] DEBUG/10 ringMaster:756 - Command to
execute: /bin/cp /home/dmilne/hadoop/hadoop-0.20.1.tar.gz
/scratch/local/dmilne/hod/dmilne.34350.symphony.cs.waikato.ac.nz.ringmaster
[2010-06-15 10:07:42,314] DEBUG/10 ringMaster:762 - Completed command
execution. Exit Code: 0.
[2010-06-15 10:07:42,315] DEBUG/10 ringMaster:591 - Service registry @
http://symphony.cs.waikato.ac.nz:36372
[2010-06-15 10:07:47,503] DEBUG/10 ringMaster:726 - tarball name :
/scratch/local/dmilne/hod/dmilne.34350.symphony.cs.waikato.ac.nz.ringmaster/hadoop-0.20.1.tar.gz
hadoop package name : hadoop-0.20.1/
[2010-06-15 10:07:47,505] DEBUG/10 ringMaster:716 - Returning Hadoop
directory as: 
/scratch/local/dmilne/hod/dmilne.34350.symphony.cs.waikato.ac.nz.ringmaster/hadoop-0.20.1/
[2010-06-15 10:07:47,515] DEBUG/10 util:215 - Executing command
/scratch/local/dmilne/hod/dmilne.34350.symphony.cs.waikato.ac.nz.ringmaster/hadoop-0.20.1/bin/hadoop
version to find hadoop version
[2010-06-15 10:07:48,241] DEBUG/10 util:224 - Version from hadoop
command: Hadoop 0.20.1

[2010-06-15 10:07:48,244] DEBUG/10 ringMaster:117 - Using max-connect value 30
[2010-06-15 10:07:48,246] INFO/20 ringMaster:61 - Twisted interface
not found. Using hodXMLRPCServer.
[2010-06-15 10:07:48,257] DEBUG/10 ringMaster:73 - Ringmaster RPC
Server at 33771
[2010-06-15 10:07:48,265] DEBUG/10 ringMaster:121 - registering:
http://cn71:8030/hadoop-0.20.1.tar.gz
[2010-06-15 10:07:48,275] DEBUG/10 ringMaster:658 - dmilne
34350.symphony.cs.waikato.ac.nz cn71.symphony.cs.waikato.ac.nz
ringmaster hod
[2010-06-15 10:07:48,307] DEBUG/10 ringMaster:670 - Registered with
serivce registry: http://symphony.cs.waikato.ac.nz:36372.

//chorus start
[2010-06-15 10:07:48,393] DEBUG/10 ringMaster:479 - getServiceAddr name: hdfs
[2010-06-15 10:07:48,394] DEBUG/10 ringMaster:487 - getServiceAddr
service: hodlib.GridServices.hdfs.Hdfs instance at 0xc9e050
[2010-06-15 10:07:48,395] DEBUG/10 ringMaster:504 - getServiceAddr
addr hdfs: not found
//chorus end

//chorus (3x)

[2010-06-15 10:07:51,461] DEBUG/10 ringMaster:726 - tarball name :
/scratch/local/dmilne/hod/dmilne.34350.symphony.cs.waikato.ac.nz.ringmaster/hadoop-0.20.1.tar.gz
hadoop package name : hadoop-0.20.1/
[2010-06-15 10:07:51,463] DEBUG/10 ringMaster:716 - Returning Hadoop
directory as: 
/scratch/local/dmilne/hod/dmilne.34350.symphony.cs.waikato.ac.nz.ringmaster/hadoop-0.20.1/
[2010-06-15 10:07:51,465] DEBUG/10 ringMaster:690 -
hadoopdir=/scratch/local/dmilne/hod/dmilne.34350.symphony.cs.waikato.ac.nz.ringmaster/hadoop-0.20.1/,
java-home=/opt/jdk1.6.0_20
[2010-06-15 10:07:51,470] DEBUG/10 util:215 - Executing command
/scratch/local/dmilne/hod/dmilne.34350.symphony.cs.waikato.ac.nz.ringmaster/hadoop-0.20.1/bin/hadoop
version to find hadoop version

//chorus (1x)

[2010-06-15 10:07:52,448] DEBUG/10 util:224 - Version from hadoop
command: Hadoop 0.20.1
[2010-06-15 10:07:52,450] DEBUG/10 ringMaster:697 - starting jt monitor
[2010-06-15 10:07:52,453] DEBUG/10 ringMaster:913 - Entered start method.
[2010-06-15 10:07:52,455] DEBUG/10 ringMaster:924 -
/home/dmilne/hadoop/hadoop-0.20.1/contrib/hod/bin/hodring
--hodring.tarball-retry-initial-time 1.0
--hodring.cmd-retry-initial-time 2.0 --hodring.cmd-retry-interval 2.0
--hodring.service-id 34350.symphony.cs.waikato.ac.nz
--hodring.temp-dir /scratch/local/dmilne/hod --hodring.http-port-range
8000-9000 --hodring.userid dmilne --hodring.java-home /opt/jdk1.6.0_20
--hodring.svcrgy-addr symphony.cs.waikato.ac.nz:36372
--hodring.download-addr h:t --hodring.tarball-retry-interval 3.0
--hodring.log-dir /scratch/local/dmilne/hod/log
--hodring.mapred-system-dir-root /mapredsystem
--hodring.xrs-port-range 32768-65536 --hodring.debug 4
--hodring.ringmaster-xrs-addr cn71:33771 --hodring.register
[2010-06-15 10:07:52,456] DEBUG/10 ringMaster:479 - getServiceAddr name: mapred
[2010-06-15 10:07:52,458] DEBUG/10 ringMaster:487 - getServiceAddr
service: hodlib.GridServices.mapred.MapReduce instance at 0xc9e098
[2010-06-15 10:07:52,460] DEBUG/10 

Re: Problems with HOD and HDFS

2010-06-14 Thread David Milne
Is there something else I could read about setting up short-lived
Hadoop clusters on virtual machines? I have no experience with VMs at
all. I see there is quite a bit of material about using them to get
Hadoop up and running with a pseudo-cluster on a single machine, but I
don't follow how this stretches out to using multiple machines
allocated by Torque.

Thanks,
Dave

On Tue, Jun 15, 2010 at 3:49 AM, Steve Loughran ste...@apache.org wrote:
 Edward Capriolo wrote:


 I have not used it much, but I think HOD is pretty cool. I guess most
 people
 who are looking to (spin up, run job ,transfer off, spin down) are using
 EC2. HOD does something like make private hadoop clouds on your hardware
 and
 many probably do not have that use case. As schedulers advance and get
 better HOD becomes less attractive, but I can always see a place for it.

 I don't know who is using it, or maintaining it; we've been bringing up
 short-lived Hadoop clusters different.

 I think I should write a little article on the topic; I presented about it
 at Berlin Buzzwords last week.

 Short lived Hadoop clusters on VMs are fine if you don't have enough data or
 CPU load to justify a set of dedicated physical machines, and is a good way
 of experimenting with Hadoop at scale. You can maybe lock down the network
 better too, though that depends on your VM infrastructure.

 Where VMs are weak is in disk IO performance, but there's no reason why the
 VM infrastructure can't take a list of filenames/directories as a hint for
 VM placement (placement is the new scheduling, incidentally), and
 virtualized IO can only improve. If you can run Hadoop MapReduce directly
 against SAN-mounted storage then you can stop worrying about locality of
 data and still gain from parallelisation of the operations.


 -steve





How do I configure 'Skipbadrecords' in Hadoop Streaming?

2010-06-14 Thread edward choi
HI,

I am trying to use Hadoop Streaming and there seem to be a few bad records
in my data.

I'd like to use SkipBadRecords but I can't find how to use it with Hadoop
Streaming.

Is it at all possible?

Thanks in advance.
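
The thread does not confirm a dedicated streaming switch for this. One hedged
approach is to use the properties behind the Java-side SkipBadRecords helper,
sketched below, and pass the same settings to the streaming job via -D generic
options; exact property names and defaults should be verified against your
0.20 build:

import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SkipBadRecords;

public class SkipConfigSketch {
  public static JobConf configureSkipping(JobConf conf) {
    // Enter skipping mode after two failed attempts of the same task.
    SkipBadRecords.setAttemptsToStartSkipping(conf, 2);
    // Narrow the skipped range down to at most one bad map record
    // and one bad reduce group.
    SkipBadRecords.setMapperMaxSkipRecords(conf, 1L);
    SkipBadRecords.setReducerMaxSkipGroups(conf, 1L);
    // A streaming job cannot call this class directly; it would pass the
    // properties these setters write via -D on the command line instead.
    return conf;
  }
}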


Re: job execution

2010-06-14 Thread Jeff Zhang
There's a class org.apache.hadoop.mapred.jobcontrol.Job which is a
wrapper around JobConf. You add dependent jobs to it, then put it into a
JobControl.
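
A minimal sketch of that flow with the old (mapred) API, assuming two
already-configured JobConf objects:

import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.jobcontrol.Job;
import org.apache.hadoop.mapred.jobcontrol.JobControl;

public class ChainedJobs {
  public static void runChained(JobConf firstConf, JobConf secondConf)
      throws Exception {
    Job first = new Job(firstConf);
    Job second = new Job(secondConf);
    second.addDependingJob(first);   // second starts only after first succeeds

    JobControl control = new JobControl("example-group");
    control.addJob(first);
    control.addJob(second);

    // JobControl is a Runnable that polls job states and launches jobs
    // whose dependencies have completed.
    Thread runner = new Thread(control);
    runner.start();
    while (!control.allFinished()) {
      Thread.sleep(1000);
    }
    control.stop();
  }
}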




On Mon, Jun 14, 2010 at 9:55 AM, Gang Luo lgpub...@yahoo.com.cn wrote:
 Hi,
 According to the doc, JobControl can maintain the dependency among different 
 jobs and only jobs without dependency can execute. How does JobControl 
 maintain the dependency and how can we indicate the dependency?

 Thanks,
 -Gang








-- 
Best Regards

Jeff Zhang


CFP for Surge Scalability Conference 2010

2010-06-14 Thread Jason Dixon
We're excited to announce Surge, the Scalability and Performance
Conference, to be held in Baltimore on Sept 30 and Oct 1, 2010.  The
event focuses on case studies that demonstrate successes (and failures)
in Web applications and Internet architectures.

Our Keynote speakers include John Allspaw and Theo Schlossnagle.  We are
currently accepting submissions for the Call For Papers through July
9th.  You can find more information, including our current list of
speakers, online:

http://omniti.com/surge/2010

If you've been to Velocity, or wanted to but couldn't afford it, then
Surge is just what you've been waiting for.  For more information,
including CFP, sponsorship of the event, or participating as an
exhibitor, please contact us at su...@omniti.com.

Thanks,

-- 
Jason Dixon
OmniTI Computer Consulting, Inc.
jdi...@omniti.com
443.325.1357 x.241


setting up hadoop 0.20.1 development environment

2010-06-14 Thread Vidur Goyal
Hi,

I am trying to set up a development environment for hadoop 0.20.1 in Eclipse.
I used this URL,
http://svn.apache.org/repos/asf/hadoop/common/tags/release-0.20.1/, to
check out the source. I ran the compile, compile-core-test, and
eclipse-files targets using ant. Then when I build the project, I get
errors in the bin/benchmarks directory. I have followed the screencast from
Cloudera:
http://www.cloudera.com/blog/2009/04/configuring-eclipse-for-hadoop-development-a-screencast/.

Thanks,
Vidur




running elephant-bird in eclipse codec property

2010-06-14 Thread Kim Vogt
Hi peeps,

I'm trying to run elephant-bird code in eclipse, specifically (
http://github.com/kevinweil/elephant-bird/blob/master/examples/src/pig/json_word_count.pig),
but I'm not sure how to set the core-site.xml properties via eclipse.  I
tried adding them to VM args but am still getting the following error:

10/06/14 21:23:34 INFO jvm.JvmMetrics: Initializing JVM Metrics with
processName=JobTracker, sessionId=
10/06/14 21:23:34 WARN mapred.JobClient: No job jar file set.  User classes
may not be found. See JobConf(Class) or JobConf#setJar(String).
10/06/14 21:23:34 INFO input.FileInputFormat: Total input paths to process :
2
10/06/14 21:23:34 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
10/06/14 21:23:34 INFO lzo.LzoCodec: Successfully loaded & initialized
native-lzo library [hadoop-lzo rev 916aeae88ceb6734a679ebf9b48a93bea4cd9a06]
10/06/14 21:23:34 INFO input.LzoInputFormat: Added LZO split for
file:/home/kim/code/data/jsonData/json.txt.lzo[start=0, length=100]
10/06/14 21:23:34 INFO mapred.JobClient: Running job: job_local_0001
10/06/14 21:23:34 INFO input.FileInputFormat: Total input paths to process :
2
10/06/14 21:23:34 INFO input.LzoInputFormat: Added LZO split for
file:/home/kim/code/data/jsonData/json.txt.lzo[start=0, length=100]
10/06/14 21:23:34 INFO mapred.MapTask: io.sort.mb = 100
10/06/14 21:23:34 INFO mapred.MapTask: data buffer = 79691776/99614720
10/06/14 21:23:34 INFO mapred.MapTask: record buffer = 262144/327680
10/06/14 21:23:34 WARN mapred.LocalJobRunner: job_local_0001
java.io.IOException: No codec for file
file:/home/kim/code/data/jsonData/json.txt.lzo not found, cannot run
at
com.twitter.elephantbird.mapreduce.input.LzoRecordReader.initialize(LzoRecordReader.java:64)
at
org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:418)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:582)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:176)
10/06/14 21:23:35 INFO mapred.JobClient:  map 0% reduce 0%
10/06/14 21:23:35 INFO mapred.JobClient: Job complete: job_local_0001
10/06/14 21:23:35 INFO mapred.JobClient: Counters: 0

Help appreciated :-)

Thanks!

-Kim
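
One hedged possibility, not confirmed in this thread: when running from
Eclipse, the codec registration normally supplied by core-site.xml may not be
on the Configuration, so the .lzo file cannot be matched to a codec. Below is
a sketch that sets the relevant properties programmatically before the job is
built; the class names are the ones shipped by the hadoop-lzo fork that
elephant-bird depends on and should be checked against your jar:

import org.apache.hadoop.conf.Configuration;

public class LzoConfSketch {
  public static Configuration withLzo(Configuration conf) {
    // Register the LZO codecs so CompressionCodecFactory can match *.lzo files.
    conf.set("io.compression.codecs",
        "org.apache.hadoop.io.compress.DefaultCodec,"
        + "org.apache.hadoop.io.compress.GzipCodec,"
        + "com.hadoop.compression.lzo.LzoCodec,"
        + "com.hadoop.compression.lzo.LzopCodec");
    conf.set("io.compression.codec.lzo.class",
        "com.hadoop.compression.lzo.LzoCodec");
    return conf;
  }
}

Alternatively, adding the directory that contains core-site.xml to the Eclipse
run configuration's classpath should have the same effect.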


Re: job execution

2010-06-14 Thread Akash Deep Shakya
@Jeff, I think JobConf is already deprecated.
org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob and
org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl can be used instead.
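
For reference, a minimal sketch of that new-API flow, assuming two
already-configured Configuration objects; constructor and method names are
from the 0.21-era trunk classes mentioned above and should be verified against
your build:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob;
import org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl;

public class NewApiJobChain {
  public static JobControl buildControl(Configuration firstConf, Configuration secondConf)
      throws Exception {
    ControlledJob first = new ControlledJob(firstConf);
    ControlledJob second = new ControlledJob(secondConf);
    second.addDependingJob(first);   // declare the dependency

    JobControl control = new JobControl("example-group");
    control.addJob(first);
    control.addJob(second);
    // Drive the returned JobControl on a thread exactly as in the
    // old-API sketch earlier in this digest.
    return control;
  }
}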

Regards
Akash Deep Shakya OpenAK
FOSS Nepal Community
akashakya at gmail dot com

~ Failure to prepare is preparing to fail ~



On Tue, Jun 15, 2010 at 7:28 AM, Jeff Zhang zjf...@gmail.com wrote:

 There's a class org.apache.hadoop.mapred.jobcontrol.Job which is a
 wapper of JobConf. And You and dependent jobs to it. Then put it to
 JobControl.




 On Mon, Jun 14, 2010 at 9:55 AM, Gang Luo lgpub...@yahoo.com.cn wrote:
  Hi,
  According to the doc, JobControl can maintain the dependency among
 different jobs and only jobs without dependency can execute. How does
 JobControl maintain the dependency and how can we indicate the dependency?
 
  Thanks,
  -Gang
 
 
 
 
 



 --
 Best Regards

 Jeff Zhang