Hadoop Engineers ...

2009-05-13 Thread Htin Hlaing
Hello - Attributor is looking for talented senior backend/infrastructure
software engineers with Hadoop skills.  Contact me if you are interested,
and I will send the full job description.

 

Thanks,

Htin Hlaing

Attributor 



RE: HDFS error

2008-10-10 Thread Htin Hlaing
Thanks for your help, Samuel.  I was having problems with both writing and
reading.  I ran fsck, removed some damaged files, and restarted the DFS.
Things seem to be OK now, though I'm not exactly sure what happened.

Thanks,
Htin

-Original Message-
From: Samuel Guo [mailto:[EMAIL PROTECTED] 
Sent: Thursday, October 09, 2008 6:15 PM
To: core-user@hadoop.apache.org
Subject: Re: HDFS error

Does this happen when you try to write files to HDFS?
If so, please check that the disks on your datanodes have enough free
space.

If it happens when you try to read files from HDFS, you can run fsck to
check whether the files are healthy.

Hope this helps.

On Fri, Oct 10, 2008 at 8:21 AM, Htin Hlaing [EMAIL PROTECTED] wrote:

 Hello -  I am experiencing the following HDFS problem across the clusters
 sharing the DFS.  It's not specific to this particular data node IP
 address; the exception occurs on all the other data nodes as well.  Any
 help is appreciated.

 2008-10-09 14:13:59,732 INFO org.apache.hadoop.fs.DFSClient: Exception in
 createBlockOutputStream java.io.IOException: Bad connect ack with
 firstBadLink 10.50.80.108:50010
 2008-10-09 14:13:59,732 INFO org.apache.hadoop.fs.DFSClient: Abandoning
 block blk_2383100013215057496
 2008-10-09 14:13:59,732 INFO org.apache.hadoop.fs.DFSClient: Waiting to
 find target node: 10.50.80.112:50010
 2008-10-09 14:14:16,604 INFO org.apache.hadoop.fs.DFSClient: Could not
 obtain block blk_3359685166656187008 from any node: java.io.IOException:
 No live nodes contain current block
 2008-10-09 14:14:54,370 INFO org.apache.hadoop.fs.DFSClient: Could not
 obtain block blk_-4901580690304720524 from any node: java.io.IOException:
 No live nodes contain current block
 2008-10-09 14:17:19,619 INFO org.apache.hadoop.fs.DFSClient: Could not
 obtain block blk_3359685166656187008 from any node: java.io.IOException:
 No live nodes contain current block
 2008-10-09 14:17:57,385 INFO org.apache.hadoop.fs.DFSClient: Could not
 obtain block blk_-4901580690304720524 from any node: java.io.IOException:
 No live nodes contain current block
 2008-10-09 14:20:25,634 INFO org.apache.hadoop.fs.DFSClient: Could not
 obtain block blk_3359685166656187008 from any node: java.io.IOException:
 No live nodes contain current block
 2008-10-09 14:21:09,401 INFO org.apache.hadoop.fs.DFSClient: Could not
 obtain block blk_-4901580690304720524 from any node: java.io.IOException:
 No live nodes contain current block
 2008-10-09 14:23:28,649 INFO org.apache.hadoop.fs.DFSClient: Could not
 obtain block blk_3359685166656187008 from any node: java.io.IOException:
 No live nodes contain current block

 Is there a searchable archive of old posts to this mailing list?

 Thanks,
 Htin



HDFS error

2008-10-09 Thread Htin Hlaing
Hello -  I am experiencing the following HDFS problem across the clusters
sharing the DFS.  It's not specific to this particular data node IP
address; the exception occurs on all the other data nodes as well.  Any
help is appreciated.

2008-10-09 14:13:59,732 INFO org.apache.hadoop.fs.DFSClient: Exception in
createBlockOutputStream java.io.IOException: Bad connect ack with
firstBadLink 10.50.80.108:50010
2008-10-09 14:13:59,732 INFO org.apache.hadoop.fs.DFSClient: Abandoning
block blk_2383100013215057496
2008-10-09 14:13:59,732 INFO org.apache.hadoop.fs.DFSClient: Waiting to
find target node: 10.50.80.112:50010
2008-10-09 14:14:16,604 INFO org.apache.hadoop.fs.DFSClient: Could not
obtain block blk_3359685166656187008 from any node:  java.io.IOException:
No live nodes contain current block
2008-10-09 14:14:54,370 INFO org.apache.hadoop.fs.DFSClient: Could not
obtain block blk_-4901580690304720524 from any node:  java.io.IOException:
No live nodes contain current block
2008-10-09 14:17:19,619 INFO org.apache.hadoop.fs.DFSClient: Could not
obtain block blk_3359685166656187008 from any node:  java.io.IOException:
No live nodes contain current block
2008-10-09 14:17:57,385 INFO org.apache.hadoop.fs.DFSClient: Could not
obtain block blk_-4901580690304720524 from any node:  java.io.IOException:
No live nodes contain current block
2008-10-09 14:20:25,634 INFO org.apache.hadoop.fs.DFSClient: Could not
obtain block blk_3359685166656187008 from any node:  java.io.IOException:
No live nodes contain current block
2008-10-09 14:21:09,401 INFO org.apache.hadoop.fs.DFSClient: Could not
obtain block blk_-4901580690304720524 from any node:  java.io.IOException:
No live nodes contain current block
2008-10-09 14:23:28,649 INFO org.apache.hadoop.fs.DFSClient: Could not
obtain block blk_3359685166656187008 from any node:  java.io.IOException:
No live nodes contain current block 

Is there a searchable archive of old posts to this mailing list?

Thanks,
Htin


RE: How to input a hdfs file inside a mapper?

2008-09-26 Thread Htin Hlaing
I would imagine something like:

FSDataInputStream inFileStream = dfsFileSystem.open(dfsFilePath);

Don't forget to close the stream when you are done.
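
A slightly fuller sketch, with illustrative names and an assumed file path
(nothing here comes from the thread beyond the open() call):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SideFileReader {

  // Reads a small side file from HDFS into memory; path and usage are illustrative.
  public static byte[] readAll(Configuration conf, String file) throws IOException {
    Path dfsFilePath = new Path(file);                // e.g. "/user/example/lookup.dat"
    FileSystem dfsFileSystem = FileSystem.get(conf);  // the job's default filesystem (HDFS)
    FSDataInputStream in = null;
    try {
      in = dfsFileSystem.open(dfsFilePath);
      long len = dfsFileSystem.getFileStatus(dfsFilePath).getLen();
      byte[] data = new byte[(int) len];
      in.readFully(data);                             // FSDataInputStream is a DataInputStream
      return data;
    } finally {
      if (in != null) {
        in.close();                                   // always close the stream
      }
    }
  }
}

Inside a mapper or reducer (old mapred API) the natural place to call
something like this once is configure(JobConf), since JobConf is a
Configuration.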

Thanks,
Htin

-Original Message-
From: Amit_Gupta [mailto:[EMAIL PROTECTED] 
Sent: Friday, September 26, 2008 5:47 AM
To: core-user@hadoop.apache.org
Subject: How to input a hdfs file inside a mapper?


How can I get an input stream on a file stored in HDFS inside a mapper or a
reducer?

thanks 

Amit
-- 
View this message in context:
http://www.nabble.com/How-to-input-a-hdfs-file-inside-a-mapper--tp19687785p19687785.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.



RE: How to instruct Job Tracker to use certain hosts only

2008-04-23 Thread Htin Hlaing
Thanks, Owen, for the suggestion.  I wonder whether there would be side
effects from consistently failing the task on those nodes.  Would the
jobtracker blacklist the nodes for other jobs as well?

Htin

-Original Message-
From: Owen O'Malley [mailto:[EMAIL PROTECTED] 
Sent: Monday, April 21, 2008 10:53 PM
To: core-user@hadoop.apache.org
Subject: Re: How to instruct Job Tracker to use certain hosts only


On Apr 18, 2008, at 1:52 PM, Htin Hlaing wrote:

 I would like to run the first job on all the compute hosts in the
 cluster (which is the default), and then run the second job on only a
 subset of the hosts (due to a licensing issue).

One option would be to set mapred.map.max.attempts and  
mapred.reduce.max.attempts to larger numbers and have the map or  
reduce fail if it is run on a bad node. When the task re-runs, it  
will run on a different node. Eventually it will find a valid node.

-- Owen
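
For illustration, a rough sketch of this trick against the classic
org.apache.hadoop.mapred API (the property name licensed.hosts, the class
name, and the allowlist handling are hypothetical, not from the thread;
0.16-era signatures differ slightly from what is shown here):

import java.io.IOException;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class LicensedHostMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  public void configure(JobConf job) {
    // "licensed.hosts" is a hypothetical comma-separated allowlist set by the driver.
    String allowed = job.get("licensed.hosts", "");
    if (allowed.length() == 0) {
      return;                                  // no restriction configured
    }
    Set<String> allowedHosts = new HashSet<String>(Arrays.asList(allowed.split(",")));
    try {
      String host = InetAddress.getLocalHost().getHostName();
      if (!allowedHosts.contains(host)) {
        // Fail fast: with a generous retry budget the framework reruns the
        // task elsewhere until it lands on an allowed host.
        throw new RuntimeException("Host " + host + " is not licensed for this job");
      }
    } catch (UnknownHostException e) {
      throw new RuntimeException(e);
    }
  }

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    // ... normal map logic ...
  }
}

On the driver side, the allowlist and the retry budget Owen mentions would
be set with something like job.set("licensed.hosts", "node07,node08") and
job.setInt("mapred.map.max.attempts", 20) (similarly for reduces).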



How to instruct Job Tracker to use certain hosts only

2008-04-18 Thread Htin Hlaing
Hi - I have a situation that I cannot seem to find a good answer to on my
own.  I am using 0.16.2.

 

Basically, I have two jobs that I run in order from the same java driver
process.  

 

I would like to run the first job on all the compute hosts in the cluster
(which is the default), and then run the second job on only a subset of the
hosts (due to a licensing issue).

 

First, I thought that I might be able to do this by giving a different
hadoop-site.xml containing the mapred.hosts specification when I run the
second job.  That does not work because it seems like mapred.hosts is read
in when the jobtracker is started.

 

What are my options?

1)  Shrink the cluster to only the subset of hosts.  Not a good option for
me.

2)  Between the jobs, restart the Hadoop jobtracker with a different
hadoop-site.xml.  I would like to avoid this.

3)  Possibly Hadoop On Demand (HOD), but that probably means the job chain
cannot stay in the same Java process.  Plus, I cannot upgrade to HOD easily
yet.

4)  Something better - I would really appreciate it if someone could
suggest an alternative that achieves this.

 

Thanks,

Htin