Re: Hadoop startup problems ( FileSystem is not ready yet! )

2012-06-16 Thread prasenjit mukherjee
Changing /etc/hosts line from :
127.0.0.1   localhost, prasen-host
to
127.0.0.1   localhost

fixed the problem...
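
That NumberFormatException comes from Hadoop trying to parse the IPv6
loopback address (::1) as host:port; Hadoop 1.x does not handle IPv6
addresses. The comma in the original line is also invalid hosts(5)
syntax, since hostname aliases are space-separated. A common companion
fix (a sketch, and an extra measure the poster did not confirm) is to
force IPv4 at the JVM level in conf/hadoop-env.sh:

    # make the JVM prefer the IPv4 stack so ::1 is never resolved
    export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true"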

On Sat, Jun 16, 2012 at 12:30 PM, prasenjit mukherjee
prasen@gmail.com wrote:
 I started Hadoop in single-node/pseudo-distributed mode. I took all
 the precautionary measures (fsck, namenode -format, etc.) before
 running start-all.sh. After startup, the jobtracker log keeps getting
 flooded with the following stack traces:

 I have a hunch it is related to the localhost/127.0.0.1 setup. Any
 pointers on how to fix this? Because of this I can't put anything into
 HDFS.

 $ tail -f hadoop-prasen-jobtracker-oilreadproud-lm.log

 2012-06-16 12:09:36,037 WARN org.apache.hadoop.mapred.JobTracker: Retrying...
 2012-06-16 12:09:36,049 WARN org.apache.hadoop.hdfs.DFSClient:
 DataStreamer Exception: java.lang.NumberFormatException: For input
 string: "0:0:0:0:0:0:0:1%0:50010"
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
        at java.lang.Integer.parseInt(Integer.java:458)
        at java.lang.Integer.parseInt(Integer.java:499)
        at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:148)
        at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:125)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:3025)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2983)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2255)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2446)

 2012-06-16 12:09:36,049 WARN org.apache.hadoop.hdfs.DFSClient: Error
 Recovery for block blk_-5253437002798877541_1048 bad datanode[0] nodes
 == null
 2012-06-16 12:09:36,050 WARN org.apache.hadoop.hdfs.DFSClient: Could
 not get block locations. Source file
 /tmp/hadoop-prasen/mapred/system/jobtracker.info - Aborting...
 2012-06-16 12:09:36,050 WARN org.apache.hadoop.mapred.JobTracker:
 Writing to file
 hdfs://localhost:9000/tmp/hadoop-prasen/mapred/system/jobtracker.info
 failed!
 2012-06-16 12:09:36,050 WARN org.apache.hadoop.mapred.JobTracker:
 FileSystem is not ready yet!
 2012-06-16 12:09:36,052 WARN org.apache.hadoop.mapred.JobTracker:
 Failed to initialize recovery manager.
 java.io.IOException: Could not get block locations. Source file
 /tmp/hadoop-prasen/mapred/system/jobtracker.info - Aborting...
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2691)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:2255)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2423)


Setting number of mappers according to number of TextInput lines

2012-06-16 Thread Ondřej Klimpera

Hello,

I have a very small input (a few kB), but processing it to produce the
output takes several minutes. Is there a way to say: the file has 100
lines, I need 10 mappers, and each mapper node has to process 10 lines
of the input file?


Thanks for any advice.
Ondrej Klimpera


Re: Setting number of mappers according to number of TextInput lines

2012-06-16 Thread Bejoy KS
Hi Ondrej

You can use NLineInputFormat with n set to 10.
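
A minimal sketch of that with the old mapred API (Hadoop 1.x; MyJob is
a placeholder driver class, not from this thread):

    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.lib.NLineInputFormat;

    JobConf conf = new JobConf(MyJob.class);      // placeholder driver class
    conf.setInputFormat(NLineInputFormat.class);  // one split per N lines, not per block
    conf.setInt("mapred.line.input.format.linespermap", 10);  // N = 10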

--Original Message--
From: Ondřej Klimpera
To: common-user@hadoop.apache.org
ReplyTo: common-user@hadoop.apache.org
Subject: Setting number of mappers according to number of TextInput lines
Sent: Jun 16, 2012 14:31

Hello,

I have a very small input (a few kB), but processing it to produce the
output takes several minutes. Is there a way to say: the file has 100
lines, I need 10 mappers, and each mapper node has to process 10 lines
of the input file?

Thanks for any advice.
Ondrej Klimpera


Regards
Bejoy KS

Sent from handheld, please excuse typos.

Re: Setting number of mappers according to number of TextInput lines

2012-06-16 Thread Ondřej Klimpera
I tried this approach, but the job is not distributed among 10 mapper
nodes. It seems Hadoop ignores this property :(

My first thought is that the small file size is the problem and Hadoop
doesn't split it properly.

Thanks for any ideas.


On 06/16/2012 11:27 AM, Bejoy KS wrote:

Hi Ondrej

You can use NLineInputFormat with n set to 10.

--Original Message--
From: Ondřej Klimpera
To: common-user@hadoop.apache.org
ReplyTo: common-user@hadoop.apache.org
Subject: Setting number of mappers according to number of TextInput lines
Sent: Jun 16, 2012 14:31

Hello,

 I have a very small input (a few kB), but processing it to produce the
 output takes several minutes. Is there a way to say: the file has 100
 lines, I need 10 mappers, and each mapper node has to process 10 lines
 of the input file?

 Thanks for any advice.
Ondrej Klimpera


Regards
Bejoy KS

Sent from handheld, please excuse typos.





Re: Map works well, but Reduce failed

2012-06-16 Thread Abhishek
Hi Raj,

I think you should increase the reducer worker threads used to fetch the map output.
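
For reference, the knobs that usually go by "reducer worker threads" in
Hadoop 1.x (naming them is my guess; the message does not say which is
meant) — a sketch against the job Configuration:

    // reduce-side threads fetching map output (default 5)
    conf.setInt("mapred.reduce.parallel.copies", 10);
    // tasktracker threads serving map output (default 40;
    // cluster-wide, requires a tasktracker restart)
    conf.setInt("tasktracker.http.threads", 80);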

Regards 
Abhishek.

Sent from my iPhone

On Jun 15, 2012, at 9:42 PM, Raj Vishwanathan rajv...@yahoo.com wrote:

 Most probably you have a network problem. Check your hostname and IP address
 mapping.
 
 
 
 
 From: Yongwei Xing jdxyw2...@gmail.com
 To: common-user@hadoop.apache.org 
 Sent: Thursday, June 14, 2012 10:15 AM
 Subject: Map works well, but Reduce failed
 
 Hi all
 
 I ran a simple sort program; however, I get errors like the ones below.
 
 12/06/15 01:13:17 WARN mapred.JobClient: Error reading task outputServer
 returned HTTP response code: 403 for URL:
 http://192.168.1.106:50060/tasklog?plaintext=true&attemptid=attempt_201206150102_0002_m_01_1&filter=stdout
 12/06/15 01:13:18 WARN mapred.JobClient: Error reading task outputServer
 returned HTTP response code: 403 for URL:
 http://192.168.1.106:50060/tasklog?plaintext=true&attemptid=attempt_201206150102_0002_m_01_1&filter=stderr
 12/06/15 01:13:20 INFO mapred.JobClient:  map 50% reduce 0%
 12/06/15 01:13:23 INFO mapred.JobClient:  map 100% reduce 0%
 12/06/15 01:14:19 INFO mapred.JobClient: Task Id :
 attempt_201206150102_0002_m_00_2, Status : FAILED
 Too many fetch-failures
 12/06/15 01:14:20 WARN mapred.JobClient: Error reading task outputServer
 returned HTTP response code: 403 for URL:
 http://192.168.1.106:50060/tasklog?plaintext=true&attemptid=attempt_201206150102_0002_m_00_2&filter=stdout
 
 Does anyone know the reason and how to resolve it?
 
 Best Regards,
 
 -- 
 Welcome to my ET Blog http://www.jdxyw.com
 
 


Re: Map works well, but Reduce failed

2012-06-16 Thread Yongwei Xing
I did the following steps:

1. stop-all.sh
2. Delete the tmp folder
3. Format the namenode
4. start-all.sh

The problem is gone. I'm not sure what the root cause was.
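
For anyone repeating these steps, a sketch of the equivalent commands,
assuming the default hadoop.tmp.dir of /tmp/hadoop-${user.name} (adjust
if yours is overridden):

    stop-all.sh
    rm -rf /tmp/hadoop-$USER     # destroys all HDFS data and mapred state
    hadoop namenode -format
    start-all.sh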

Best Regards,

2012/6/16 Abhishek abhishek.dod...@gmail.com

 Hi Raj,

 I think you should increase the reducer worker threads used to fetch the
 map output.

 Regards
 Abhishek.

 Sent from my iPhone

 On Jun 15, 2012, at 9:42 PM, Raj Vishwanathan rajv...@yahoo.com wrote:

  Most probably you have a network problem. Check your hostname and IP
 address mapping.
 
 
 
  
  From: Yongwei Xing jdxyw2...@gmail.com
  To: common-user@hadoop.apache.org
  Sent: Thursday, June 14, 2012 10:15 AM
  Subject: Map works well, but Reduce failed
 
  Hi all
 
  I ran a simple sort program; however, I get errors like the ones below.
 
  12/06/15 01:13:17 WARN mapred.JobClient: Error reading task outputServer
  returned HTTP response code: 403 for URL:
 
 http://192.168.1.106:50060/tasklog?plaintext=true&attemptid=attempt_201206150102_0002_m_01_1&filter=stdout
  12/06/15 01:13:18 WARN mapred.JobClient: Error reading task outputServer
  returned HTTP response code: 403 for URL:
 
 http://192.168.1.106:50060/tasklog?plaintext=true&attemptid=attempt_201206150102_0002_m_01_1&filter=stderr
  12/06/15 01:13:20 INFO mapred.JobClient:  map 50% reduce 0%
  12/06/15 01:13:23 INFO mapred.JobClient:  map 100% reduce 0%
  12/06/15 01:14:19 INFO mapred.JobClient: Task Id :
  attempt_201206150102_0002_m_00_2, Status : FAILED
  Too many fetch-failures
  12/06/15 01:14:20 WARN mapred.JobClient: Error reading task outputServer
  returned HTTP response code: 403 for URL:
 
 http://192.168.1.106:50060/tasklog?plaintext=true&attemptid=attempt_201206150102_0002_m_00_2&filter=stdout
 
  Does anyone know the reason and how to resolve it?
 
  Best Regards,
 
  --
  Welcome to my ET Blog http://www.jdxyw.com
 
 




-- 
Welcome to my ET Blog http://www.jdxyw.com


Re: Setting number of mappers according to number of TextInput lines

2012-06-16 Thread Edward Capriolo
No. The number of lines is not known at planning time. All you know is
the size of the blocks. You want to look at mapred.max.split.size.
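
A sketch of that knob (it is honored by the new-API FileInputFormat;
the byte count here is illustrative, not from this thread):

    // cap the split size so even a small file yields ~10 map tasks
    long maxSplitBytes = inputLengthBytes / 10;  // inputLengthBytes: your file's size
    conf.setLong("mapred.max.split.size", maxSplitBytes);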

On Sat, Jun 16, 2012 at 5:31 AM, Ondřej Klimpera klimp...@fit.cvut.cz wrote:
 I tried this approach, but the job is not distributed among 10 mapper
 nodes. It seems Hadoop ignores this property :(

 My first thought is that the small file size is the problem and Hadoop
 doesn't split it properly.

 Thanks for any ideas.


 On 06/16/2012 11:27 AM, Bejoy KS wrote:

 Hi Ondrej

 You can use NLineInputFormat with n set to 10.

 --Original Message--
 From: Ondřej Klimpera
 To: common-user@hadoop.apache.org
 ReplyTo: common-user@hadoop.apache.org
 Subject: Setting number of mappers according to number of TextInput lines
 Sent: Jun 16, 2012 14:31

 Hello,

 I have a very small input (a few kB), but processing it to produce the
 output takes several minutes. Is there a way to say: the file has 100
 lines, I need 10 mappers, and each mapper node has to process 10 lines
 of the input file?

 Thanks for any advice.
 Ondrej Klimpera


 Regards
 Bejoy KS

 Sent from handheld, please excuse typos.




Re: Setting number of mappers according to number of TextInput lines

2012-06-16 Thread Shi Yu
How did you try it? I had no problem with NLineInputFormat. It just
works exactly as expected.

Shi


Re: Setting number of mappers according to number of TextInput lines

2012-06-16 Thread Harsh J
Ondřej,

While NLineInputFormat will indeed give you N lines per task, it does
not guarantee that the N map tasks that come out for a file from it
will all be sent to different nodes. Which one do you need exactly:
simply having N lines per map task, or N maps distributed more widely?

On Sat, Jun 16, 2012 at 3:01 PM, Ondřej Klimpera klimp...@fit.cvut.cz wrote:
 I tried this approach, but the job is not distributed among 10 mapper
 nodes. It seems Hadoop ignores this property :(

 My first thought is that the small file size is the problem and Hadoop
 doesn't split it properly.

 Thanks for any ideas.



 On 06/16/2012 11:27 AM, Bejoy KS wrote:

 Hi Ondrej

 You can use NLineInputFormat with n set to 10.

 --Original Message--
 From: Ondřej Klimpera
 To: common-user@hadoop.apache.org
 ReplyTo: common-user@hadoop.apache.org
 Subject: Setting number of mappers according to number of TextInput lines
 Sent: Jun 16, 2012 14:31

 Hello,

 I have a very small input (a few kB), but processing it to produce the
 output takes several minutes. Is there a way to say: the file has 100
 lines, I need 10 mappers, and each mapper node has to process 10 lines
 of the input file?

 Thanks for any advice.
 Ondrej Klimpera


 Regards
 Bejoy KS

 Sent from handheld, please excuse typos.





-- 
Harsh J