Socket does not have a channel
Hi,

java.lang.IllegalStateException: Socket Socket[addr=/10.86.203.112,port=1004,localport=35170] does not have a channel
    at com.google.common.base.Preconditions.checkState(Preconditions.java:172)
    at org.apache.hadoop.net.SocketInputWrapper.getReadableByteChannel(SocketInputWrapper.java:83)
    at org.apache.hadoop.hdfs.RemoteBlockReader2.newBlockReader(RemoteBlockReader2.java:432)
    at org.apache.hadoop.hdfs.BlockReaderFactory.newBlockReader(BlockReaderFactory.java:82)
    at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:832)
    at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:444)

While accessing HDFS I keep getting the above error. Setting dfs.client.use.legacy.blockreader to true fixes the problem. I would like to know what exactly the problem is. Is it a problem/bug in Hadoop? Is there a JIRA ticket for this? Cheers, Subroto Sanyal
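For illustration, a minimal client-side sketch of the workaround mentioned above. The property name comes from the mail itself; the surrounding setup and the file path are assumptions, not Subroto's actual code.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LegacyBlockReaderClient {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Fall back to the pre-2.0 block reader, as described in the mail
        conf.setBoolean("dfs.client.use.legacy.blockreader", true);
        FileSystem fs = FileSystem.get(conf);
        try (FSDataInputStream in = fs.open(new Path("/some/hdfs/file"))) { // hypothetical path
            in.read();
        }
    }
}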
Re: Need help optimizing reducer
The reason the reducer is fast up to 66% is because of the sort and shuffle phases of the reduce, when the actual task has NOT yet started. The reduce side is divided into 3 phases of ~33% each - shuffle (fetch data), sort, and finally user code (reduce). That is why your reduce might be faster up to 66%. To speed up your program you can either use more reducers or make your reducer code as optimized as possible. Best, Mahesh Balija, Calsoft Labs. On Tue, Mar 5, 2013 at 1:27 AM, Austin Chungath austi...@gmail.com wrote: Hi all, I have 1 reducer and around 600 thousand unique keys coming to it. The total data is only around 30 MB. My logic doesn't allow me to have more than 1 reducer. It's taking too long to complete, around 2 hours. (Till 66% it's fast, then it slows down; I don't really think it has started doing anything till 66%, but then why does it show like that?) Are there any job execution parameters that can help improve reducer performance? Any suggestions to improve things when we have to live with just one reducer? Thanks, Austin
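A minimal sketch of the first suggestion (raising the reducer count), assuming the new org.apache.hadoop.mapreduce API; the job name and the count of 8 are placeholders, not values from Austin's job.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class ReducerCountSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "example-job"); // placeholder job name
        job.setNumReduceTasks(8);               // default is 1; pick a value that suits the cluster
        // ... mapper, reducer, input and output setup omitted ...
    }
}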
Re: Need help optimizing reducer
Hi Austin, I am not sure whether this applies to your job, but in any case: you might be re-reading the whole set of input values for a key (the mapper output values handed to the reduce function) from beginning to end while merging them into one output. If you can send your reducer code, you may get more useful replies. On Tue, Mar 5, 2013 at 1:00 PM, Mahesh Balija balijamahesh@gmail.com wrote: The reason the reducer is fast up to 66% is because of the sort and shuffle phases of the reduce, when the actual task has NOT yet started. The reduce side is divided into 3 phases of ~33% each - shuffle (fetch data), sort, and finally user code (reduce). That is why your reduce might be faster up to 66%. To speed up your program you can either use more reducers or make your reducer code as optimized as possible. Best, Mahesh Balija, Calsoft Labs. On Tue, Mar 5, 2013 at 1:27 AM, Austin Chungath austi...@gmail.com wrote: Hi all, I have 1 reducer and around 600 thousand unique keys coming to it. The total data is only around 30 MB. My logic doesn't allow me to have more than 1 reducer. It's taking too long to complete, around 2 hours. (Till 66% it's fast, then it slows down; I don't really think it has started doing anything till 66%, but then why does it show like that?) Are there any job execution parameters that can help improve reducer performance? Any suggestions to improve things when we have to live with just one reducer? Thanks, Austin
Re: Need help optimizing reducer
I mean that while adding each newly arriving reducer input value to the already merged values, in order to construct the whole set of input values for a given key, you might be reading every input value (the mapper output values) from beginning to end each time. On Tue, Mar 5, 2013 at 1:46 PM, Fatih Haltas fatih.hal...@nyu.edu wrote: Hi Austin, I am not sure whether this applies to your job, but in any case: you might be re-reading the whole set of input values for a key (the mapper output values handed to the reduce function) from beginning to end while merging them into one output. If you can send your reducer code, you may get more useful replies.
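To make the point concrete, a sketch (not Austin's actual code) of a reducer that merges the values for a key in a single pass over the Iterable, instead of buffering or re-reading them; the Text key/value types and the comma delimiter are assumptions.

import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class SinglePassMergeReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        StringBuilder merged = new StringBuilder();
        for (Text v : values) {            // the Iterable can only be walked once anyway
            if (merged.length() > 0) merged.append(',');
            merged.append(v.toString());   // O(n) total work per key, no re-reading
        }
        context.write(key, new Text(merged.toString()));
    }
}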
Hadoop cluster setup - could not see second datanode
Thanks for the information. Now I am trying to install Hadoop DFS using 2 nodes: a namenode cum datanode, and a separate datanode. I use the following configuration for my hdfs-site.xml:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>localhost:9000</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/bala/data</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/home/bala/name</value>
  </property>
</configuration>

In the namenode, I have added the datanode hostnames (machine1 and machine2). When I do 'start-all.sh', I see in the logs that the datanode is starting on both machines, but when I go to the browser on the namenode, I see only one live node (the namenode which is also configured as a datanode). Any hint here will help me. With regards, Bala

From: Mahesh Balija [mailto:balijamahesh@gmail.com] Sent: 05 March 2013 14:15 To: user@hadoop.apache.org Subject: Re: Hadoop file system

You can use HDFS alone in distributed mode to fulfill your requirement. HDFS has the FileSystem Java API through which you can interact with HDFS from your client. HDFS is good if you have a small number of files with huge sizes rather than many files with small sizes. Best, Mahesh Balija, Calsoft Labs.

On Tue, Mar 5, 2013 at 10:43 AM, AMARNATH, Balachandar balachandar.amarn...@airbus.com wrote: Hi, I am new to HDFS. In my Java application, I need to perform a 'similar operation' over a large number of files. I would like to store those files on distributed machines. I don't think I will need the map reduce paradigm, but I would like to use HDFS for file storage and access. Is it possible (or a nice idea) to use HDFS as a stand-alone store? And are Java APIs available to work with HDFS so that I can read/write in a distributed environment? Any thoughts here will be helpful. With thanks and regards, Balachandar
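Since the quoted question asks about the Java API, here is a small sketch of reading and writing a file through the FileSystem API that Mahesh refers to; the namenode address, path and contents are made-up examples, not Bala's actual setup.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadWrite {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "hdfs://namenode-host:9000"); // assumption: your namenode address
        FileSystem fs = FileSystem.get(conf);

        Path p = new Path("/user/bala/example.txt"); // hypothetical path
        try (FSDataOutputStream out = fs.create(p)) {   // write
            out.writeUTF("hello hdfs");
        }
        try (FSDataInputStream in = fs.open(p)) {       // read back
            System.out.println(in.readUTF());
        }
    }
}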
basic question about rack awareness and computation migration
Hi hadoop users, I'm trying to find out if computation migration is something the developer needs to worry about or if it's supposed to be hidden. I would like to use hadoop to take in a list of image paths in the hdfs and then have each task compress these large, raw images into something much smaller - say jpeg files. Input: list of paths Output: compressed jpeg Since I don't really need a reduce task (I'm more using hadoop for its reliability and orchestration aspects), my mapper ought to just take the list of image paths and then work on them. As I understand it, each image will likely be on multiple data nodes. My question is how will each mapper task migrate the computation to the data nodes? I recall reading that the namenode is supposed to deal with this. Is it hidden from the developer? Or as the developer, do I need to discover where the data lies and then migrate the task to that node? Since my input is just a list of paths, it seems like the namenode couldn't really do this for me. Another question: Where can I find out more about this? I've looked up rack awareness and computation migration but haven't really found much code relating to either one - leading me to believe I'm not supposed to have to write code to deal with this. Anyway, could someone please help me out or set me straight on this? Thanks, -Julian
RE: Hadoop cluster setup - could not see second datanode
I fixed the issue below :) Regards, Bala

From: AMARNATH, Balachandar [mailto:balachandar.amarn...@airbus.com] Sent: 05 March 2013 17:05 To: user@hadoop.apache.org Subject: Hadoop cluster setup - could not see second datanode

Thanks for the information. Now I am trying to install Hadoop DFS using 2 nodes: a namenode cum datanode, and a separate datanode. I use the following configuration for my hdfs-site.xml:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>localhost:9000</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/bala/data</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/home/bala/name</value>
  </property>
</configuration>

In the namenode, I have added the datanode hostnames (machine1 and machine2). When I do 'start-all.sh', I see in the logs that the datanode is starting on both machines, but when I go to the browser on the namenode, I see only one live node (the namenode which is also configured as a datanode). Any hint here will help me. With regards, Bala
JobTracker client - max connections
Hi all, I'm implementing an API over the JobTracker client - JobClient. My plan is to have a pool of JobClient objects that will expose the ability to submit jobs, poll status, etc. My questions are: Should I set a maximum pool size? How many connections are too many connections for the JobTracker? Any suggestions for what pool to use? Thanks, Amit.
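For illustration only, one shape such a pool could take: a fixed-size blocking queue of JobClient instances (the size is an arbitrary assumption, not a recommendation about how many JobTracker connections are safe).

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class JobClientPool {
    private final BlockingQueue<JobClient> pool;

    public JobClientPool(int size, JobConf conf) throws Exception {
        pool = new ArrayBlockingQueue<JobClient>(size);
        for (int i = 0; i < size; i++) {
            pool.put(new JobClient(conf)); // each instance holds its own connection to the JobTracker
        }
    }

    // Callers borrow a client to submit a job or poll status, then return it.
    public JobClient borrow() throws InterruptedException { return pool.take(); }
    public void release(JobClient client) throws InterruptedException { pool.put(client); }
}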
S3N copy creating recursive folders
Hi, I am using Hadoop 1.0.3 and trying to execute: hadoop fs -cp s3n://acessKey:acesssec...@bucket.name/srcData /test/srcData

This ends up with: cp: java.io.IOException: mkdirs: Pathname too long. Limit 8000 characters, 1000 levels.

When I list the folder /test/srcData recursively, it lists 998 folders like:
drwxr-xr-x - root supergroup 0 2013-03-05 08:49 /test/srcData/srcData
drwxr-xr-x - root supergroup 0 2013-03-05 08:49 /test/srcData/srcData/srcData
drwxr-xr-x - root supergroup 0 2013-03-05 08:49 /test/srcData/srcData/srcData/srcData
drwxr-xr-x - root supergroup 0 2013-03-05 08:49 /test/srcData/srcData/srcData/srcData/srcData
drwxr-xr-x - root supergroup 0 2013-03-05 08:49 /test/srcData/srcData/srcData/srcData/srcData/srcData

Is there a problem with the s3n filesystem? Cheers, Subroto Sanyal
Re:S3N copy creating recursive folders
Hi Subroto, I haven't used the s3n filesystem, but from the output "cp: java.io.IOException: mkdirs: Pathname too long. Limit 8000 characters, 1000 levels." I think this is a problem with the path. Is the path longer than 8000 characters, or is the level more than 1000? You only have 998 folders; maybe the last one is more than 8000 characters. Why not count the last one's length? BRs//Julian ------------------ Original ------------------ From: Subroto ssan...@datameer.com; Date: Tue, Mar 5, 2013 10:22 PM To: user user@hadoop.apache.org; Subject: S3N copy creating recursive folders Hi, I am using Hadoop 1.0.3 and trying to execute: hadoop fs -cp s3n://acessKey:acesssec...@bucket.name/srcData /test/srcData This ends up with: cp: java.io.IOException: mkdirs: Pathname too long. Limit 8000 characters, 1000 levels. When I list the folder /test/srcData recursively, it lists 998 folders like: drwxr-xr-x - root supergroup 0 2013-03-05 08:49 /test/srcData/srcData drwxr-xr-x - root supergroup 0 2013-03-05 08:49 /test/srcData/srcData/srcData drwxr-xr-x - root supergroup 0 2013-03-05 08:49 /test/srcData/srcData/srcData/srcData drwxr-xr-x - root supergroup 0 2013-03-05 08:49 /test/srcData/srcData/srcData/srcData/srcData drwxr-xr-x - root supergroup 0 2013-03-05 08:49 /test/srcData/srcData/srcData/srcData/srcData/srcData Is there a problem with the s3n filesystem? Cheers, Subroto Sanyal
Re: S3N copy creating recursive folders
Hi, It's not because there are too many recursive folders in the S3 bucket; in fact there is no recursive folder in the source. If I list the S3 bucket with native S3 tools I can find a file srcData with size 0 inside the folder srcData. The copy command keeps creating the folder /test/srcData/srcData/srcData (it keeps appending srcData). Cheers, Subroto Sanyal On Mar 5, 2013, at 3:32 PM, 卖报的小行家 wrote: Hi Subroto, I haven't used the s3n filesystem, but from the output "cp: java.io.IOException: mkdirs: Pathname too long. Limit 8000 characters, 1000 levels." I think this is a problem with the path. Is the path longer than 8000 characters, or is the level more than 1000? You only have 998 folders; maybe the last one is more than 8000 characters. Why not count the last one's length? BRs//Julian
Re:RE: Hadoop cluster setup - could not see second datanode
Hello, Can a Namenode and several datanodes exist on one machine? I only have one PC and I want to configure it this way. BRs//Julian

------------------ Original ------------------ From: AMARNATH, Balachandar balachandar.amarn...@airbus.com; Date: Tue, Mar 5, 2013 07:55 PM To: user@hadoop.apache.org; Subject: RE: Hadoop cluster setup - could not see second datanode

I fixed the issue below :) Regards, Bala

From: AMARNATH, Balachandar [mailto:balachandar.amarn...@airbus.com] Sent: 05 March 2013 17:05 To: user@hadoop.apache.org Subject: Hadoop cluster setup - could not see second datanode

Thanks for the information. Now I am trying to install Hadoop DFS using 2 nodes: a namenode cum datanode, and a separate datanode. I use the following configuration for my hdfs-site.xml:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>localhost:9000</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/bala/data</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/home/bala/name</value>
  </property>
</configuration>

In the namenode, I have added the datanode hostnames (machine1 and machine2). When I do 'start-all.sh', I see in the logs that the datanode is starting on both machines, but when I go to the browser on the namenode, I see only one live node (the namenode which is also configured as a datanode). Any hint here will help me. With regards, Bala
Re:Socket does not have a channel
Hi, Which version of Hadoop is this, and in what situation is the exception reported? BRs//Julian ------------------ Original ------------------ From: Subroto ssan...@datameer.com; Date: Tue, Mar 5, 2013 04:46 PM To: user user@hadoop.apache.org; Subject: Socket does not have a channel Hi java.lang.IllegalStateException: Socket Socket[addr=/10.86.203.112,port=1004,localport=35170] does not have a channel at com.google.common.base.Preconditions.checkState(Preconditions.java:172) at org.apache.hadoop.net.SocketInputWrapper.getReadableByteChannel(SocketInputWrapper.java:83) at org.apache.hadoop.hdfs.RemoteBlockReader2.newBlockReader(RemoteBlockReader2.java:432) at org.apache.hadoop.hdfs.BlockReaderFactory.newBlockReader(BlockReaderFactory.java:82) at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:832) at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:444) While accessing HDFS I keep getting the above error. Setting dfs.client.use.legacy.blockreader to true fixes the problem. I would like to know what exactly the problem is. Is it a problem/bug in Hadoop? Is there a JIRA ticket for this? Cheers, Subroto Sanyal
Re: Socket does not have a channel
Hi Julian, This is from CDH4.1.2 and I think it's based on Apache Hadoop 2.0. Cheers, Subroto Sanyal On Mar 5, 2013, at 3:50 PM, 卖报的小行家 wrote: Hi, Which version of Hadoop is this, and in what situation is the exception reported? BRs//Julian
回复: Socket does not have a channel
Yes, it's from Hadoop 2.0. I just read the 1.1.1 code; there are no such classes as the ones the log mentions. Maybe you can read the code first. ------------------ Original ------------------ From: Subroto ssan...@datameer.com; Date: Tue, Mar 5, 2013 10:56 PM To: user user@hadoop.apache.org; Subject: Re: Socket does not have a channel Hi Julian, This is from CDH4.1.2 and I think it's based on Apache Hadoop 2.0. Cheers, Subroto Sanyal On Mar 5, 2013, at 3:50 PM, 卖报的小行家 wrote: Hi, Which version of Hadoop is this, and in what situation is the exception reported? BRs//Julian
Transpose
Hi, I have data in a file as follows. There are 3 columns separated by a semicolon (;). Each column has multiple values separated by a comma (,).

11,22,33;144,244,344;y,n,y;

I need the output data in the format below. It is like transposing the values of each column.

11 144 y
22 244 n
33 344 y

Can we write a MapReduce program to achieve this? Could you help with how to write the code? Thanks
Re: Transpose
Yes you can. You read in the row in each iteration of Mapper.map() as Text input. You then output 3 times to the collector, one for each row of the matrix. Spin, sort, and reduce as needed. Sent from a remote device. Please excuse any typos... Mike Segel On Mar 5, 2013, at 9:11 AM, Mix Nin pig.mi...@gmail.com wrote: Hi, I have data in a file as follows. There are 3 columns separated by a semicolon (;). Each column has multiple values separated by a comma (,). 11,22,33;144,244,344;y,n,y; I need the output data in the format below. It is like transposing the values of each column. 11 144 y 22 244 n 33 344 y Can we write a MapReduce program to achieve this? Could you help with how to write the code? Thanks
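A rough sketch of that approach (not the original poster's code), assuming one semicolon-delimited record per input line and comma-separated values within each column; it orders the columns inside the reducer with a TreeMap rather than with a secondary sort.

import java.io.IOException;
import java.util.TreeMap;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class TransposeExample {
    // Emits (output-row-index, "colIndex:value") for every value in the record.
    public static class TransposeMapper extends Mapper<LongWritable, Text, IntWritable, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            String[] columns = value.toString().split(";");
            for (int col = 0; col < columns.length; col++) {
                String[] cells = columns[col].split(",");
                for (int row = 0; row < cells.length; row++) {
                    ctx.write(new IntWritable(row), new Text(col + ":" + cells[row]));
                }
            }
        }
    }

    // Re-orders each output row by column index and joins the cells with spaces.
    public static class TransposeReducer extends Reducer<IntWritable, Text, NullWritable, Text> {
        @Override
        protected void reduce(IntWritable row, Iterable<Text> vals, Context ctx)
                throws IOException, InterruptedException {
            TreeMap<Integer, String> byColumn = new TreeMap<Integer, String>();
            for (Text v : vals) {
                String[] parts = v.toString().split(":", 2);
                byColumn.put(Integer.valueOf(parts[0]), parts[1]);
            }
            StringBuilder line = new StringBuilder();
            for (String cell : byColumn.values()) {
                if (line.length() > 0) line.append(' ');
                line.append(cell);
            }
            ctx.write(NullWritable.get(), new Text(line.toString()));
        }
    }
}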
Re: Hadoop cluster setup - could not see second datanode
Why would you need several datanodes? It is simple to have one datanode and one namenode on the same machine. I believe that you can make multiple datanodes run on the same machine, but it would take quite a bit of configuration work, and it would only really be helpful for some very specific testing involving multiple datanodes. --Bobby

From: 卖报的小行家 85469...@qq.com Reply-To: user@hadoop.apache.org Date: Tuesday, March 5, 2013 8:41 AM To: user user@hadoop.apache.org Subject: Re: RE: Hadoop cluster setup - could not see second datanode

Hello, Can a Namenode and several datanodes exist on one machine? I only have one PC and I want to configure it this way. BRs//Julian
Re: Transpose
Hi, Essentially what you want to do is group your data points by their position in the column, and have each reduce call assemble the data for one row. To have each record that the mapper processes be one of the columns, you can use TextInputFormat with conf.set("textinputformat.record.delimiter", ";"). Your mapper will receive keys as LongWritables specifying the byte index into the input file, and Text as values. The mapper will tokenize the input string. Emitting a map output for each data point in each column, you can then use secondary sort to send the data to the right place in the right order (see http://vangjee.wordpress.com/2012/03/20/secondary-sorting-aka-sorting-values-in-hadoops-mapreduce-programming-paradigm/). Your composite key would look like (index of the data point within the column, which is the row index; the LongWritable passed in as the map input key). Each reduce call would get all the points in a single row. You would sort/group by row index, and within a reduce call's values, sort by byte index so that entries from earlier columns come before later ones. Does that make sense? Sandy On Tue, Mar 5, 2013 at 7:11 AM, Mix Nin pig.mi...@gmail.com wrote: Hi, I have data in a file as follows. There are 3 columns separated by a semicolon (;). Each column has multiple values separated by a comma (,). 11,22,33;144,244,344;y,n,y; I need the output data in the format below. It is like transposing the values of each column. 11 144 y 22 244 n 33 344 y Can we write a MapReduce program to achieve this? Could you help with how to write the code? Thanks
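For reference, just the configuration step Sandy mentions, as a small sketch; the key name is the one given above, while the job name and the rest of the job setup are assumptions.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class SemicolonRecords {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("textinputformat.record.delimiter", ";"); // each semicolon-terminated column becomes one record
        Job job = new Job(conf, "transpose");              // placeholder job name
        job.setInputFormatClass(TextInputFormat.class);
        // ... mapper, secondary-sort comparators and reducer omitted ...
    }
}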
Re: 回复: Socket does not have a channel
Try setting dfs.client.use.legacy.blockreader to true ∞ Shashwat Shriparv On Tue, Mar 5, 2013 at 8:39 PM, 卖报的小行家 85469...@qq.com wrote: Yes.It's from hadoop 2.0. I just now read the code 1.1.1.There are no such classes the log mentioned.Maybe you can read the code first. -- 原始邮件 -- *发件人:* Subrotossan...@datameer.com; *发送时间:* 2013年3月5日(星期二) 晚上10:56 *收件人:* useruser@hadoop.apache.org; ** *主题:* Re: Socket does not have a channel Hi Julian, This is from CDH4.1.2 and I think its based on Apache Hadoop 2.0. Cheers, Subroto Sanyal On Mar 5, 2013, at 3:50 PM, 卖报的小行家 wrote: Hi, Which revision of hadoop? and what's the situation to report the Exception? BRs//Julian -- Original -- *From: * Subrotossan...@datameer.com; *Date: * Tue, Mar 5, 2013 04:46 PM *To: * useruser@hadoop.apache.org; ** *Subject: * Socket does not have a channel Hi java.lang.IllegalStateException: Socket Socket[addr=/10.86.203.112,port=1004,localport=35170] does not have a channel at com.google.common.base.Preconditions.checkState(Preconditions.java:172) at org.apache.hadoop.net.SocketInputWrapper.getReadableByteChannel(SocketInputWrapper.java:83) at org.apache.hadoop.hdfs.RemoteBlockReader2.newBlockReader(RemoteBlockReader2.java:432) at org.apache.hadoop.hdfs.BlockReaderFactory.newBlockReader(BlockReaderFactory.java:82) at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:832) at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:444) While accessing the HDFS I keep getting the above mentioned error. Setting the dfs.client.use.legacy.blockreader to true fixes the problem. I would like to know what exactly is the problem? Is it a problem/bug in hadoop ? Is there is JIRA ticket for this?? Cheers, Subroto Sanyal
How to setup Cloudera Hadoop to run everything on a localhost?
I am trying to run all Hadoop servers on a single Ubuntu localhost. All ports are open and my /etc/hosts file is:

127.0.0.1 frigate frigate.domain.local localhost
# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

When trying to install the cluster, Cloudera Manager fails with the following message: "Installation failed. Failed to receive heartbeat from agent." I run my Ubuntu-12.04 host from home, connected by a WiFi/dialup modem to my provider. What configuration is missing? Thanks!
Re: How to setup Cloudera Hadoop to run everything on a localhost?
Hi Anton, Can you try to add something like: your.local.ip.address yourhostname to your hosts file? Like: 192.168.1.2 masterserver 2013/3/5 anton ashanin anton.asha...@gmail.com: I am trying to run all Hadoop servers on a single Ubuntu localhost. All ports are open and my /etc/hosts file is: 127.0.0.1 frigate frigate.domain.local localhost # The following lines are desirable for IPv6 capable hosts ::1 ip6-localhost ip6-loopback fe00::0 ip6-localnet ff00::0 ip6-mcastprefix ff02::1 ip6-allnodes ff02::2 ip6-allrouters When trying to install the cluster, Cloudera Manager fails with the following message: "Installation failed. Failed to receive heartbeat from agent." I run my Ubuntu-12.04 host from home, connected by a WiFi/dialup modem to my provider. What configuration is missing? Thanks!
Re: How to setup Cloudera Hadoop to run everything on a localhost?
Jean, thanks for trying to help. I get my IP address by DHCP. Every time I start my Ubuntu I possibly can get a different IP address from my WiFi modem /router. Will it be ok to add static address from 192.168.*.* to /etc/hosts in this case? On Tue, Mar 5, 2013 at 9:47 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Hi Anton, Can you try to add something like: your.local.ip.addressyourhostname into your hosts file? Like: 192.168.1.2 masterserver 2013/3/5 anton ashanin anton.asha...@gmail.com: I am trying to run all Hadoop servers on a single Ubuntu localhost. All ports are open and my /etc/hosts file is 127.0.0.1 frigate frigate.domain.locallocalhost # The following lines are desirable for IPv6 capable hosts ::1 ip6-localhost ip6-loopback fe00::0 ip6-localnet ff00::0 ip6-mcastprefix ff02::1 ip6-allnodes ff02::2 ip6-allrouters When trying to install cluster Cloudera manager fails with the following messages: Installation failed. Failed to receive heartbeat from agent. I run my Ubuntu-12.04 host from home connected by WiFi/dialup modem to my provider. What configuration is missing? Thanks!
Re: How to setup Cloudera Hadoop to run everything on a localhost?
Can you please take this to the Cloudera mailing list? On Tue, Mar 5, 2013 at 10:33 AM, anton ashanin anton.asha...@gmail.com wrote: I am trying to run all Hadoop servers on a single Ubuntu localhost. All ports are open and my /etc/hosts file is: 127.0.0.1 frigate frigate.domain.local localhost # The following lines are desirable for IPv6 capable hosts ::1 ip6-localhost ip6-loopback fe00::0 ip6-localnet ff00::0 ip6-mcastprefix ff02::1 ip6-allnodes ff02::2 ip6-allrouters When trying to install the cluster, Cloudera Manager fails with the following message: "Installation failed. Failed to receive heartbeat from agent." I run my Ubuntu-12.04 host from home, connected by a WiFi/dialup modem to my provider. What configuration is missing? Thanks! -- http://hortonworks.com/download/
Re: 回复: Socket does not have a channel
Hi Shashwat, As already mentioned in my mail, setting dfs.client.use.legacy.blockreader to true fixes the problem. That looks like a workaround, or rather disabling a feature. I would like to know what the exact problem is. Cheers, Subroto Sanyal On Mar 5, 2013, at 6:33 PM, shashwat shriparv wrote: Try setting dfs.client.use.legacy.blockreader to true ∞ Shashwat Shriparv
Re: How to setup Cloudera Hadoop to run everything on a localhost?
Moving to cdh-user, user@hadoop in BCC. Anton, can you just try with the IP you have and see if it fixes the issue, before trying anything else? JM 2013/3/5 anton ashanin anton.asha...@gmail.com: Jean, thanks for trying to help. I get my IP address by DHCP. Every time I start my Ubuntu I could get a different IP address from my WiFi modem/router. Will it be ok to add a static address from 192.168.*.* to /etc/hosts in this case?
Re: How to setup Cloudera Hadoop to run everything on a localhost?
Don't use 'localhost' as your host name. For example, if you wanted to use the name 'node'; add another line to your hosts file like: 127.0.1.1 node.domain.local node Then change all the host references in your configuration files to 'node' -- also, don't forget to change the master/slave files as well. Now, if you decide to use an external address it would need to be static. This is easy to do, just follow this guide http://www.howtoforge.com/linux-basics-set-a-static-ip-on-ubuntu and replace '127.0.1.1' with whatever external address you decide on. On Tue, Mar 5, 2013 at 12:59 PM, Suresh Srinivas sur...@hortonworks.comwrote: Can you please take this Cloudera mailing list? On Tue, Mar 5, 2013 at 10:33 AM, anton ashanin anton.asha...@gmail.comwrote: I am trying to run all Hadoop servers on a single Ubuntu localhost. All ports are open and my /etc/hosts file is 127.0.0.1 frigate frigate.domain.locallocalhost # The following lines are desirable for IPv6 capable hosts ::1 ip6-localhost ip6-loopback fe00::0 ip6-localnet ff00::0 ip6-mcastprefix ff02::1 ip6-allnodes ff02::2 ip6-allrouters When trying to install cluster Cloudera manager fails with the following messages: Installation failed. Failed to receive heartbeat from agent. I run my Ubuntu-12.04 host from home connected by WiFi/dialup modem to my provider. What configuration is missing? Thanks! -- http://hortonworks.com/download/
Re: basic question about rack awareness and computation migration
Hello, To be precise, this is hidden from the developer and you need not write any code for it. Whenever a file is stored in HDFS, it is split into blocks of the configured size and each block could potentially be stored on a different datanode. All the information about which blocks make up which file resides with the namenode. So whenever a file is accessed via the DFS client, the client requests the metadata from the NameNode and uses it to provide the file to the end user in a streaming fashion. Since the namenode knows the location of all the blocks/files, a task can be scheduled by Hadoop to execute on the same node that holds the data. Thanks, Rohit Kochar On 05-Mar-2013, at 5:19 PM, Julian Bui wrote: Hi hadoop users, I'm trying to find out if computation migration is something the developer needs to worry about or if it's supposed to be hidden. I would like to use hadoop to take in a list of image paths in the hdfs and then have each task compress these large, raw images into something much smaller - say jpeg files. Input: list of paths Output: compressed jpeg Since I don't really need a reduce task (I'm more using hadoop for its reliability and orchestration aspects), my mapper ought to just take the list of image paths and then work on them. As I understand it, each image will likely be on multiple data nodes. My question is how will each mapper task migrate the computation to the data nodes? I recall reading that the namenode is supposed to deal with this. Is it hidden from the developer? Or as the developer, do I need to discover where the data lies and then migrate the task to that node? Since my input is just a list of paths, it seems like the namenode couldn't really do this for me. Another question: Where can I find out more about this? I've looked up rack awareness and computation migration but haven't really found much code relating to either one - leading me to believe I'm not supposed to have to write code to deal with this. Anyway, could someone please help me out or set me straight on this? Thanks, -Julian
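A small illustration of the point above: a client can ask the NameNode where the blocks of a file live through the FileSystem API, but the framework normally does this for you when scheduling map tasks. The path here is a made-up example, not from Julian's setup.

import java.util.Arrays;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocations {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        FileStatus status = fs.getFileStatus(new Path("/images/raw/img0001.raw")); // hypothetical file
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation b : blocks) {
            // getHosts() lists the datanodes holding replicas of this block
            System.out.println(b.getOffset() + " -> " + Arrays.toString(b.getHosts()));
        }
    }
}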
Re: How to setup Cloudera Hadoop to run everything on a localhost?
I am at a loss. I have set the IP address that my node got by DHCP:

127.0.0.1 localhost
192.168.1.6 node

This has not helped. Cloudera Manager finds this host all right, but still cannot get a heartbeat from it. Maybe the problem is that at the moment of these experiments I have three laptops, all with addresses assigned by DHCP, running at once? To make Hadoop work I am ready now to switch from Ubuntu to CentOS, or should I try something else? Please let me know on what Linux version you have managed to run Hadoop on a single local host. On Tue, Mar 5, 2013 at 10:54 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Hi Anton, Here is what my hosts file looks like: 127.0.0.1 localhost 192.168.1.2 myserver JM 2013/3/5 anton ashanin anton.asha...@gmail.com: Morgan, Just did exactly as you suggested, my /etc/hosts: 127.0.1.1 node.domain.local node Wiped out, annihilated my previous installation completely and reinstalled everything from scratch. The same problem with CLOUDERA MANAGER (FREE EDITION): Installation failed. Failed to receive heartbeat from agent. I will try now the bright idea from Jean, looks promising to me.
Re: How to setup Cloudera Hadoop to run everything on a localhost?
Hi Anton, Cloudera manager needs fully qualified domain name. Run hostname -f to check whether you have FQDN or not. I am not familiar with Ubuntu, but on my CentOS, I just put the FQDN into /etc/sysconfig/network, which then looks like the following: NETWORKING=yes HOSTNAME=myhost.my.domain GATEWAY=10.2.2.254 http://demo.effectivemeasure.com/signatures/au/YibingShi.vcf On Wed, Mar 6, 2013 at 8:14 AM, anton ashanin anton.asha...@gmail.comwrote: I am at a loss. I have set an IP address that my node got by DHCP: 127.0.0.1 localhost 192.168.1.6node This has not helped. Cloudera Manager finds this host all right, but still can not get a heartbeat from it next. Maybe the problem is that at the moment of these experiments I have three laptops with addresses assigned by DHCP all running at once? To make Hadoop work I am ready now to switch Ubuntu for CentOS or should I try something else? Please let me know on what Linux version you have managed to run Hadoop on a local host only? On Tue, Mar 5, 2013 at 10:54 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Hi Anton, Here is what my host is looking like: 127.0.0.1 localhost 192.168.1.2myserver JM 2013/3/5 anton ashanin anton.asha...@gmail.com: Morgan, Just did exactly as you suggested, my /etc/hosts: 127.0.1.1 node.domain.local node Wiped out, annihilated my previous installation completely and reinstalled everything from scratch. The same problem with CLOUDERA MANAGER (FREE EDITION): Installation failed. Failed to receive heartbeat from agent I will try now the the bright idea from Jean, looks promising to me On Tue, Mar 5, 2013 at 10:10 PM, Morgan Reece winter2...@gmail.com wrote: Don't use 'localhost' as your host name. For example, if you wanted to use the name 'node'; add another line to your hosts file like: 127.0.1.1 node.domain.local node Then change all the host references in your configuration files to 'node' -- also, don't forget to change the master/slave files as well. Now, if you decide to use an external address it would need to be static. This is easy to do, just follow this guide http://www.howtoforge.com/linux-basics-set-a-static-ip-on-ubuntu and replace '127.0.1.1' with whatever external address you decide on. On Tue, Mar 5, 2013 at 12:59 PM, Suresh Srinivas sur...@hortonworks.com wrote: Can you please take this Cloudera mailing list? On Tue, Mar 5, 2013 at 10:33 AM, anton ashanin anton.asha...@gmail.com wrote: I am trying to run all Hadoop servers on a single Ubuntu localhost. All ports are open and my /etc/hosts file is 127.0.0.1 frigate frigate.domain.locallocalhost # The following lines are desirable for IPv6 capable hosts ::1 ip6-localhost ip6-loopback fe00::0 ip6-localnet ff00::0 ip6-mcastprefix ff02::1 ip6-allnodes ff02::2 ip6-allrouters When trying to install cluster Cloudera manager fails with the following messages: Installation failed. Failed to receive heartbeat from agent. I run my Ubuntu-12.04 host from home connected by WiFi/dialup modem to my provider. What configuration is missing? Thanks! -- http://hortonworks.com/download/
Re: How to setup Cloudera Hadoop to run everything on a localhost?
Do you run all Hadoop servers on a single host that gets IP by DHCP? What do you have in /etc/hosts? Thanks! On Wed, Mar 6, 2013 at 1:25 AM, yibing Shi yibing@effectivemeasure.comwrote: Hi Anton, Cloudera manager needs fully qualified domain name. Run hostname -f to check whether you have FQDN or not. I am not familiar with Ubuntu, but on my CentOS, I just put the FQDN into /etc/sysconfig/network, which then looks like the following: NETWORKING=yes HOSTNAME=myhost.my.domain GATEWAY=10.2.2.254 http://demo.effectivemeasure.com/signatures/au/YibingShi.vcf On Wed, Mar 6, 2013 at 8:14 AM, anton ashanin anton.asha...@gmail.comwrote: I am at a loss. I have set an IP address that my node got by DHCP: 127.0.0.1 localhost 192.168.1.6node This has not helped. Cloudera Manager finds this host all right, but still can not get a heartbeat from it next. Maybe the problem is that at the moment of these experiments I have three laptops with addresses assigned by DHCP all running at once? To make Hadoop work I am ready now to switch Ubuntu for CentOS or should I try something else? Please let me know on what Linux version you have managed to run Hadoop on a local host only? On Tue, Mar 5, 2013 at 10:54 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Hi Anton, Here is what my host is looking like: 127.0.0.1 localhost 192.168.1.2myserver JM 2013/3/5 anton ashanin anton.asha...@gmail.com: Morgan, Just did exactly as you suggested, my /etc/hosts: 127.0.1.1 node.domain.local node Wiped out, annihilated my previous installation completely and reinstalled everything from scratch. The same problem with CLOUDERA MANAGER (FREE EDITION): Installation failed. Failed to receive heartbeat from agent I will try now the the bright idea from Jean, looks promising to me On Tue, Mar 5, 2013 at 10:10 PM, Morgan Reece winter2...@gmail.com wrote: Don't use 'localhost' as your host name. For example, if you wanted to use the name 'node'; add another line to your hosts file like: 127.0.1.1 node.domain.local node Then change all the host references in your configuration files to 'node' -- also, don't forget to change the master/slave files as well. Now, if you decide to use an external address it would need to be static. This is easy to do, just follow this guide http://www.howtoforge.com/linux-basics-set-a-static-ip-on-ubuntu and replace '127.0.1.1' with whatever external address you decide on. On Tue, Mar 5, 2013 at 12:59 PM, Suresh Srinivas sur...@hortonworks.com wrote: Can you please take this Cloudera mailing list? On Tue, Mar 5, 2013 at 10:33 AM, anton ashanin anton.asha...@gmail.com wrote: I am trying to run all Hadoop servers on a single Ubuntu localhost. All ports are open and my /etc/hosts file is 127.0.0.1 frigate frigate.domain.locallocalhost # The following lines are desirable for IPv6 capable hosts ::1 ip6-localhost ip6-loopback fe00::0 ip6-localnet ff00::0 ip6-mcastprefix ff02::1 ip6-allnodes ff02::2 ip6-allrouters When trying to install cluster Cloudera manager fails with the following messages: Installation failed. Failed to receive heartbeat from agent. I run my Ubuntu-12.04 host from home connected by WiFi/dialup modem to my provider. What configuration is missing? Thanks! -- http://hortonworks.com/download/
Re: How to setup Cloudera Hadoop to run everything on a localhost?
I didn't run all the services on a single server, but it doesn't matter, since the installation is the same no matter how many servers you are going to install on. I got the same error as you, and it turned out that CM needs to be able to resolve the FQDN. I didn't use DHCP, so it was easier for me to fix. I guess you might have to set up the DHCP server correctly for CM to find your FQDN.
Re: How to setup Cloudera Hadoop to run everything on a localhost?
Does the problem of installing Hadoop on a single DHCP node exist for the Apache distribution of Hadoop as well? On Wed, Mar 6, 2013 at 2:30 AM, Suresh Srinivas sur...@hortonworks.com wrote: Folks, another gentle reminder. Please use the Cloudera lists.
Re: basic question about rack awareness and computation migration
Hi Rohit, Thanks for responding. "a task can be scheduled by hadoop to be executed on the same node which is having data" - In my case, the mapper won't actually know where the data resides at the time of being scheduled. It only knows what data it will be accessing when it reads in the keys. In other words, the task will already be running by the time the mapper figures out what data must be accessed - so how can hadoop know where to execute the code? I'm still lost. Please help if you can. -Julian On Tue, Mar 5, 2013 at 11:15 AM, Rohit Kochar mnit.ro...@gmail.com wrote: Hello, To be precise, this is hidden from the developer and you need not write any code for this. Whenever any file is stored in HDFS it is split into blocks of the configured size, and each block could potentially be stored on a different datanode. All the information about which file contains which blocks resides with the namenode. So essentially whenever a file is accessed via the DFS client, it requests the NameNode for metadata, which the DFS client uses to provide the file in streaming fashion to the end user. Since the namenode knows the location of all the blocks/files, a task can be scheduled by hadoop to be executed on the same node which is holding the data. Thanks Rohit Kochar On 05-Mar-2013, at 5:19 PM, Julian Bui wrote: Hi hadoop users, I'm trying to find out if computation migration is something the developer needs to worry about or if it's supposed to be hidden. I would like to use hadoop to take in a list of image paths in the hdfs and then have each task compress these large, raw images into something much smaller - say jpeg files. Input: list of paths Output: compressed jpeg Since I don't really need a reduce task (I'm more using hadoop for its reliability and orchestration aspects), my mapper ought to just take the list of image paths and then work on them. As I understand it, each image will likely be on multiple data nodes. My question is how will each mapper task migrate the computation to the data nodes? I recall reading that the namenode is supposed to deal with this. Is it hidden from the developer? Or as the developer, do I need to discover where the data lies and then migrate the task to that node? Since my input is just a list of paths, it seems like the namenode couldn't really do this for me. Another question: Where can I find out more about this? I've looked up rack awareness and computation migration but haven't really found much code relating to either one - leading me to believe I'm not supposed to have to write code to deal with this. Anyway, could someone please help me out or set me straight on this? Thanks, -Julian
Re: basic question about rack awareness and computation migration
Your concern is correct: if your input is a list of files, rather than the files themselves, then the tasks would not be data-local - since the task input would just be the list of files, and the files' data may reside on any node/rack of the cluster. However, your job will still run, as HDFS performs remote reads transparently without developer intervention, and everything will still work as you've written it. If a block is found local to the DN, it is read locally as well - all of this is automatic. Are your input lists big (for each compressed output)? And is the list arbitrary or a defined list per goal? -- Harsh J
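To make that concrete, here is a minimal sketch (not code from this thread - the class name and the compression step are placeholders) of a mapper that receives one HDFS image path per input line and reads the file through the FileSystem client; the read works whether or not the block happens to be local to the task:

import java.io.IOException;
import java.io.InputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class ImagePathMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
  @Override
  protected void map(LongWritable offset, Text line, Context context)
      throws IOException, InterruptedException {
    Path imagePath = new Path(line.toString().trim());
    Configuration conf = context.getConfiguration();
    FileSystem fs = imagePath.getFileSystem(conf);

    // HDFS fetches remote blocks transparently, so this works even when the
    // image is not stored on the node running this task.
    long totalBytes = 0;
    byte[] buffer = new byte[64 * 1024];
    InputStream in = fs.open(imagePath);
    try {
      int n;
      while ((n = in.read(buffer)) > 0) {
        totalBytes += n;
        // ... feed the bytes to an image codec here (placeholder) ...
      }
    } finally {
      in.close();
    }
    context.write(new Text(imagePath.getName()), new LongWritable(totalBytes));
  }
}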
Re: basic question about rack awareness and computation migration
Thanks Harsh, "Are your input lists big (for each compressed output)? And is the list arbitrary or a defined list per goal?" I dictate what my inputs will look like. If they need to be a list of image files, then I can do that. If they need to be the images themselves as you suggest, then I can do that too, but I'm not exactly sure what that would look like. Basically, I will try to format my inputs in the way that makes the most sense from a locality point of view. Since all the keys must be writable, I explored the Writable interface and found the interesting sub-classes: - FileSplit - BlockLocation - BytesWritable These all look somewhat promising as they kind of reveal the location information of the files. I'm not exactly sure how I would use these to hint at the data locations. Since these chunks of the file appear to be somewhat arbitrary in size and offset, I don't know how I could perform imagery operations on them. For example, even if I knew that bytes 0x100-0x400 lie on node X, it would be difficult to use that information with my image libraries - does 0x100-0x400 correspond to some region/MBR within the image? I'm not sure how to make use of this information. The responses I've gotten so far indicate to me that HDFS kind of does the computation migration for me but that I have to give it enough information to work with. If someone could point to some detailed reading about this subject that would be pretty helpful, as I just can't find the documentation for it. Thanks again, -Julian
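For completeness, the block placement Julian is poking at can be queried directly from the client; a small sketch (the file path here is hypothetical) using FileSystem.getFileBlockLocations:

import java.util.Arrays;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ShowBlockLocations {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());

    // Hypothetical HDFS path; replace with a real file.
    Path p = new Path("/images/raw/scene-0001.tif");
    FileStatus status = fs.getFileStatus(p);

    // Ask the NameNode which datanodes hold each block of the file.
    BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
    for (BlockLocation b : blocks) {
      System.out.println("offset=" + b.getOffset()
          + " length=" + b.getLength()
          + " hosts=" + Arrays.toString(b.getHosts()));
    }
  }
}

This is the same information the framework consults when placing tasks; note the byte ranges are block boundaries, so in general they will not line up with regions inside an image.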
RE: Hadoop cluster setup - could not see second datanode
Although Hadoop is designed and developed for distributed computing, it can be run on a single node in pseudo-distributed mode, and even with multiple data nodes on a single machine. Developers often run multiple data nodes on a single node to develop and test distributed features, data node behavior, Name node interaction with data nodes, and for other reasons. Please go through the following blog for the same: http://www.blogger.com/blogger.g?blogID=2277703965936900657#editor/target=post;postID=8231904039775612388 From: Robert Evans [ev...@yahoo-inc.com] Sent: Tuesday, March 05, 2013 11:57 PM To: user@hadoop.apache.org Subject: Re: Hadoop cluster setup - could not see second datanode Why would you need several data nodes? It is simple to have one data node and one name node on the same machine. I believe that you can make multiple data nodes run on the same machine, but it would take quite a bit of configuration work to do it, and it would only really be helpful for you to do some very specific testing involving multiple data nodes. --Bobby From: 卖报的小行家 85469...@qq.com Reply-To: user@hadoop.apache.org Date: Tuesday, March 5, 2013 8:41 AM To: user@hadoop.apache.org Subject: Re: RE: Hadoop cluster setup - could not see second datanode Hello, Can a Namenode and several datanodes exist on one machine? I only have one PC. I want to configure it this way. BRs//Julian -- Original -- From: AMARNATH, Balachandar balachandar.amarn...@airbus.com Date: Tue, Mar 5, 2013 07:55 PM To: user@hadoop.apache.org Subject: RE: Hadoop cluster setup - could not see second datanode I fixed the below issue :) Regards Bala From: AMARNATH, Balachandar [mailto:balachandar.amarn...@airbus.com] Sent: 05 March 2013 17:05 To: user@hadoop.apache.org Subject: Hadoop cluster setup - could not see second datanode Thanks for the information. Now I am trying to install hadoop dfs using 2 nodes: a namenode-cum-datanode, and a separate data node. I use the following configuration for my hdfs-site.xml:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>localhost:9000</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/bala/data</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/home/bala/name</value>
  </property>
</configuration>

In the namenode, I have added the datanode hostnames (machine1 and machine2). When I do 'start-all.sh', I see in the log that the data node is starting on both machines, but when I go to the browser on the namenode, I see only one live node (that is the namenode, which is also configured as a datanode). Any hint here will help me. With regards Bala From: Mahesh Balija [mailto:balijamahesh@gmail.com] Sent: 05 March 2013 14:15 To: user@hadoop.apache.org Subject: Re: Hadoop file system You can use HDFS alone in distributed mode to fulfill your requirement. HDFS has the FileSystem Java API through which you can interact with HDFS from your client. HDFS is good if you have a small number of files of huge size rather than many files of small size. Best, Mahesh Balija, Calsoft Labs. On Tue, Mar 5, 2013 at 10:43 AM, AMARNATH, Balachandar balachandar.amarn...@airbus.com wrote: Hi, I am new to hdfs.
In my Java application, I need to perform a 'similar operation' over a large number of files. I would like to store those files on distributed machines. I don't think I will need the map-reduce paradigm. However, I would like to use HDFS for file storage and access. Is it possible (or a good idea) to use HDFS as a standalone system? And are Java APIs available to work with HDFS so that I can read/write in a distributed environment? Any thoughts here will be helpful. With thanks and regards Balachandar
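As a rough illustration of the FileSystem API Mahesh mentions (a sketch only - the NameNode URI and the paths are hypothetical, and normally the address would come from core-site.xml rather than being hard-coded):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadWrite {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("fs.default.name", "hdfs://namenode:9000"); // hypothetical address
    FileSystem fs = FileSystem.get(conf);

    // Write a small file into HDFS.
    Path out = new Path("/user/bala/hello.txt");
    FSDataOutputStream os = fs.create(out, true);
    os.writeBytes("hello from the HDFS Java API\n");
    os.close();

    // Read it back.
    BufferedReader reader = new BufferedReader(new InputStreamReader(fs.open(out)));
    String line;
    while ((line = reader.readLine()) != null) {
      System.out.println(line);
    }
    reader.close();
    fs.close();
  }
}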
Map reduce technique
Hi, I am new to the map reduce paradigm. I read a tutorial that says the 'map' function splits the data into key-value pairs. Does this mean the map-reduce framework automatically splits the data into pieces, or do we need to explicitly provide the method to split the data into pieces? If it does so automatically, how does it split an image file (by size etc.)? I see that processing an image file as a whole will give different results than processing it in chunks. With thanks and regards Balachandar
Re: FileStatus.getPath
The FileStatus is a container of metadata for a specific path, and hence carries the Path object the rest of the details are for. What exactly do you mean by "has no defined contract"? If you want a qualified path (for a specific FS), then doing path.makeQualified(…) is always the right way. On Tue, Mar 5, 2013 at 11:31 PM, Jay Vyas jayunit...@gmail.com wrote: Hi, it appears that getPath() in http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FileStatus.html has no defined contract. Why does FileStatus have a getPath method? Would it have the equivalent effect to simply make a path qualified using the FileSystem object, i.e. path.makeQualified(FileSystem.get())? -- Jay Vyas http://jayunit100.blogspot.com -- Harsh J
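A small sketch of the difference (the directory and file names below are made up): listStatus() hands back FileStatus objects whose getPath() is already tied to the filesystem that produced them, while makeQualified() stamps a scheme/authority onto an arbitrary Path by hand.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class QualifiedPathsDemo {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());

    // Paths carried by FileStatus come back qualified for 'fs'.
    for (FileStatus status : fs.listStatus(new Path("/user/jay"))) {
      System.out.println("from status : " + status.getPath());
    }

    // Qualifying a bare Path against the same FileSystem explicitly.
    Path bare = new Path("/user/jay/part-00000");
    System.out.println("qualified   : " + bare.makeQualified(fs));
  }
}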
Re: Map reduce technique
Hi Balachandar, In MapReduce, interpreting input files as key-value pairs is accomplished through InputFormats. Some common InputFormats are TextInputFormat, which uses lines in a text file as values and their byte offset into the file as keys; KeyValueTextInputFormat, which interprets the first token on a line as the key and the rest as the value; and WholeFileInputFormat, which uses an entire file as a value. If you wanted to process an image file in a specific way, you would probably need to supply your own InputFormat. Does that help? -Sandy
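For example, a driver that plugs in one of the stock InputFormats might look roughly like this (a sketch against the Hadoop 2.x "new" MapReduce API; the class name and argument paths are made up):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class InputFormatDemo {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "input-format-demo");
    job.setJarByClass(InputFormatDemo.class);

    // First tab-separated token on each line becomes the key, the rest the value.
    job.setInputFormatClass(KeyValueTextInputFormat.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);

    // No mapper/reducer set, so the identity Mapper/Reducer pass records through.
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

For images, a custom InputFormat (or pre-packing the files into a SequenceFile, as the next replies discuss) avoids splitting a single image across records.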
RE: Map reduce technique
I think you have to look at a sequence file as the input format. Basically, the way this works is: you have a separate Java process that takes several image files, reads the raw bytes into memory, then stores the data as key-value pairs in a SequenceFile. Keep going and keep writing into HDFS. This may take a while, but you'll only have to do it once. Regards, Samir.
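A rough sketch of that packing step (not Samir's code - the local image directory and HDFS output path are hypothetical): each local image becomes one (filename, raw bytes) record in a SequenceFile on HDFS.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class ImagesToSequenceFile {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem local = FileSystem.getLocal(conf);
    FileSystem hdfs = FileSystem.get(conf);

    Path imageDir = new Path("/home/user/images");    // hypothetical local directory
    Path seqFile = new Path("/user/bala/images.seq"); // hypothetical HDFS output

    SequenceFile.Writer writer =
        SequenceFile.createWriter(hdfs, conf, seqFile, Text.class, BytesWritable.class);
    try {
      for (FileStatus status : local.listStatus(imageDir)) {
        byte[] bytes = new byte[(int) status.getLen()];
        FSDataInputStream in = local.open(status.getPath());
        try {
          in.readFully(bytes); // raw image bytes become the record value
        } finally {
          in.close();
        }
        writer.append(new Text(status.getPath().getName()), new BytesWritable(bytes));
      }
    } finally {
      IOUtils.closeStream(writer);
    }
  }
}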
RE: Map reduce technique
job.setInputFormatClass(SequenceFileInputFormat.class); You just have to follow the Hadoop API from the Apache web-site. Hints: 1) Create the sequence file prior to the job (a plain Java program). Example POC - you have to change it based on your requirement:

import java.io.IOException;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

// White, Tom (2012-05-10). Hadoop: The Definitive Guide (Kindle Locations 5375-5384). O'Reilly Media. Kindle Edition.
public class SequenceFileWriteDemo {

  private static final String[] DATA = {
    "One, two, buckle my shoe",
    "Three, four, shut the door",
    "Five, six, pick up sticks",
    "Seven, eight, lay them straight",
    "Nine, ten, a big fat hen"
  };

  public static void main(String[] args) throws IOException {
    // Local file path
    String uri = "/home/hadoop/Desktop/Image/test_02.txt";
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(URI.create(uri), conf);
    Path path = new Path(uri);

    IntWritable key = new IntWritable();
    Text value = new Text();
    SequenceFile.Writer writer = null;
    try {
      writer = SequenceFile.createWriter(fs, conf, path, key.getClass(), value.getClass());
      for (int i = 0; i < 100; i++) {
        key.set(100 - i);
        value.set(DATA[i % DATA.length]);
        // System.out.printf("[%s]\t%s\t%s\n", writer.getLength(), key, value);
        writer.append(key, value);
      }
    } finally {
      IOUtils.closeStream(writer);
    }
  }
}

Note: you have to convert all image files into one sequence file. 2) Put it into HDFS. 3) Write the Map/Reduce job based on the logic you need. From: AMARNATH, Balachandar [mailto:balachandar.amarn...@airbus.com] Sent: 06 March 2013 11:24 To: user@hadoop.apache.org Subject: RE: Map reduce technique Thanks for the mail. Can you please share a few links to start with? Regards Bala
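And to round out step 3, a sketch (hypothetical class name; the actual image compression is left as a placeholder) of a mapper that consumes such a sequence file of (filename, image bytes) records once the driver sets job.setInputFormatClass(SequenceFileInputFormat.class):

import java.io.IOException;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class ImageCompressMapper extends Mapper<Text, BytesWritable, Text, BytesWritable> {
  @Override
  protected void map(Text fileName, BytesWritable imageBytes, Context context)
      throws IOException, InterruptedException {
    // getBytes() can return a padded backing array, so copy only getLength() bytes.
    byte[] raw = new byte[imageBytes.getLength()];
    System.arraycopy(imageBytes.getBytes(), 0, raw, 0, imageBytes.getLength());

    // ... run an image codec over 'raw' here (placeholder) ...
    byte[] compressed = raw;

    context.write(fileName, new BytesWritable(compressed));
  }
}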