Re: Could not reserve enough space for heap in JVM

2009-02-25 Thread souravm
On a 32-bit machine you are limited to about 4 GB of heap at the JVM level per process (in practice less, since the address space is shared with the JVM's own overhead).

- Original Message -
From: Arijit Mukherjee 
To: core-user@hadoop.apache.org 
Sent: Wed Feb 25 21:27:25 2009
Subject: Re: Could not reserve enough space for heap in JVM

Mine is 32-bit. As of now, it has only 2 GB RAM, but I'm planning to acquire
more hardware resources - so a clarification in this regard would help me in
deciding the specs of the new cluster.

Arijit

2009/2/26 souravm 

> Is your machine 32-bit or 64-bit?
>
> - Original Message -
> From: Nick Cen 
> To: core-user@hadoop.apache.org 
> Sent: Wed Feb 25 21:10:00 2009
> Subject: Re: Could not reserve enough space for heap in JVM
>
> I have a question related to the HADOOP_HEAPSIZE variable. My machine's
> memory size is 16 GB, but when I set HADOOP_HEAPSIZE to 4 GB, it threw the
> exception referred to in this thread. How can I make full use of my memory? Thanks.
>
> 2009/2/26 Arijit Mukherjee 
>
> > I was getting similar errors too while running the mapreduce samples. I
> > fiddled with the hadoop-env.sh (where the HEAPSIZE is specified) and the
> > hadoop-site.xml files - and rectified it after some trial and error. But I
> > would like to know if there is a rule of thumb for this. Right now, I've a
> > core duo machine with 2 GB RAM running on Ubuntu 8.10, and I've found that a
> > HEAPSIZE of 256 MB works without any problems. Anything more than that would
> > give the same error (even when nothing else is going on in the machine).
> >
> > Arijit
> >
> > 2009/2/26 Anum Ali 
> >
> > > If the solution given by Matei Zaharia won't work - which I guess it
> > > won't if you are using Eclipse 3.3.0 - it is because of a bug that was
> > > resolved in a later version, Eclipse 3.4 (Ganymede). Better to upgrade
> > > your Eclipse version.
> > >
> > >
> > >
> > >
> > > On 2/26/09, Matei Zaharia  wrote:
> > > > These variables have to be set at runtime through a config file, not at
> > > > compile time. You can set them in hadoop-env.sh: uncomment the line with
> > > > export HADOOP_HEAPSIZE=<size in MB> to set the heap size for all Hadoop
> > > > processes, or change the options for specific commands. Now these settings
> > > > are for the Hadoop processes themselves, but if you are getting the error
> > > > in tasks you're running, you can set these in your hadoop-site.xml through
> > > > the mapred.child.java.opts variable, as follows:
> > > >
> > > > <property>
> > > >   <name>mapred.child.java.opts</name>
> > > >   <value>-Xmx512m</value>
> > > > </property>
> > > >
> > > > By the way, I'm not sure if -J-Xmx is the right syntax; I've always seen
> > > > -Xmx and -Xms.
> > > >
> > > > On Wed, Feb 25, 2009 at 5:05 PM, madhuri72 wrote:
> > > >
> > > >>
> > > >> Hi,
> > > >>
> > > >> I'm trying to run Hadoop version 0.19 on Ubuntu with Java build
> > > >> 1.6.0_11-b03. I'm getting the following error:
> > > >>
> > > >> Error occurred during initialization of VM
> > > >> Could not reserve enough space for object heap
> > > >> Could not create the Java virtual machine.
> > > >> make: *** [run] Error 1
> > > >>
> > > >> I searched the forums and found some advice on setting the VM's memory
> > > >> via the javac options
> > > >>
> > > >> -J-Xmx512m or -J-Xms256m
> > > >>
> > > >> I have tried this with various sizes between 128 and 1024 MB. I am
> > > >> adding this flag when I compile the source. This isn't working for me,
> > > >> and allocating 1 GB of memory is a lot for the machine I'm using. Is
> > > >> there some way to make this work with Hadoop? Is there somewhere else I
> > > >> can set the heap memory?
> > > >>
> > > >> Thanks.

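(For reference, the two knobs discussed above; a minimal sketch with illustrative values - 1000 MB for the Hadoop daemons, 512 MB per task JVM:

  # conf/hadoop-env.sh - heap for the Hadoop daemon processes, in MB
  export HADOOP_HEAPSIZE=1000

  # conf/hadoop-site.xml - heap for each spawned task JVM
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx512m</value>
  </property>

On a 32-bit JVM the per-process address-space limit applies no matter how much physical RAM the machine has, which is why a 4 GB HADOOP_HEAPSIZE fails even on a 16 GB machine.)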

RE: Any suggestion on performance improvement ?

2008-11-14 Thread souravm
Hi Alex,

I get 30-40 seconds of response time for around 60 MB of data. The number of map
and reduce tasks is 1 each. This is because the default HDFS block size is 64 MB
and Pig assigns one map task for each HDFS block - I believe that is optimal.

Since this single task is the unit of performance, I don't think the performance
would get better even if I increase the number of nodes.

Regards,
Sourav
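
(A note on the block-size point: the number of map tasks follows the number of blocks, so one way to get more parallelism on a small file is to store it with a smaller block size. A sketch - 16 MB shown, so a 60 MB file would split into about 4 blocks; the property is client-side and applies to files written after the change:

  <property>
    <name>dfs.block.size</name>
    <value>16777216</value>
  </property>

Whether this helps in practice depends on per-task startup overhead, which is large relative to a job this small.)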
-Original Message-
From: Alex Loddengaard [mailto:[EMAIL PROTECTED] 
Sent: Friday, November 14, 2008 9:44 AM
To: core-user@hadoop.apache.org
Subject: Re: Any suggestion on performance improvement ?

How big is the data that you're loading and filtering?  Your cluster is
pretty small, so if you have data on the magnitude of tens or hundreds of
GBs, then the performance you're describing is probably to be expected.
How many map and reduce tasks are you running on each node?

Alex
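
(The per-node task counts Alex asks about are configured in hadoop-site.xml; a sketch with illustrative values for a dual-core node:

  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>2</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>

The defaults are 2 each, so these may already be the effective values.)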

On Thu, Nov 13, 2008 at 4:55 PM, souravm <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I'm testing with a 4-node setup of Hadoop HDFS.
>
> Each node has 2 GB of memory, a dual-core processor, and around 30-60 GB of
> disk space.
>
> I've kept files of different sizes in the hdfs ranging from 10MB to 5 GB.
>
> I'm querying those files using Pig. What I'm seeing is that even a simple
> select query (LOAD and FILTER) takes at least 30-40 seconds. The map
> task on one node takes at least 25 seconds.
>
> I've set the JVM max heap size to 1024m.
>
> Any suggestions on how to improve the performance with different
> configuration at the Hadoop level (by changing HDFS and MapReduce parameters)?
>
> Regards,
> Sourav

RE: Need help in hdfs configuration fully distributed way in Mac OSX...

2008-09-17 Thread souravm
Hi Mafish,

Thanks for your suggestions.

Finally I could resolve the issue. The *-site.xml on the namenode had
fs.default.name set to localhost, whereas on the data nodes it was the actual IP. I
changed localhost to the actual IP on the namenode and it started working.

Regards,
Sourav
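
(For reference, the fix amounts to every node's hadoop-site.xml carrying the same namenode address; a sketch using the namenode IP that appears in the logs later in this thread:

  <property>
    <name>fs.default.name</name>
    <value>hdfs://192.168.1.102:9000</value>
  </property>

With localhost there, the namenode binds to 127.0.0.1 - the log below shows "Namenode up at: localhost/127.0.0.1:9000" - so remote datanodes can never reach it.)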

-Original Message-
From: Mafish Liu [mailto:[EMAIL PROTECTED]
Sent: Tuesday, September 16, 2008 7:37 PM
To: core-user@hadoop.apache.org
Subject: Re: Need help in hdfs configuration fully distributed way in Mac OSX...

Hi, souravm:
  I don't know exactly what's wrong with your configuration from your post,
and I guess the possible causes are:

  1. Make sure the firewall on the namenode is off, or that port 9000 is open
in your firewall configuration.

  2. Namenode. Check the namenode startup log to see if the namenode starts up
correctly, or try running 'jps' on your namenode to see if there is a process
called "NameNode".

Hope this helps.
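
(A sketch of those two checks - output illustrative, the IP and port taken from the datanode logs elsewhere in this thread:

  $ jps                        # on the namenode; should list a NameNode process
  12345 NameNode
  $ telnet 17.229.23.77 9000   # from the datanode; should connect rather than hang

jps ships with the JDK, so it should be available wherever Hadoop runs.)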


On Tue, Sep 16, 2008 at 10:41 PM, souravm <[EMAIL PROTECTED]> wrote:

> Hi,
>
> The namenode in machine 1 has started. I can see the following log. Is
> there a specific way to provide the master name in the masters file (in
> hadoop/conf) on the datanode? I've currently specified <user>@<machine 1 ip>.
>
> 2008-09-16 07:23:46,321 INFO org.apache.hadoop.ipc.metrics.RpcMetrics:
> Initializing RPC Metrics with hostName=NameNode, port=9000
> 2008-09-16 07:23:46,325 INFO org.apache.hadoop.dfs.NameNode: Namenode up
> at: localhost/127.0.0.1:9000
> 2008-09-16 07:23:46,327 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
> Initializing JVM Metrics with processName=NameNode, sessionId=null
> 2008-09-16 07:23:46,329 INFO org.apache.hadoop.dfs.NameNodeMetrics:
> Initializing NameNodeMeterics using context
> object:org.apache.hadoop.metrics.spi.NullContext
> 2008-09-16 07:23:46,404 INFO org.apache.hadoop.fs.FSNamesystem:
> fsOwner=souravm,souravm,_lpadmin,_appserveradm,_appserverusr,admin
> 2008-09-16 07:23:46,405 INFO org.apache.hadoop.fs.FSNamesystem:
> supergroup=supergroup
> 2008-09-16 07:23:46,405 INFO org.apache.hadoop.fs.FSNamesystem:
> isPermissionEnabled=true
> 2008-09-16 07:23:46,473 INFO org.apache.hadoop.fs.FSNamesystem: Finished
> loading FSImage in 112 msecs
> 2008-09-16 07:23:46,475 INFO org.apache.hadoop.dfs.StateChange: STATE*
> Leaving safe mode after 0 secs.
> 2008-09-16 07:23:46,475 INFO org.apache.hadoop.dfs.StateChange: STATE*
> Network topology has 0 racks and 0 datanodes
> 2008-09-16 07:23:46,480 INFO org.apache.hadoop.dfs.StateChange: STATE*
> UnderReplicatedBlocks has 0 blocks
> 2008-09-16 07:23:46,486 INFO org.apache.hadoop.fs.FSNamesystem: Registered
> FSNamesystemStatusMBean
> 2008-09-16 07:23:46,561 INFO org.mortbay.util.Credential: Checking Resource
> aliases
> 2008-09-16 07:23:46,627 INFO org.mortbay.http.HttpServer: Version
> Jetty/5.1.4
> 2008-09-16 07:23:46,907 INFO org.mortbay.util.Container: Started
> [EMAIL PROTECTED]
> 2008-09-16 07:23:46,937 INFO org.mortbay.util.Container: Started
> WebApplicationContext[/,/]
> 2008-09-16 07:23:46,938 INFO org.mortbay.util.Container: Started
> HttpContext[/logs,/logs]
> 2008-09-16 07:23:46,938 INFO org.mortbay.util.Container: Started
> HttpContext[/static,/static]
> 2008-09-16 07:23:46,939 INFO org.mortbay.http.SocketListener: Started
> SocketListener on 0.0.0.0:50070
> 2008-09-16 07:23:46,939 INFO org.mortbay.util.Container: Started
> [EMAIL PROTECTED]
> 2008-09-16 07:23:46,940 INFO org.apache.hadoop.fs.FSNamesystem: Web-server
> up at: 0.0.0.0:50070
> 2008-09-16 07:23:46,940 INFO org.apache.hadoop.ipc.Server: IPC Server
> Responder: starting
> 2008-09-16 07:23:46,942 INFO org.apache.hadoop.ipc.Server: IPC Server
> listener on 9000: starting
> 2008-09-16 07:23:46,943 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 0 on 9000: starting
> 2008-09-16 07:23:46,943 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 1 on 9000: starting
> 2008-09-16 07:23:46,943 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 2 on 9000: starting
> 2008-09-16 07:23:46,943 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 3 on 9000: starting
> 2008-09-16 07:23:46,943 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 4 on 9000: starting
> 2008-09-16 07:23:46,943 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 5 on 9000: starting
> 2008-09-16 07:23:46,943 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 6 on 9000: starting
> 2008-09-16 07:23:46,943 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 7 on 9000: starting
> 2008-09-16 07:23:46,943 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 8 on 9000: starting
> 2008-09-16 07:23:46,944 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 9 on 9000: starting
>
> Is there a specific way to provide the master name in the masters file (in
> hadoop/conf) on the datanode? I've currently specified <user>@<machine 1 ip>.

Remote datanode (fully distributed way) is not starting in Mac OSX...

2008-09-16 Thread souravm
Hi.

Any pointers on what could be the problem?

Regards,
Sourav

From: souravm
Sent: Tuesday, September 16, 2008 1:07 AM
To: 'core-user@hadoop.apache.org'
Subject: Re: Need help in hdfs configuration fully distributed way in Mac OSX...

Hi,

I tried the way you suggested. I set up ssh without a password, so now the namenode
can connect to the datanode without a password - the start-dfs.sh script does not ask
for one. However, even with this fix I still face the same problem.

Regards,
Sourav
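
(The passwordless-ssh setup mentioned here is roughly the following - a sketch assuming OpenSSH defaults and the same account name on both machines:

  $ ssh-keygen -t rsa -P ""        # on the namenode; accept the default key path
  $ cat ~/.ssh/id_rsa.pub | ssh <machine 2 ip> 'cat >> ~/.ssh/authorized_keys'
  $ ssh <machine 2 ip>             # should now log in without prompting

As the thread shows, though, this only fixes the login step, not the datanode registration.)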

- Original Message -
From: Mafish Liu <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org 
Sent: Mon Sep 15 23:26:10 2008
Subject: Re: Need help in hdfs configuration fully distributed way in Mac OSX...

Hi:
  You need to configure your nodes to ensure that node 1 can connect to node
2 without a password.

On Tue, Sep 16, 2008 at 2:04 PM, souravm <[EMAIL PROTECTED]> wrote:

> Hi All,
>
> I'm facing a problem in configuring hdfs in a fully distributed way in Mac
> OSX.
>
> Here is the topology -
>
> 1. The namenode is in machine 1
> 2. There is 1 datanode in machine 2
>
> Now when I execute start-dfs.sh from machine 1, it connects to machine 2
> (after it asks for password for connecting to machine 2) and starts datanode
> in machine 2 (as the console message says).
>
> However -
> 1. When I go to http://machine1:50070 - it does not show the data node at
> all. It says 0 datanodes configured
> 2. In the log file in machine 2 what I see is -
> /
> STARTUP_MSG: Starting DataNode
> STARTUP_MSG:   host = rc0902b-dhcp169.apple.com/17.229.22.169
> STARTUP_MSG:   args = []
> STARTUP_MSG:   version = 0.17.2.1
> STARTUP_MSG:   build =
> https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.17 -r
> 684969; compiled by 'oom' on Wed Aug 20 22:29:32 UTC 2008
> /
> 2008-09-15 18:54:44,626 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: /17.229.23.77:9000. Already tried 1 time(s).
> 2008-09-15 18:54:45,627 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: /17.229.23.77:9000. Already tried 2 time(s).
> 2008-09-15 18:54:46,628 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: /17.229.23.77:9000. Already tried 3 time(s).
> 2008-09-15 18:54:47,629 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: /17.229.23.77:9000. Already tried 4 time(s).
> 2008-09-15 18:54:48,630 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: /17.229.23.77:9000. Already tried 5 time(s).
> 2008-09-15 18:54:49,631 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: /17.229.23.77:9000. Already tried 6 time(s).
> 2008-09-15 18:54:50,632 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: /17.229.23.77:9000. Already tried 7 time(s).
> 2008-09-15 18:54:51,633 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: /17.229.23.77:9000. Already tried 8 time(s).
> 2008-09-15 18:54:52,635 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: /17.229.23.77:9000. Already tried 9 time(s).
> 2008-09-15 18:54:53,640 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: /17.229.23.77:9000. Already tried 10 time(s).
> 2008-09-15 18:54:54,641 INFO org.apache.hadoop.ipc.RPC: Server at /
> 17.229.23.77:9000 not available yet, Z...
>
> ... and this retrying keeps on repeating
>
>
> The hadoop-site.xml files are like this -
>
> 1. In machine 1
> -
> <configuration>
>   <property>
>     <name>fs.default.name</name>
>     <value>hdfs://localhost:9000</value>
>   </property>
>   <property>
>     <name>dfs.name.dir</name>
>     <value>/Users/souravm/hdpn</value>
>   </property>
>   <property>
>     <name>mapred.job.tracker</name>
>     <value>localhost:9001</value>
>   </property>
>   <property>
>     <name>dfs.replication</name>
>     <value>1</value>
>   </property>
> </configuration>
>
> 2. In machine 2
>
> <configuration>
>   <property>
>     <name>fs.default.name</name>
>     <value>hdfs://<machine 1 ip>:9000</value>
>   </property>
>   <property>
>     <name>dfs.data.dir</name>
>     <value>/Users/nirdosh/hdfsd1</value>
>   </property>
>   <property>
>     <name>dfs.replication</name>
>     <value>1</value>
>   </property>
> </configuration>
>
> The slaves file in machine 1 has a single entry - <user>@<machine 2 ip>
>
> The exact steps I did -
>
> 1. Reformat the namenode in machine 1
> 2. execute start-dfs.sh in machine 1
> 3. Then I try to see whether the datanode is created through http://<machine 1 ip>:50070
>
> Any pointer to resolve this issue would be appreciated.
>
> Regards,
> Sourav

RE: Need help in hdfs configuration fully distributed way in Mac OSX...

2008-09-16 Thread souravm
Hi,

The namenode in machine 1 has started. I can see the following log. Is there a
specific way to provide the master name in the masters file (in hadoop/conf) on the
datanode? I've currently specified <user>@<machine 1 ip>.

2008-09-16 07:23:46,321 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=NameNode, port=9000
2008-09-16 07:23:46,325 INFO org.apache.hadoop.dfs.NameNode: Namenode up at: localhost/127.0.0.1:9000
2008-09-16 07:23:46,327 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null
2008-09-16 07:23:46,329 INFO org.apache.hadoop.dfs.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext
2008-09-16 07:23:46,404 INFO org.apache.hadoop.fs.FSNamesystem: fsOwner=souravm,souravm,_lpadmin,_appserveradm,_appserverusr,admin
2008-09-16 07:23:46,405 INFO org.apache.hadoop.fs.FSNamesystem: supergroup=supergroup
2008-09-16 07:23:46,405 INFO org.apache.hadoop.fs.FSNamesystem: isPermissionEnabled=true
2008-09-16 07:23:46,473 INFO org.apache.hadoop.fs.FSNamesystem: Finished loading FSImage in 112 msecs
2008-09-16 07:23:46,475 INFO org.apache.hadoop.dfs.StateChange: STATE* Leaving safe mode after 0 secs.
2008-09-16 07:23:46,475 INFO org.apache.hadoop.dfs.StateChange: STATE* Network topology has 0 racks and 0 datanodes
2008-09-16 07:23:46,480 INFO org.apache.hadoop.dfs.StateChange: STATE* UnderReplicatedBlocks has 0 blocks
2008-09-16 07:23:46,486 INFO org.apache.hadoop.fs.FSNamesystem: Registered FSNamesystemStatusMBean
2008-09-16 07:23:46,561 INFO org.mortbay.util.Credential: Checking Resource aliases
2008-09-16 07:23:46,627 INFO org.mortbay.http.HttpServer: Version Jetty/5.1.4
2008-09-16 07:23:46,907 INFO org.mortbay.util.Container: Started [EMAIL PROTECTED]
2008-09-16 07:23:46,937 INFO org.mortbay.util.Container: Started WebApplicationContext[/,/]
2008-09-16 07:23:46,938 INFO org.mortbay.util.Container: Started HttpContext[/logs,/logs]
2008-09-16 07:23:46,938 INFO org.mortbay.util.Container: Started HttpContext[/static,/static]
2008-09-16 07:23:46,939 INFO org.mortbay.http.SocketListener: Started SocketListener on 0.0.0.0:50070
2008-09-16 07:23:46,939 INFO org.mortbay.util.Container: Started [EMAIL PROTECTED]
2008-09-16 07:23:46,940 INFO org.apache.hadoop.fs.FSNamesystem: Web-server up at: 0.0.0.0:50070
2008-09-16 07:23:46,940 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2008-09-16 07:23:46,942 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 9000: starting
2008-09-16 07:23:46,943 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 9000: starting
2008-09-16 07:23:46,943 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 9000: starting
2008-09-16 07:23:46,943 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 9000: starting
2008-09-16 07:23:46,943 INFO org.apache.hadoop.ipc.Server: IPC Server handler 3 on 9000: starting
2008-09-16 07:23:46,943 INFO org.apache.hadoop.ipc.Server: IPC Server handler 4 on 9000: starting
2008-09-16 07:23:46,943 INFO org.apache.hadoop.ipc.Server: IPC Server handler 5 on 9000: starting
2008-09-16 07:23:46,943 INFO org.apache.hadoop.ipc.Server: IPC Server handler 6 on 9000: starting
2008-09-16 07:23:46,943 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 9000: starting
2008-09-16 07:23:46,943 INFO org.apache.hadoop.ipc.Server: IPC Server handler 8 on 9000: starting
2008-09-16 07:23:46,944 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 9000: starting

Is there a specific way to provide the master name in the masters file (in
hadoop/conf) on the datanode? I've currently specified <user>@<machine 1 ip>. I'm
thinking there might be a problem, as in the log file of the datanode I can see
the message '2008-09-16 14:38:51,501 INFO org.apache.hadoop.ipc.RPC: Server at
/192.168.1.102:9000 not available yet, Z...'

Any help?

Regards,
Sourav
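
(For reference: as far as I know the datanode locates the namenode through fs.default.name in its hadoop-site.xml, not through conf/masters - the masters and slaves files are read only by the start/stop scripts, and normally contain one plain hostname or IP per line rather than user@host entries. A sketch, with the second IP purely illustrative:

  # conf/masters, on the node where start-dfs.sh is run
  192.168.1.102

  # conf/slaves
  192.168.1.103

The 'Server at /192.168.1.102:9000 not available yet' retries point back at the fs.default.name mismatch resolved earlier in this archive.)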



From: Samuel Guo [EMAIL PROTECTED]
Sent: Tuesday, September 16, 2008 5:49 AM
To: core-user@hadoop.apache.org
Subject: Re: Need help in hdfs configuration fully distributed way in Mac OSX...

check the namenode's log in machine1 to see if your namenode started
successfully :)

On Tue, Sep 16, 2008 at 2:04 PM, souravm <[EMAIL PROTECTED]> wrote:

> Hi All,
>
> I'm facing a problem in configuring hdfs in a fully distributed way in Mac
> OSX.
>
> Here is the topology -
>
> 1. The namenode is in machine 1
> 2. There is 1 datanode in machine 2
>
> Now when I execute start-dfs.sh from machine 1, it connects to machine 2
> (after it asks for password for connecting to machine 2) and starts datanode
> in machine 2 (as the console message says).
>
> However -
> 1. When I go to http://machine1:50070 - it does not show the data node at
> all. It says 0 datanodes configured
> 2. In the log file in machine 2 what I see is -


Need help in hdfs configuration fully distributed way in Mac OSX...

2008-09-15 Thread souravm
Hi All,

I'm facing a problem in configuring hdfs in a fully distributed way in Mac OSX.

Here is the topology -

1. The namenode is in machine 1
2. There is 1 datanode in machine 2

Now when I execute start-dfs.sh from machine 1, it connects to machine 2 (after 
it asks for password for connecting to machine 2) and starts datanode in 
machine 2 (as the console message says).

However -
1. When I go to http://machine1:50070 - it does not show the data node at all.
It says 0 datanodes configured
2. In the log file in machine 2 what I see is -
/
STARTUP_MSG: Starting DataNode
STARTUP_MSG:   host = rc0902b-dhcp169.apple.com/17.229.22.169
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.17.2.1
STARTUP_MSG:   build = 
https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.17 -r 684969; 
compiled by 'oom' on Wed Aug 20 22:29:32 UTC 2008
/
2008-09-15 18:54:44,626 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /17.229.23.77:9000. Already tried 1 time(s).
2008-09-15 18:54:45,627 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /17.229.23.77:9000. Already tried 2 time(s).
2008-09-15 18:54:46,628 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /17.229.23.77:9000. Already tried 3 time(s).
2008-09-15 18:54:47,629 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /17.229.23.77:9000. Already tried 4 time(s).
2008-09-15 18:54:48,630 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /17.229.23.77:9000. Already tried 5 time(s).
2008-09-15 18:54:49,631 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /17.229.23.77:9000. Already tried 6 time(s).
2008-09-15 18:54:50,632 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /17.229.23.77:9000. Already tried 7 time(s).
2008-09-15 18:54:51,633 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /17.229.23.77:9000. Already tried 8 time(s).
2008-09-15 18:54:52,635 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /17.229.23.77:9000. Already tried 9 time(s).
2008-09-15 18:54:53,640 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /17.229.23.77:9000. Already tried 10 time(s).
2008-09-15 18:54:54,641 INFO org.apache.hadoop.ipc.RPC: Server at /17.229.23.77:9000 not available yet, Z...

... and this retrying keeps on repeating


The hadoop-site.xml files are like this -

1. In machine 1
-
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/Users/souravm/hdpn</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

2. In machine 2

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://<machine 1 ip>:9000</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/Users/nirdosh/hdfsd1</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
The slaves file in machine 1 has a single entry - <user>@<machine 2 ip>

The exact steps I did -

1. Reformat the namenode on machine 1
2. Execute start-dfs.sh on machine 1
3. Then I try to see whether the datanode is created, through http://<machine 1 ip>:50070

Any pointer to resolve this issue would be appreciated.

Regards,
Sourav
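
(For reference, steps 1 and 2 correspond roughly to these commands, run from the Hadoop install directory on machine 1 - a sketch assuming the 0.17 layout:

  $ bin/hadoop namenode -format   # step 1: wipes and re-creates the HDFS metadata
  $ bin/start-dfs.sh              # step 2: starts the local namenode, then the datanodes over ssh

Step 3 is then just browsing to http://<machine 1 ip>:50070.)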





Accessing input files from different servers

2008-09-12 Thread souravm
Hi,

I would like to process a set of log files (say, web server access logs) from a
number of different machines. So I need to get those log files from the
respective machines to my central HDFS.

To achieve this -
a) Do I need to install Hadoop and start running HDFS (using start-dfs.sh) on
all those machines where the log files are getting created, and then do a file
get from the central HDFS server?
b) Is there any other way to achieve this?

Regards,
Sourav
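
(One common pattern, sketched here with illustrative paths: install only the Hadoop client on each log-producing machine, point its fs.default.name at the central namenode, and push files in with the FS shell - no daemons need to run there:

  $ bin/hadoop dfs -put /var/log/httpd/access.log /logs/web1/access.log

Only the cluster machines run the namenode and datanodes.)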



RE: Why can't Hadoop be used for online applications ?

2008-09-12 Thread souravm
Thanks Ryan for your inputs.

Regards,
Sourav


From: Ryan LeCompte [EMAIL PROTECTED]
Sent: Friday, September 12, 2008 11:55 AM
To: core-user@hadoop.apache.org
Subject: Re: Why can't Hadoop be used for online applications ?

Hadoop is best suited for distributed processing across many machines
of large data sets. Most people use Hadoop to plow through large data
sets in an offline fashion. One approach that you can use is to use
Hadoop to process your data, then put it in an optimized form in HBase
(i.e., similar to Google's Bigtable). Then, you can use HBase for
querying the data in an online-access fashion. Refer to
http://hadoop.apache.org/hbase/ for more information about HBase.

Ryan


On Fri, Sep 12, 2008 at 2:46 PM, souravm <[EMAIL PROTECTED]> wrote:
> Hi,
>
> Here is a basic doubt.
>
> I found in different documentation it is mentioned that Hadoop is not
> recommended for online applications. Can anyone please elaborate on this?
>
> Regards,
> Sourav

