How to configure SWIM

2012-03-01 Thread Arvind
Hi all,
Can anybody help me configure SWIM (Statistical Workload Injector for
MapReduce) on my Hadoop cluster?



Re: Datanode abort

2010-03-26 Thread Arvind Sharma
Is the server down for any reason? Maybe a system panic and it didn't reboot
itself? What OS is the datanode running?

The datanode is unreachable on the network...






From: y_823...@tsmc.com y_823...@tsmc.com
To: common-user@hadoop.apache.org
Sent: Fri, March 26, 2010 2:11:50 AM
Subject: Datanode abort

Hi,
My datanode disappeared for unknown reason.
Following is the log, any suggestions would be appreciated!


java.net.SocketTimeoutException: 48 millis timeout while waiting for
channel to be ready for write. ch :
java.nio.channels.SocketChannel[connected local=/10.81.47.50:50010
remote=/10.81.47.35:34325]
  at
org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
  at
org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
  at
org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
  at
org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:313)
  at
org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:400)
  at
org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:180)
  at
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
  at java.lang.Thread.run(Thread.java:619)

2010-03-26 15:53:30,910 ERROR
org.apache.hadoop.hdfs.server.datanode.DataNode:
DatanodeRegistration(10.81.47.50:50010,
storageID=DS-758373957-10.81.47.50-50010-1264018078483, infoPort=50075,
ipcPort=50020):DataXceiver
java.net.SocketTimeoutException: 48 millis timeout while waiting for
channel to be ready for write. ch :
java.nio.channels.SocketChannel[connected local=/10.81.47.50:50010
remote=/10.81.47.35:34325]
  at
org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
  at
org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
  at
org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
  at
org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:313)
  at
org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:400)
  at
org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:180)
  at
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
  at java.lang.Thread.run(Thread.java:619)




Fleming Chiu(邱宏明)
707-6128
y_823...@tsmc.com
週一無肉日吃素救地球(Meat Free Monday Taiwan)




  

Re: WritableName can't load class in hive

2010-03-17 Thread Arvind Prabhakar
[cross posting to hive-user]

Oded - how did you create the table in Hive? Did you specify any row format
SerDe for the table? If not, then that may be the cause of this problem
since the default LazySimpleSerDe is unable to deserialize the custom
Writable key value pairs that you have used in your file.

-Arvind

On Tue, Mar 16, 2010 at 2:50 PM, Oded Rotem oded.rotem...@gmail.com wrote:

 Actually, now I moved to this error:

 java.lang.RuntimeException: org.apache.hadoop.hive.serde2.SerDeException:
 class org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe: expects either
 BytesWritable or Text object!

 -Original Message-
 From: Alex Kozlov [mailto:ale...@cloudera.com]
 Sent: Tuesday, March 16, 2010 8:02 PM
 To: common-user@hadoop.apache.org
 Subject: Re: WritableName can't load class in hive

 The Hive executable puts all jars in HIVE_LIB=${HIVE_HOME}/lib on the
 classpath.  Try putting your custom jar into the $HIVE_HOME/lib directory
 and restarting the CLI.

 On Tue, Mar 16, 2010 at 6:28 AM, Oded Rotem oded.rotem...@gmail.com
 wrote:

  Yes, I run the CLI from a folder containing the jar in question.
 
  -Original Message-
  From: Sonal Goyal [mailto:sonalgoy...@gmail.com]
  Sent: Tuesday, March 16, 2010 1:14 PM
  To: common-user@hadoop.apache.org
  Subject: Re: WritableName can't load class in hive
 
  For some custom functions, I put the jar on the local path accessible to
  the
  CLI. Have you tried that?
 
  Thanks and Regards,
  Sonal
 
 
  On Tue, Mar 16, 2010 at 3:49 PM, Oded Rotem oded.rotem...@gmail.com
  wrote:
 
   We have a bunch of sequence files containing keys and values of custom
   Writable classes that we wrote, in an HDFS directory.
  
   We manage to view them using 'hadoop fs -text'. For further ad-hoc
   analysis we tried using Hive. We managed to load them as external tables in
   Hive; however, running a simple select count() against the table fails with
   "WritableName can't load class" in the job output log.
  
   Executing
   add jar <path>
   does not solve it.
  
   Where do we need to place the jar containing the definition of the
  writable
   classes?
  
  
 
 




Re: Unexpected termination of a job

2010-03-04 Thread Arvind Sharma
Have you tried increasing the heap memory for your process?

Arvind
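
For reference, a minimal sketch of one way to raise the heap, assuming the job is
already reaching the cluster and the per-task JVM size (mapred.child.java.opts) is
the limiting factor; the class name and -Xmx value are placeholders. If the
termination happens in the local client JVM before any job is submitted, the
relevant knob is that process's own -Xmx (e.g. HADOOP_HEAPSIZE or the java command
line) instead.

import org.apache.hadoop.mapred.JobConf;

public class HeapExample {
  public static void main(String[] args) {
    JobConf conf = new JobConf(HeapExample.class);
    // Give each map/reduce child JVM more heap than the -Xmx200m default.
    // The value below is a placeholder -- size it to your nodes.
    conf.set("mapred.child.java.opts", "-Xmx1024m");
    // ... set mapper/reducer classes and input/output paths,
    // then submit with JobClient.runJob(conf).
  }
}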






From: Rakhi Khatwani rkhatw...@gmail.com
To: common-user@hadoop.apache.org
Sent: Wed, March 3, 2010 10:38:43 PM
Subject: Re: Unexpected termination of a job

Hi,
   I tried running it in Eclipse; the job starts, but somehow it
terminates by throwing an exception: "Job Failed".
That's why I wanted to run it on the JobTracker to check the logs, but the
execution terminates even before the job starts (during the preprocessing).
How do I ensure that the job runs in JobTracker mode?
Regards
Raakhi
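
For reference, a minimal sketch (assuming the 0.19/0.20 mapred API; host names and
ports are placeholders) of pointing a job at the JobTracker instead of the
in-process LocalJobRunner. Normally these values come from hadoop-site.xml on the
client classpath rather than from code.

import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class SubmitToCluster {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(SubmitToCluster.class);
    // If mapred.job.tracker is "local", the job runs inside this JVM
    // (LocalJobRunner) and never appears on the JobTracker web UI.
    conf.set("fs.default.name", "hdfs://namenode-host:9000"); // placeholder
    conf.set("mapred.job.tracker", "jobtracker-host:9001");   // placeholder
    // ... configure mapper/reducer classes and input/output paths ...
    JobClient.runJob(conf);
  }
}
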
On Thu, Mar 4, 2010 at 2:25 AM, Aaron Kimball aa...@cloudera.com wrote:

 If it's terminating before you even run a job, then you're in luck -- it's
 all still running on the local machine. Try running it in Eclipse and use
 the debugger to trace its execution.

 - Aaron

 On Wed, Mar 3, 2010 at 4:13 AM, Rakhi Khatwani rkhatw...@gmail.com
 wrote:

  Hi,
  I am running a job which has a lot of preprocessing involved. So when
  I run my class from a jar file, somehow it terminates after some time without
  giving any exception.
  I have tried running the same program several times, and every time it
  terminates at a different location in the code (during the preprocessing...
  I haven't configured a job as yet). It probably terminates after a fixed
  interval.
  No idea why this is happening. Any pointers?
  Regards,
  Raakhi Khatwani
 




  

Re: DFSClient write error when DN down

2009-12-04 Thread Arvind Sharma
Any suggestions would be welcome :-)

Arvind







From: Arvind Sharma arvind...@yahoo.com
To: common-user@hadoop.apache.org
Sent: Wed, December 2, 2009 8:02:39 AM
Subject: DFSClient write error when DN down



I have seen similar error logs in the Hadoop JIRA (HADOOP-2691, HDFS-795), but I'm
not sure this one is exactly the same scenario.

Hadoop - 0.19.2

The client-side DFSClient fails to write when a few of the DNs in the grid go
down.  I see this error:

***

2009-11-13 13:45:27,815 WARN DFSClient | DFSOutputStream
ResponseProcessor exception for block
blk_3028932254678171367_1462691java.io.IOException: Bad response 1 for
block blk_30289322
54678171367_1462691 from datanode 10.201.9.225:50010
at 
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2341)
2009-11-13 13:45:27,815 WARN DFSClient |  Error Recovery for block 
blk_3028932254678171367_1462691 bad datanode[2] 10.201.9.225:50010 
2009-11-13 13:45:27,815 WARN DFSClient | Error Recovery for block
blk_3028932254678171367_1462691 in pipeline 10.201.9.218:50010,
10.201.9.220:50010, 10.201.9.225:50010: bad datanode 10
...201.9.225:50010 
2009-11-13 13:45:37,433 WARN DFSClient | DFSOutputStream
ResponseProcessor exception for block
blk_-6619123912237837733_1462799java.io.IOException: Bad response 1 for
block blk_-661912
3912237837733_1462799 from datanode 10.201.9.225:50010
at 
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2341)2009-11-13
 13:45:37,433 WARN DFSClient |  Error Recovery for block 
blk_-6619123912237837733_1462799 bad datanode[1] 10.201.9.225:50010 
2009-11-13 13:45:37,433 WARN DFSClient | Error Recovery for block
blk_-6619123912237837733_1462799 in pipeline 10.201.9.218:50010,
10.201.9.225:50010: bad datanode 10.201.9.225:50010


***

The only way I could get my client program to write successfully to the DFS was
to restart it.

Any suggestions on how to get around this problem on the client side? As I
understood it, the DFSClient APIs take care of situations like this, and
clients don't need to worry if some of the DNs go down.

Also, the replication factor is 3 in my setup and there are 10 DNs (out of which
TWO went down).


Thanks!
Arvind
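
For what it's worth, a minimal application-level sketch of retrying a failed write
by recreating the file, rather than restarting the whole client process. This is
only a generic workaround, assuming the application still holds the data it was
writing; it is not a description of DFSClient's own pipeline recovery.

import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RetryingWriter {
  // Sketch only: write one buffered batch, recreating the file on pipeline failure.
  public static void writeWithRetry(FileSystem fs, Path path, byte[] batch,
                                    int maxAttempts) throws IOException {
    IOException last = new IOException("no attempts made");
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      FSDataOutputStream out = null;
      try {
        out = fs.create(path, true);   // overwrite any partial earlier attempt
        out.write(batch);
        out.close();
        return;                        // success
      } catch (IOException e) {
        last = e;                      // pipeline failure; clean up and retry
        if (out != null) {
          try { out.close(); } catch (IOException ignored) { }
        }
      }
    }
    throw last;
  }
}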


  

Re: DFSClient write error when DN down

2009-12-04 Thread Arvind Sharma
Thanks Todd !

Just wanted another confirmation I guess :-)

Arvind





From: Todd Lipcon t...@cloudera.com
To: common-user@hadoop.apache.org
Sent: Fri, December 4, 2009 8:35:56 AM
Subject: Re: DFSClient write error when DN down

Hi Arvind,

Looks to me like you've identified the JIRAs that are causing this.
Hopefully they will be fixed soon.

-Todd

On Fri, Dec 4, 2009 at 4:43 AM, Arvind Sharma arvind...@yahoo.com wrote:

 Any suggestions would be welcome :-)

 Arvind






 
 From: Arvind Sharma arvind...@yahoo.com
 To: common-user@hadoop.apache.org
 Sent: Wed, December 2, 2009 8:02:39 AM
 Subject: DFSClient write error when DN down



 I have seen similar error logs in the Hadoop Jira (Hadoop-2691, HDFS-795 )
 but not sure this one is exactly the same scenario.

 Hadoop - 0.19.2

 The client side DFSClient fails to write when few of the DN in a grid goes
 down.  I see this error :

 ***

 2009-11-13 13:45:27,815 WARN DFSClient | DFSOutputStream
 ResponseProcessor exception for block
 blk_3028932254678171367_1462691java.io.IOException: Bad response 1 for
 block blk_30289322
 54678171367_1462691 from datanode 10.201.9.225:50010
 at
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2341)
 2009-11-13 13:45:27,815 WARN DFSClient |  Error Recovery for
 block blk_3028932254678171367_1462691 bad datanode[2] 10.201.9.225:50010
 2009-11-13 13:45:27,815 WARN DFSClient | Error Recovery for block
 blk_3028932254678171367_1462691 in pipeline 10.201.9.218:50010,
 10.201.9.220:50010, 10.201.9.225:50010: bad datanode 10
 ...201.9.225:50010
 2009-11-13 13:45:37,433 WARN DFSClient | DFSOutputStream
 ResponseProcessor exception for block
 blk_-6619123912237837733_1462799java.io.IOException: Bad response 1 for
 block blk_-661912
 3912237837733_1462799 from datanode 10.201.9.225:50010
 at
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2341)2009-11-13
 13:45:37,433 WARN DFSClient |  Error Recovery for block
 blk_-6619123912237837733_1462799 bad datanode[1] 10.201.9.225:50010
 2009-11-13 13:45:37,433 WARN DFSClient | Error Recovery for block
 blk_-6619123912237837733_1462799 in pipeline 10.201.9.218:50010,
 10.201.9.225:50010: bad datanode 10.201.9.225:50010


 ***

 The only way I could get my client program to write successfully to the DFS
 was to re-start it.

 Any suggestions how to get around this problem on the client side ?  As I
 understood, the DFSClient APIs will take care of situations like this and
 the clients don't need to worry about if some of the DN goes down.

 Also, the replication factor is 3 in my setup and there are 10 DN (out of
 which TWO went down)


 Thanks!
 Arvind







  

Re: measuring memory usage

2009-09-09 Thread Arvind Sharma
Hmmm... I had seen some exceptions (I don't remember which ones) on MacOS; the
JSR-223 engine was missing on my machine.

Not sure why you would see this error on a Linux distribution.





From: Ted Yu yuzhih...@gmail.com
To: common-user@hadoop.apache.org
Sent: Wednesday, September 9, 2009 9:05:57 AM
Subject: Re: measuring memory usage

Linux vh20.dev.com 2.6.18-53.el5 #1 SMP Mon Nov 12 02:22:48 EST 2007
i686 i686 i386 GNU/Linux

/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0

On 9/8/09, Arvind Sharma arvind...@yahoo.com wrote:
 Which OS you are running the command ?  Linux/MacOS/Windows ?

 Which JDK version ?

 Arvind




 
 From: Ted Yu t...@webroot.com
 To: common-user@hadoop.apache.org
 Sent: Friday, September 4, 2009 11:57:44 AM
 Subject: measuring memory usage

 Hi,

 I am using Hadoop 0.20



 How can I get pass the exception below ?



 [had...@vh20 hadoop]$ jmap -heap 3837

 Attaching to process ID 3837, please wait...

 sun.jvm.hotspot.debugger.NoSuchSymbolException: Could not find symbol
 gHotSpotVMTypeEntryTypeNameOffset in any of the known library names
 (libjvm.so, libjvm_g.so, gamma_g)

 at
 sun.jvm.hotspot.HotSpotTypeDataBase.lookupInProcess(HotSpotTypeDataBase.
 java:388)

 at
 sun.jvm.hotspot.HotSpotTypeDataBase.getLongValueFromProcess(HotSpotTypeD
 ataBase.java:369)

 at
 sun.jvm.hotspot.HotSpotTypeDataBase.readVMTypes(HotSpotTypeDataBase.java
 :102)

 at
 sun.jvm.hotspot.HotSpotTypeDataBase.init(HotSpotTypeDataBase.java:85)

 at
 sun.jvm.hotspot.bugspot.BugSpotAgent.setupVM(BugSpotAgent.java:568)

 at
 sun.jvm.hotspot.bugspot.BugSpotAgent.go(BugSpotAgent.java:494)

 at
 sun.jvm.hotspot.bugspot.BugSpotAgent.attach(BugSpotAgent.java:332)

 at sun.jvm.hotspot.tools.Tool.start(Tool.java:163)

 at sun.jvm.hotspot.tools.HeapSummary.main(HeapSummary.java:39)

 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

 at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.jav
 a:57)

 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessor
 Impl.java:43)

 at java.lang.reflect.Method.invoke(Method.java:616)

 at sun.tools.jmap.JMap.runTool(JMap.java:196)

 at sun.tools.jmap.JMap.main(JMap.java:128)

 Debugger attached successfully.

 sun.jvm.hotspot.tools.HeapSummary requires a java VM process/core!



 Thanks






  

Re: measuring memory usage

2009-09-08 Thread Arvind Sharma
Which OS are you running the command on? Linux/MacOS/Windows?

Which JDK version?

Arvind





From: Ted Yu t...@webroot.com
To: common-user@hadoop.apache.org
Sent: Friday, September 4, 2009 11:57:44 AM
Subject: measuring memory usage

Hi,

I am using Hadoop 0.20



How can I get pass the exception below ?



[had...@vh20 hadoop]$ jmap -heap 3837

Attaching to process ID 3837, please wait...

sun.jvm.hotspot.debugger.NoSuchSymbolException: Could not find symbol
gHotSpotVMTypeEntryTypeNameOffset in any of the known library names
(libjvm.so, libjvm_g.so, gamma_g)

at
sun.jvm.hotspot.HotSpotTypeDataBase.lookupInProcess(HotSpotTypeDataBase.
java:388)

at
sun.jvm.hotspot.HotSpotTypeDataBase.getLongValueFromProcess(HotSpotTypeD
ataBase.java:369)

at
sun.jvm.hotspot.HotSpotTypeDataBase.readVMTypes(HotSpotTypeDataBase.java
:102)

at
sun.jvm.hotspot.HotSpotTypeDataBase.init(HotSpotTypeDataBase.java:85)

at
sun.jvm.hotspot.bugspot.BugSpotAgent.setupVM(BugSpotAgent.java:568)

at
sun.jvm.hotspot.bugspot.BugSpotAgent.go(BugSpotAgent.java:494)

at
sun.jvm.hotspot.bugspot.BugSpotAgent.attach(BugSpotAgent.java:332)

at sun.jvm.hotspot.tools.Tool.start(Tool.java:163)

at sun.jvm.hotspot.tools.HeapSummary.main(HeapSummary.java:39)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.jav
a:57)

at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessor
Impl.java:43)

at java.lang.reflect.Method.invoke(Method.java:616)

at sun.tools.jmap.JMap.runTool(JMap.java:196)

at sun.tools.jmap.JMap.main(JMap.java:128)

Debugger attached successfully.

sun.jvm.hotspot.tools.HeapSummary requires a java VM process/core!



Thanks


  

Re: Copying directories out of HDFS

2009-09-05 Thread Arvind Sharma
They do work on directories as well... 

Arvind





From: Kris Jirapinyo kris.jirapi...@biz360.com
To: common-user@hadoop.apache.org
Sent: Friday, September 4, 2009 11:41:22 PM
Subject: Re: Copying directories out of HDFS

I thought -get and -copyToLocal don't work on directories, only on single
files.

On Fri, Sep 4, 2009 at 9:49 PM, Jeff Zhang zjf...@gmail.com wrote:

 Hi Arvind,

 You missed the 'fs'.

 The command should be:

 bin/hadoop fs -get /path/to/dfs/dir  /path/to/local/dir
 or
 bin/hadoop fs -copyToLocal /path/to/dfs/dir  /path/to/local/dir

 Here is the link to the shell command documentation, for your reference.
 http://hadoop.apache.org/common/docs/r0.20.0/hdfs_shell.html



 On Fri, Sep 4, 2009 at 9:09 PM, Arvind Sharma arvind...@yahoo.com wrote:

  You mean programmatically or command line ?
 
  Command line :
 
  bin/hadoop -get /path/to/dfs/dir  /path/to/local/dir
 
  Arvind
 
 
 
 
  
  From: Kris Jirapinyo kjirapi...@biz360.com
  To: common-user common-user@hadoop.apache.org
  Sent: Friday, September 4, 2009 5:15:00 PM
  Subject: Copying directories out of HDFS
 
  Hi all,
 What is the best way to copy directories from HDFS to local disk in
  0.19.1?
 
  Thanks,
  Kris.
 
 
 
 
 




  

Re: Copying directories out of HDFS

2009-09-04 Thread Arvind Sharma
Do you mean programmatically or via the command line?

Command line:

bin/hadoop -get /path/to/dfs/dir  /path/to/local/dir

Arvind
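
And a minimal programmatic sketch for the other case (paths are placeholders);
FileSystem.copyToLocalFile handles directories as well as single files:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CopyOut {
  public static void main(String[] args) throws Exception {
    // Picks up hadoop-site.xml from the classpath.
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // Copies the directory recursively to the local filesystem; placeholder paths.
    fs.copyToLocalFile(new Path("/path/to/dfs/dir"), new Path("/path/to/local/dir"));
    fs.close();
  }
}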





From: Kris Jirapinyo kjirapi...@biz360.com
To: common-user common-user@hadoop.apache.org
Sent: Friday, September 4, 2009 5:15:00 PM
Subject: Copying directories out of HDFS

Hi all,
What is the best way to copy directories from HDFS to local disk in
0.19.1?

Thanks,
Kris.



  

Re: Where does System.out.println() go?

2009-08-24 Thread Arvind Sharma
Most of the user-level log files go under $HADOOP_HOME/logs/userlogs... try
there.

Arvind





From: Mark Kerzner markkerz...@gmail.com
To: core-u...@hadoop.apache.org
Sent: Monday, August 24, 2009 6:22:50 PM
Subject: Where does System.out.println() go?

Hi,

when I run Hadoop in pseudo-distributed mode, I can't find the log to which
System.out.println() output goes.

When I run in the IDE, I see it. When I run on EC2, it's part of the output
logs. But here - do I need to set something up?

Thank you,
Mark



  

Re: Getting free space percentage on DFS

2009-08-23 Thread Arvind Sharma
You can try something like this:


// Assuming _FileSystem was obtained via FileSystem.get(conf):
if (_FileSystem instanceof DistributedFileSystem)
{
    DistributedFileSystem dfs = (DistributedFileSystem) _FileSystem;
    DiskStatus ds = dfs.getDiskStatus();     // cluster-wide disk status
    long capacity = ds.getCapacity();
    long used = ds.getDfsUsed();
    long remaining = ds.getRemaining();
    long presentCapacity = used + remaining;

    // Percentage of the present capacity that is in use.
    long hdfsPercentDiskUsed = Math.round(((1.0 * used) / presentCapacity) * 100);
}



Arvind




From: Stas Oskin stas.os...@gmail.com
To: core-u...@hadoop.apache.org
Sent: Sunday, August 23, 2009 4:22:26 AM
Subject: Getting free space percentage on DFS

Hi.

How can I get the free / used space on DFS, via Java?

What are the functions that can be used for that?

Note, I'm using a regular (non-super) user, so I need to do it in a similar
way to dfshealth.jsp, which AFAIK doesn't require any permissions.

Thanks in advance.



  

Re: Getting free space percentage on DFS

2009-08-23 Thread Arvind Sharma
The APIs work for the user that Hadoop was started with. Moreover, I don't
think user-level authentication is there yet in Hadoop for these APIs (not
sure here, though)...





From: Stas Oskin stas.os...@gmail.com
To: common-user@hadoop.apache.org
Sent: Sunday, August 23, 2009 1:33:38 PM
Subject: Re: Getting free space percentage on DFS

Hi.

Thank you both for the advice - any idea if these approaches work for a
non-super user?

Regards.



  

Cluster Disk Usage

2009-08-20 Thread Arvind Sharma
Is there a way to find out how much disk space - overall or on a per-Datanode
basis - is available before creating a file?

I am trying to address an issue where the disk got full (config error) and the
client was not able to create a file on HDFS.

I want to be able to check whether there is space left on the grid before trying
to create the file.
Arvind
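
For reference, a minimal sketch of such a pre-flight check, assuming 0.19's
DistributedFileSystem.getDiskStatus() (the same call used in the snippet earlier in
this digest); the size estimate, threshold, and path are placeholders.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class SpaceCheck {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    long expectedBytes = 64L * 1024 * 1024;                  // placeholder size estimate
    if (fs instanceof DistributedFileSystem) {
      DistributedFileSystem dfs = (DistributedFileSystem) fs;
      long remaining = dfs.getDiskStatus().getRemaining();   // cluster-wide free space
      if (remaining < expectedBytes * 3) {                   // rough allowance for replication = 3
        throw new IOException("Not enough DFS space left: " + remaining + " bytes");
      }
    }
    FSDataOutputStream out = fs.create(new Path("/tmp/example-file"));  // placeholder path
    // ... write data ...
    out.close();
  }
}

Per-DataNode figures would need something like DistributedFileSystem.getDataNodeStats()
(the data behind dfsadmin -report), which may be restricted depending on your setup.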



  

Re: Cluster Disk Usage

2009-08-20 Thread Arvind Sharma
Using hadoop-0.19.2





From: Arvind Sharma arvind...@yahoo.com
To: common-user@hadoop.apache.org
Sent: Thursday, August 20, 2009 3:56:53 PM
Subject: Cluster Disk Usage

Is there a way to find out how much disk space - overall or per Datanode basis 
- is available before creating a file ?

I am trying to address an issue where the disk got full (config error) and the 
client was not able to create a file on the HDFS.

I want to be able to check if  there space left on the grid before trying to 
create the file.

Arvind


  

Re: Cluster Disk Usage

2009-08-20 Thread Arvind Sharma
Sorry, I also sent a direct e-mail in reply to one response.

There I asked one question - what is the cost of these APIs? Are they too
expensive to call? Does the API go only to the NN, which stores this data?

Thanks!
Arvind





From: Arvind Sharma arvind...@yahoo.com
To: common-user@hadoop.apache.org
Sent: Thursday, August 20, 2009 4:01:02 PM
Subject: Re: Cluster Disk Usage

Using hadoop-0.19.2





From: Arvind Sharma arvind...@yahoo.com
To: common-user@hadoop.apache.org
Sent: Thursday, August 20, 2009 3:56:53 PM
Subject: Cluster Disk Usage

Is there a way to find out how much disk space - overall or per Datanode basis 
- is available before creating a file ?

I am trying to address an issue where the disk got full (config error) and the 
client was not able to create a file on the HDFS.

I want to be able to check if  there space left on the grid before trying to 
create the file.

Arvind


  

Re: Hadoop - flush() files

2009-08-18 Thread Arvind Sharma
Just checking again :-)

I have a setup where I am using Hadoop 0.19.2 and the data files are kept open
for a long time. I want them to be synced to HDFS now and then, to avoid
any data loss.

At one of the last HUG meetings, somebody mentioned the FSDataOutputStream.sync()
method, but there were some known issues with it.

Has anyone experienced any problems while using the sync() method?

Arvind
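
For context, a minimal sketch of the pattern being discussed, assuming
FSDataOutputStream.sync() as shipped in the 0.19/0.20 line; the path, batch size,
and record format are placeholders, and nothing here speaks to which known issues
apply.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PeriodicSync {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    FSDataOutputStream out = fs.create(new Path("/tmp/long-lived-file")); // placeholder path
    for (int i = 0; i < 1000; i++) {
      out.write(("record " + i + "\n").getBytes("UTF-8"));
      if (i % 100 == 99) {
        out.sync(); // ask the client to flush buffered data down the pipeline after each batch
      }
    }
    out.close();
  }
}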







Hi,

I was wondering if anyone here has started using (or has been using) the newer
Hadoop versions (0.20.1?) - which provide an API for flushing out any open
files on HDFS?

Are there any known issues I should be aware of?

Thanks!
Arvind


  

Hadoop - flush() files

2009-08-17 Thread Arvind Sharma
Hi,

I was wondering if anyone here has started using (or has been using) the newer
Hadoop versions (0.20.1?) - which provide an API for flushing out any open
files on HDFS?

Are there any known issues I should be aware of?

Thanks!
Arvind



  

Error in starting Pseudo-Distributed mode hadoop-0.19.2

2009-08-14 Thread arvind subramanian
Hi,

I am new to Hadoop, and am trying to get Hadoop started in
pseudo-distributed mode on Ubuntu Jaunty.

In the archives I noticed that someone had a similar issue with
hadoop-0.20.0, but the logs are different.

As in the quickstart guide
(http://hadoop.apache.org/common/docs/current/quickstart.html), I
configured the XML files and set up passphraseless SSH.


The output of 'bin/hadoop namenode -format' is as follows:


09/08/13 23:52:49 INFO namenode.NameNode: STARTUP_MSG:
/
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = arvind-laptop/127.0.1.1
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 0.19.2
STARTUP_MSG:   build =
https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.19 -r
789657; compiled by 'root' on Tue Jun 30 12:40:50 EDT 2009
/
Re-format filesystem in /tmp/hadoop-arvind/dfs/name ? (Y or N) Y
09/08/13 23:52:52 INFO namenode.FSNamesystem:
fsOwner=arvind,arvind,adm,dialout,cdrom,plugdev,lpadmin,admin,sambashare
09/08/13 23:52:52 INFO namenode.FSNamesystem: supergroup=supergroup
09/08/13 23:52:52 INFO namenode.FSNamesystem: isPermissionEnabled=true
09/08/13 23:52:52 INFO common.Storage: Image file of size 96 saved in 0
seconds.
09/08/13 23:52:52 INFO common.Storage: Storage directory
/tmp/hadoop-arvind/dfs/name has been successfully formatted.
09/08/13 23:52:52 INFO namenode.NameNode: SHUTDOWN_MSG:
/
SHUTDOWN_MSG: Shutting down NameNode at arvind-laptop/127.0.1.1
/

The task-tracker log had the following error :

2009-08-13 23:11:55,884 ERROR org.apache.hadoop.mapred.TaskTracker: Can not
start task tracker because java.lang.RuntimeException: Not a host:port pair:
local
at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:134)
at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:121)
at org.apache.hadoop.mapred.JobTracker.getAddress(JobTracker.java:1318)
at org.apache.hadoop.mapred.TaskTracker.init(TaskTracker.java:884)
at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:2798)
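
For what it's worth, a minimal diagnostic sketch, under the assumption that the
"local" in this trace is the value of mapred.job.tracker that the TaskTracker tried
to parse; the port below is only a quickstart-style placeholder.

import org.apache.hadoop.conf.Configuration;

public class CheckJobTracker {
  public static void main(String[] args) {
    // Loads hadoop-default.xml / hadoop-site.xml from the classpath,
    // the same way the daemons do.
    Configuration conf = new Configuration();
    String jt = conf.get("mapred.job.tracker", "local");
    if (!jt.contains(":")) {
      System.err.println("mapred.job.tracker = '" + jt + "' is not a host:port pair; "
          + "for pseudo-distributed mode set it to e.g. localhost:9001 in conf/hadoop-site.xml");
    } else {
      System.out.println("mapred.job.tracker = " + jt);
    }
  }
}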



After this point, I could not access these in my browser:

   - NameNode - http://localhost:50070/
   - JobTracker - http://localhost:50030/

If anyone could give a hint on what the issue could be, it would be
great!

On another note, the quickstart guide pointed to Cloudera's distribution
of Hadoop, and they have a Debian package for Ubuntu.
Are the development plans of Apache and Cloudera the same? Do both ship the same
source release?


Cheers,
Arvind


How to re-read the config files

2009-08-13 Thread Arvind Sharma
Hi,

I was wondering if there is a way to let Hadoop re-read the config file
(hadoop-site.xml) after making some changes to it.

I don't want to restart the whole cluster for that.

I am using Hadoop 0.19.2

Thanks!
Arvind



  

Re: How to re-read the config files

2009-08-13 Thread Arvind Sharma
Sorry, I should have mentioned that I want to do this without a code change.

Something like: I have the cluster up and running and suddenly realize that I
forgot to add some properties to the hadoop-site.xml file. Now I can add these
new properties - but how do they take effect without restarting the
cluster (which is in production, and the customer wouldn't like that either :-) )?

Thanks!
Arvind





From: Jakob Homan jho...@yahoo-inc.com
To: common-user@hadoop.apache.org
Sent: Thursday, August 13, 2009 2:04:43 PM
Subject: Re: How to re-read the config files

Hey Arvind-
   You'll probably want to look at the Configuration.reloadConfiguration() method, as
demonstrated:

import java.io.IOException;
import java.util.Scanner;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class TestReloadConfig {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf); // pull in dfs settings

    System.out.println("Replication = " + conf.get("dfs.replication"));
    System.out.println("Update file and press enter");
    new Scanner(System.in).nextLine();

    conf.reloadConfiguration();
    System.out.println("Now replication = " + conf.get("dfs.replication"));
  }
}

Note from the Javadoc: "Values that are added via set methods will overlay
values read from the resources."

Hope this helps.  Write back if you have more questions.

Thanks,
Jakob Homan
Hadoop at Yahoo!


Arvind Sharma wrote:
 Hi,
 
 I was wondering if there is way to let Hadoop re-read the config file 
 (hadoop-site.xml) after  making some changes in it.
 
 I don't want to restart the whole cluster for that.
 
 I am using Hadoop 0.19.2
 
 Thanks!
 Arvind