Re: Can't achieve load distribution
> I have a simple MR job, and I want each Mapper to get one line from my input file (which contains further instructions for lengthy processing).

Use the NLineInputFormat class:
http://hadoop.apache.org/mapreduce/docs/r0.21.0/api/org/apache/hadoop/mapreduce/lib/input/NLineInputFormat.html

Praveen

On Thu, Feb 2, 2012 at 9:43 AM, Mark Kerzner <mark.kerz...@shmsoft.com> wrote:
> Thanks!
> Mark
>
> On Wed, Feb 1, 2012 at 7:44 PM, Anil Gupta <anilgupt...@gmail.com> wrote:
>> Yes, if your block size is 64 MB. By the way, block size is configurable in Hadoop.
>> Best Regards,
>> Anil
>>
>> On Feb 1, 2012, at 5:06 PM, Mark Kerzner <mark.kerz...@shmsoft.com> wrote:
>>> Anil, do you mean one block of HDFS, like 64 MB?
>>> Mark
>>>
>>> On Wed, Feb 1, 2012 at 7:03 PM, Anil Gupta <anilgupt...@gmail.com> wrote:
>>>> Do you have enough data to start more than one mapper? If the entire input is smaller than one block, only 1 mapper will run.
>>>> Best Regards,
>>>> Anil
>>>>
>>>> On Feb 1, 2012, at 4:21 PM, Mark Kerzner <mark.kerz...@shmsoft.com> wrote:
>>>>> Hi,
>>>>> I have a simple MR job, and I want each Mapper to get one line from my input file (which contains further instructions for lengthy processing). Each line is 100 characters long, and I tell Hadoop to read only 100 bytes:
>>>>> job.getConfiguration().setInt("mapreduce.input.linerecordreader.line.maxlength", 100);
>>>>> I see that this part works - it reads only one line at a time, and if I change this parameter, it listens. However, on a cluster only one node receives all the map tasks. Only one map task is started. The others never get anything; they just wait. I've added a 100-second wait to the mapper - no change! Any advice?
>>>>> Thank you. Sincerely,
>>>>> Mark
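Conceptually, NLineInputFormat turns every N consecutive input lines into their own split, so with N = 1 each mapper processes exactly one line. A stdlib-only sketch of that splitting behavior (the class and names below are illustrative, not the Hadoop API):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative model of NLineInputFormat's behavior: every N consecutive
// lines become one split, and each split is handed to its own mapper.
// With N = 1 (Mark's case), a file of k lines yields k map tasks instead
// of one task per HDFS block.
public class NLineSketch {
    static List<List<String>> toSplits(List<String> lines, int linesPerSplit) {
        List<List<String>> splits = new ArrayList<>();
        for (int i = 0; i < lines.size(); i += linesPerSplit) {
            splits.add(new ArrayList<>(
                lines.subList(i, Math.min(i + linesPerSplit, lines.size()))));
        }
        return splits;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("job-spec-1", "job-spec-2", "job-spec-3");
        // One line per split: three lines produce three splits / three mappers.
        List<List<String>> splits = toSplits(input, 1);
        System.out.println(splits.size() + " splits: " + splits);
    }
}
```

In a real driver one would instead set the job's input format class to NLineInputFormat and configure its lines-per-split value; the sketch above only models why that yields one map task per line.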
Re: Execute a Map/Reduce Job Jar from Another Java Program.
Moving to common-user. Common-dev is for project development discussions, not user help.

Could you elaborate on how you used RunJar? What arguments did you provide, and is the target jar a runnable one or a regular jar? What error did you get?

On Thu, Feb 2, 2012 at 8:44 PM, abees muhammad <abees...@gmail.com> wrote:
> Hi,
> I am a newbie to Hadoop development. I have a Map/Reduce job jar file, and I want to execute this jar file programmatically from another Java program. I used the following code to execute it: RunJar.main(String[] args). But the jar file is not executed. Can you please give me a workaround for this issue?
> --
> View this message in context: http://old.nabble.com/Execute-a-Map-Reduce-Job-Jar-from-Another-Java-Program.-tp33250801p33250801.html
> Sent from the Hadoop core-dev mailing list archive at Nabble.com.

--
Harsh J
Customer Ops. Engineer
Cloudera | http://tiny.cloudera.com/about
Re: org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hdfs
$ sudo rm -r /var/lib/hadoop-0.20/cache/hdfs/dfs/data

And then start the DN again regularly; all should be OK.

On Thu, Feb 2, 2012 at 5:36 AM, Vijayakumar Ramdoss <nellaivi...@gmail.com> wrote:
> Hi All,
> When I try to start the datanode, namenode and secondary namenode, they throw org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hdfs error messages. I have attached the log files here.
>
> hadoop-hdfs-namenode-ubuntu.log
> ---
> 2012-02-01 18:49:31,622 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hdfs (auth:SIMPLE) cause:org.apache.hadoop.hdfs.server.namenode.SafeModeException: Checkpoint not created. Name node is in safe mode. The number of live datanodes 0 needs an additional 1 live datanodes to reach the minimum number 1. Safe mode will be turned off automatically.
> 2012-02-01 18:49:31,622 INFO org.apache.hadoop.ipc.Server: IPC Server handler 8 on 8020, call rollEditLog() from 127.0.0.1:47594: error: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Checkpoint not created. Name node is in safe mode. The number of live datanodes 0 needs an additional 1 live datanodes to reach the minimum number 1. Safe mode will be turned off automatically.
> org.apache.hadoop.hdfs.server.namenode.SafeModeException: Checkpoint not created. Name node is in safe mode. The number of live datanodes 0 needs an additional 1 live datanodes to reach the minimum number 1. Safe mode will be turned off automatically.
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:5095)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.rollEditLog(NameNode.java:877)
>     at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1434)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1430)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:396)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1428)
>
> hadoop-root-datanode-ubuntu.log
> ---
> 2012-02-01 18:49:22,635 INFO org.apache.hadoop.security.UserGroupInformation: JAAS Configuration already set up for Hadoop, not re-installing.
> 2012-02-01 18:49:22,822 INFO org.apache.hadoop.security.UserGroupInformation: JAAS Configuration already set up for Hadoop, not re-installing.
> 2012-02-01 18:49:23,129 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Incompatible namespaceIDs in /var/lib/hadoop-0.20/cache/hdfs/dfs/data: namenode namespaceID = 470535428; datanode namespaceID = 1304806298
>     at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:238)
>     at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:153)
>     at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:410)
>     at org.apache.hadoop.hdfs.server.datanode.DataNode.init(DataNode.java:305)
>     at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1606)
>     at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1546)
>     at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1564)
>     at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1690)
>     at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1707)
> 2012-02-01 18:49:23,131 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG: Shutting down DataNode at ubuntu/127.0.1.1
>
> Thanks and Regards
> Vijay
> nellaivi...@gmail.com

--
Harsh J
Customer Ops. Engineer
Cloudera | http://tiny.cloudera.com/about
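The root cause is the datanode trace: the datanode's storage directory carries a namespaceID from a different (earlier) namenode format. Before deleting anything, the mismatch can be confirmed by comparing the namespaceID lines of the two VERSION files. A stdlib-only sketch; on a real node the paths would be <dfs.name.dir>/current/VERSION and <dfs.data.dir>/current/VERSION, and the demo below uses mock files carrying the IDs from the log:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Compare the namespaceID recorded in two HDFS storage VERSION files.
// "Incompatible namespaceIDs" means the datanode dir was formatted
// against a different namenode instance than the one it now talks to.
public class NamespaceCheck {
    static String namespaceId(Path versionFile) throws IOException {
        for (String line : Files.readAllLines(versionFile)) {
            if (line.startsWith("namespaceID=")) {
                return line.substring("namespaceID=".length());
            }
        }
        throw new IOException("no namespaceID in " + versionFile);
    }

    public static void main(String[] args) throws IOException {
        // Demo with mock VERSION files mirroring the IDs from the log above.
        Path dir = Files.createTempDirectory("nsdemo");
        Path nn = dir.resolve("nn_VERSION");
        Path dn = dir.resolve("dn_VERSION");
        Files.write(nn, Arrays.asList("namespaceID=470535428"));
        Files.write(dn, Arrays.asList("namespaceID=1304806298"));
        String a = namespaceId(nn);
        String b = namespaceId(dn);
        System.out.println(a.equals(b)
            ? "namespaceIDs match: " + a
            : "MISMATCH: namenode=" + a + " datanode=" + b);
    }
}
```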
Re: setting multi node in local machine
This conversation ought to help you get started:
http://search-hadoop.com/m/a4klk28NUr12

On Thu, Feb 2, 2012 at 9:52 AM, Arun Prakash <ckarunprak...@gmail.com> wrote:
> I have a Windows machine, and I am trying to install Hadoop with multiple datanodes, like a cluster, on a single machine. Is it possible?
>
> Best Regards
> Arun Prakash C.K
> Keep On Sharing Your Knowledge with Others

--
Harsh J
Customer Ops. Engineer
Cloudera | http://tiny.cloudera.com/about
Re: setting multi node in local machine
If you use VMware and create VMs, then you can do it.

Best Regards,
Anil

On Feb 1, 2012, at 8:22 PM, Arun Prakash <ckarunprak...@gmail.com> wrote:
> I have a Windows machine, and I am trying to install Hadoop with multiple datanodes, like a cluster, on a single machine. Is it possible?
>
> Best Regards
> Arun Prakash C.K
> Keep On Sharing Your Knowledge with Others
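Besides VMs, multiple datanode processes can in principle share one host if each runs from its own config directory with distinct ports and storage paths. A sketch of the hdfs-site.xml overrides a hypothetical second datanode would need (the port numbers and paths here are illustrative, not defaults you must use):

```xml
<!-- hdfs-site.xml for a hypothetical second datanode on the same host -->
<property>
  <name>dfs.data.dir</name>
  <value>/var/lib/hadoop/dn2/data</value>
</property>
<property>
  <name>dfs.datanode.address</name>
  <value>0.0.0.0:50110</value>
</property>
<property>
  <name>dfs.datanode.http.address</name>
  <value>0.0.0.0:50175</value>
</property>
<property>
  <name>dfs.datanode.ipc.address</name>
  <value>0.0.0.0:50120</value>
</property>
```

Each property shifts one of the default datanode ports (50010/50075/50020) and the storage directory so the second process does not collide with the first.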
Re: Execute a Map/Reduce Job Jar from Another Java Program.
What happens? Is there an exception, or does nothing happen? I am curious. Also, how did you launch the other job that is trying to run this one? The hadoop script sets up a lot of environment (variables, classpath, etc.) to make Hadoop work properly, and some of that may not be set up correctly for RunJar to work.

--Bobby Evans

On 2/2/12 9:36 AM, Harsh J <ha...@cloudera.com> wrote:
> Moving to common-user. Common-dev is for project development discussions, not user help.
>
> Could you elaborate on how you used RunJar? What arguments did you provide, and is the target jar a runnable one or a regular jar? What error did you get?
>
> On Thu, Feb 2, 2012 at 8:44 PM, abees muhammad <abees...@gmail.com> wrote:
>> Hi,
>> I am a newbie to Hadoop development. I have a Map/Reduce job jar file, and I want to execute this jar file programmatically from another Java program. I used the following code to execute it: RunJar.main(String[] args). But the jar file is not executed. Can you please give me a workaround for this issue?
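For reference, RunJar's job is roughly: unpack the jar, build a classloader over its contents, and reflectively invoke the declared main class. A stripped-down, stdlib-only sketch of the reflective invocation step (the nested Driver class is a stand-in for a real job driver, not a Hadoop class):

```java
import java.lang.reflect.Method;

// Minimal model of what RunJar ultimately does: look up a class by name
// and invoke its static main(String[]) method. Real RunJar additionally
// unpacks the jar and builds a URLClassLoader over it, which is where
// missing classpath/environment setup tends to cause failures.
public class MainInvoker {
    public static void invokeMain(String className, String[] args) throws Exception {
        Class<?> cls = Class.forName(className);
        Method main = cls.getMethod("main", String[].class);
        // null receiver: main is static; cast hides args from varargs expansion
        main.invoke(null, (Object) args);
    }

    // A stand-in "job driver" so the sketch is self-contained.
    public static class Driver {
        static int lastArgCount = -1;
        public static void main(String[] args) {
            lastArgCount = args.length;
            System.out.println("driver ran with " + args.length + " args");
        }
    }

    public static void main(String[] args) throws Exception {
        invokeMain("MainInvoker$Driver", new String[] {"in", "out"});
    }
}
```

If the reflective call itself works but the job still fails, the difference is usually in what the hadoop launcher script adds to the environment, as noted above.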
Re: Can't achieve load distribution
Praveen, this seems just like the right thing, but it's API 0.21 (I googled about the problems with it), so I have to use either the next Cloudera release, or Hortonworks, or something - am I right?

Mark

On Thu, Feb 2, 2012 at 7:39 AM, Praveen Sripati <praveensrip...@gmail.com> wrote:
>> I have a simple MR job, and I want each Mapper to get one line from my input file (which contains further instructions for lengthy processing).
>
> Use the NLineInputFormat class:
> http://hadoop.apache.org/mapreduce/docs/r0.21.0/api/org/apache/hadoop/mapreduce/lib/input/NLineInputFormat.html
>
> Praveen
Anyone ever run Hadoop on Tier 3?
I'm looking for any experience I can benefit from in running Hadoop on Tier 3. I'll probably be using Cloudera's CDH3, which provides Hadoop-0.20.2+923.194. Would anyone anticipate any gotchas with such a scenario?

Keith Wiley
kwi...@keithwiley.com
keithwiley.com | music.keithwiley.com

"The easy confidence with which I know another man's religion is folly teaches me to suspect that my own is also."
-- Mark Twain
[ANNOUNCE] Apache MRUnit 0.8.0-incubating released
The Apache MRUnit team is pleased to announce the release of MRUnit 0.8.0-incubating from the Apache Incubator.

This is the second release of Apache MRUnit, a Java library that helps developers unit test Apache Hadoop MapReduce jobs.

The release is available here:
http://www.apache.org/dyn/closer.cgi/incubator/mrunit/

The full change log is available here:
https://issues.apache.org/jira/browse/MRUNIT/fixforversion/12316359

We welcome your help and feedback. For more information on how to report problems, and to get involved, visit the project website at http://incubator.apache.org/mrunit/

The Apache MRUnit Team
Re: Can't achieve load distribution
Mark,

NLineInputFormat was not something which was introduced in 0.21; I just sent the reference to the 0.21 URL FYI. It's in the 0.20.205, 1.0.0 and 0.23 releases also.

Praveen

On Fri, Feb 3, 2012 at 1:25 AM, Mark Kerzner <mark.kerz...@shmsoft.com> wrote:
> Praveen, this seems just like the right thing, but it's API 0.21 (I googled about the problems with it), so I have to use either the next Cloudera release, or Hortonworks, or something - am I right?
> Mark
Re: Can't achieve load distribution
The new API NLineInputFormat is only available from 1.0.1, and not in any of the earlier 1.x (1.0.0) or 0.20 (0.20.x, 0.20.xxx) vanilla Apache releases.

On Fri, Feb 3, 2012 at 7:08 AM, Praveen Sripati <praveensrip...@gmail.com> wrote:
> Mark,
> NLineInputFormat was not something which was introduced in 0.21; I just sent the reference to the 0.21 URL FYI. It's in the 0.20.205, 1.0.0 and 0.23 releases also.
> Praveen

--
Harsh J
Customer Ops. Engineer
Cloudera | http://tiny.cloudera.com/about
Re: Can't achieve load distribution
And that is exactly what I found. I have a hack for now - give all the files on the command line - and I will wait for the next release in some distribution.

Thank you,
Mark

On Thu, Feb 2, 2012 at 9:55 PM, Harsh J <ha...@cloudera.com> wrote:
> The new API NLineInputFormat is only available from 1.0.1, and not in any of the earlier 1.x (1.0.0) or 0.20 (0.20.x, 0.20.xxx) vanilla Apache releases.
2 Failures in unit test of Hadoop 1.0.0 version
Hi all,

I downloaded the Hadoop 1.0.0 release from the Apache web site. After a test run of several hours (executing ant test), I found a total of 2 failures, in the TestCLI (60 failed) and TestSaslRPC cases respectively. I wonder if I missed any steps. Is there anyone having the same problem? Could you please help me with it?

Thanks a lot for your time.
-Grace
Problem when starting datanode
Hey folks,

I'm getting an error when starting my datanode. Does anyone have an idea what this error is about?

2012-02-03 11:57:02,947 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.0.3.31:50010, storageID=DS-1677953808-10.0.3.31-50010-1318330317888, infoPort=50075, ipcPort=50020):DataXceiver
java.lang.NullPointerException
    at org.apache.hadoop.hdfs.server.datanode.DataBlockScanner.updateScanStatus(DataBlockScanner.java:308)
    at org.apache.hadoop.hdfs.server.datanode.DataBlockScanner.verifiedByClient(DataBlockScanner.java:302)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:188)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
    at java.lang.Thread.run(Thread.java:619)
2012-02-03 11:57:05,102 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.0.3.31:50010, dest: /10.0.5.36:48862, bytes: 67141632, op: HDFS_READ, cliID: DFSClient_attempt_201201271626_5092_m_44_0, srvID: DS-1677953808-10.0.3.31-50010-1318330317888, blockid: blk_-1065827417218117592_10663233

Regards,
Vikas Srivastava
10 nodes how to build the topological graph
Hi,

I have 10 machines to build a small cluster. How should I build the topology graph between these machines? I mean, how do I arrange the rack mapping for 10 machines?

Thanks!
-Rock

The information and any attached documents contained in this message may be confidential and/or legally privileged. The message is intended solely for the addressee(s). If you are not the intended recipient, you are hereby notified that any use, dissemination, or reproduction is strictly prohibited and may be unlawful. If you are not the intended recipient, please contact the sender immediately by return e-mail and destroy all copies of the original message.
Re: 10 nodes how to build the topological graph
Hi Rock,

You mean rack awareness:
http://hadoop.apache.org/common/docs/r0.17.2/hdfs_user_guide.html#Rack+Awareness

Here you find examples:
http://wiki.apache.org/hadoop/topology_rack_awareness_scripts

best,
Alex

--
Alexander Lorenz
http://mapredit.blogspot.com

On Feb 3, 2012, at 8:03 AM, Jinyan Xu wrote:
> Hi,
> I have 10 machines to build a small cluster. How should I build the topology graph between these machines? I mean, how do I arrange the rack mapping for 10 machines?
> Thanks!
> -Rock
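Concretely, rack awareness in these Hadoop versions is enabled by pointing the cluster at an executable mapping script (one that maps IPs/hostnames to rack IDs, as in the wiki examples above) via core-site.xml. A sketch of the relevant properties (the script path is illustrative):

```xml
<!-- core-site.xml -->
<property>
  <name>topology.script.file.name</name>
  <value>/etc/hadoop/conf/topology.sh</value>
</property>
<property>
  <name>topology.script.number.args</name>
  <value>100</value>
</property>
```

The script receives a batch of IPs/hostnames as arguments and prints one rack path (e.g. /rack1) per input; nodes with no mapping fall into the default rack.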