Re: my hadoop job failed sometimes
Thanks Harsh :) The Hadoop system was started 3 months ago, so I don't think it is in safe mode. I found some old tasks started 10 days ago; they seem to be blocked for some unknown reason. I have killed these tasks now, but I don't know why a task can become blocked and survive so long. I also found another type of exception in the NameNode's log:

2011-08-16 11:25:44,352 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201106021431_1704_r_000506_1: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /walter/send_albums/110816_111850/_temporary/_attempt_201106021431_1704_r_000506_1/part-r-00506 File does not exist. Holder DFSClient_attempt_201106021431_1704_r_000506_1 does not have any open files.
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1332)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1323)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1251)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
    at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)

    at org.apache.hadoop.ipc.Client.call(Client.java:740)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
    at $Proxy1.addBlock(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

2011/8/17, Harsh J:
> Do you notice anything related in the NameNode logs? One reason for
> this is that the NameNode may be in safe mode for some reason, but
> there are many other reasons, so the NameNode's log would be the best
> place to look for exactly why the complete() op fails.
>
> On Wed, Aug 17, 2011 at 8:20 AM, Jianxin Wang wrote:
>> hi,
>> my job runs once every day, but it fails sometimes.
>> i checked the log in the job tracker. It seems to be an HDFS error?
>> thanks a lot!
>>
>> 2011-08-16 21:07:13,247 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201106021431_1719_r_000498_1: org.apache.hadoop.ipc.RemoteException: java.io.IOException: Could not complete write to file /xxx/yyy/110816_210016/_temporary/_attempt_201106021431_1719_r_000498_1/part-r-00498 by DFSClient_attempt_201106021431_1719_r_000498_1
>>     at org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:449)
>>     at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
>>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
>>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at javax.security.auth.Subject.doAs(Subject.java:396)
>>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
>>
>>     at org.apache.hadoop.ipc.Client.call(Client.java:740)
>>     at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
>>     at $Proxy1.complete(Unknown Source)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>>     at $Proxy1.complete(Unknown Source)
>>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:3264)
>>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:3188)
>>     at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:61)
>>     at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:86)
>>     at org.apache.hadoop.io.SequenceFile$Writer.close(SequenceFile.java:959)
>>     at org.apache.hadoop.io.SequenceFile$BlockCompressWriter.close(SequenceFile.java:1290)
>>     at org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat$1.close(Sequenc
Re: my hadoop job failed sometimes
Do you notice anything related in the NameNode logs? One reason for this is that the NameNode may be in safe mode for some reason, but there are many other reasons, so the NameNode's log would be the best place to look for exactly why the complete() op fails.

On Wed, Aug 17, 2011 at 8:20 AM, Jianxin Wang wrote:
> hi,
> my job runs once every day, but it fails sometimes.
> i checked the log in the job tracker. It seems to be an HDFS error?
> thanks a lot!
>
> 2011-08-16 21:07:13,247 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201106021431_1719_r_000498_1: org.apache.hadoop.ipc.RemoteException: java.io.IOException: Could not complete write to file /xxx/yyy/110816_210016/_temporary/_attempt_201106021431_1719_r_000498_1/part-r-00498 by DFSClient_attempt_201106021431_1719_r_000498_1
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:449)
>     at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:396)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
>
>     at org.apache.hadoop.ipc.Client.call(Client.java:740)
>     at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
>     at $Proxy1.complete(Unknown Source)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>     at $Proxy1.complete(Unknown Source)
>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:3264)
>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:3188)
>     at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:61)
>     at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:86)
>     at org.apache.hadoop.io.SequenceFile$Writer.close(SequenceFile.java:959)
>     at org.apache.hadoop.io.SequenceFile$BlockCompressWriter.close(SequenceFile.java:1290)
>     at org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat$1.close(SequenceFileOutputFormat.java:78)
>     at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:567)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
>     at org.apache.hadoop.mapred.Child.main(Child.java:170)

--
Harsh J
Re: NameNode not registering DataNodes for unknown reason
Adam,

You've run into a fairly common issue with the 0.20.x release: /dev/random being used by the DataNode daemon upon startup, which blocks the input calls until it has enough entropy to give back. Usually, if your DN machines have some other activity (mouse, keyboard on terminal, etc.), the DNs get unwedged and start after a while. But this can take a long time on a remote, single-purpose node such as yours (as there's no activity).

As a workaround, you can add this to the JVM args, and it will work. The '/dev/../dev/urandom' is required because if you point it directly at /dev/urandom, Java ignores your setting. (Hat tip: Brock Noland)

-Djava.security.egd=file:/dev/../dev/urandom

(Stack trace that confirms this):

"main" prio=10 tid=0x01dda800 nid=0xcda runnable [0x41257000]
   java.lang.Thread.State: RUNNABLE
    at java.io.FileInputStream.readBytes(Native Method)
    at java.io.FileInputStream.read(FileInputStream.java:236)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:273)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
    - locked <0xed1f2958> (a java.io.BufferedInputStream)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
    - locked <0xed1f27b8> (a java.io.BufferedInputStream)
    at sun.security.provider.SeedGenerator$URLSeedGenerator.getSeedByte(SeedGenerator.java:469)
    at sun.security.provider.SeedGenerator.getSeedBytes(SeedGenerator.java:140)
    at sun.security.provider.SeedGenerator.generateSeed(SeedGenerator.java:135)
    at sun.security.provider.SecureRandom.engineGenerateSeed(SecureRandom.java:131)
    at sun.security.provider.SecureRandom.engineNextBytes(SecureRandom.java:188)
    - locked <0xed1f23d8> (a sun.security.provider.SecureRandom)
    at java.security.SecureRandom.nextBytes(SecureRandom.java:450)
    - locked <0xed1f2678> (a java.security.SecureRandom)
    at java.security.SecureRandom.next(SecureRandom.java:472)
    at java.util.Random.nextInt(Random.java:272)

For some further discussions on this, see/follow:
http://search-hadoop.com/m/pgnuO1wbQRQ

Btw, this has been fixed in trunk @ https://issues.apache.org/jira/browse/HDFS-1835

On Wed, Aug 17, 2011 at 1:18 AM, Adam DeConinck wrote:
> Hi all,
>
> I've been seeing an HDFS issue I don't understand, and I'm hoping
> someone else has seen this before.
>
> I'm currently attempting to set up a simple-stupid Hadoop 0.20.203.0
> test cluster on two Dell PE1950s running a minimal installation of RHEL
> 5.6. The master node, wd0031, is running a NameNode, DataNode and
> SecondaryNameNode. A single slave node, wd0032, is running a DataNode.
> The Hadoop processes are starting up fine and I'm not seeing any errors
> in the log files, but the DataNodes never join the filesystem. There
> are never any log entries in the NameNode about their registration,
> doing a "hadoop fsck /" lists zero data-nodes, and I can't write files.
> The config and log files, and some ngrep traces, are up on
> https://gist.github.com/1149869 .
>
> What's weird is that exactly the same configuration works on a two-node
> EC2 cluster running CentOS 5.6: the filesystem works, fsck lists the
> datanodes, and the logs show the right entries. See
> https://gist.github.com/1149823 . As far as I can tell there should be
> no difference between these cases.
>
> Jstack traces on a DataNode and NameNode for both cases, local and EC2,
> are here: https://gist.github.com/1149843
>
> I'm a relative newbie to Hadoop, and I cannot figure out why I'm having
> this problem on local hardware but not EC2. There's nothing in the logs
> or the jstacks which is obvious to me, but hopefully someone who knows
> Hadoop better can let me know.
>
> Please feel free to let me know if you need more information.
>
> Thanks,
> Adam
>
> --
> Adam DeConinck | Applications Specialist | adeconi...@rsystemsinc.com
>
> Enabling Innovation Through Fast and Flexible HPC Resources
>
> R Systems NA, inc. | 1902 Fox Drive, Champaign, IL 61820 | 217.954.1056 |
> www.rsystemsinc.com

--
Harsh J
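[Editor's note: a minimal sketch of how Harsh's workaround is typically wired in, via conf/hadoop-env.sh. The HADOOP_OPTS variable is the standard hook for extra daemon JVM flags; the exact file location depends on your install layout.]

```shell
# conf/hadoop-env.sh -- append the flag so daemon JVMs seed SecureRandom from
# the non-blocking /dev/urandom. The "/dev/../dev/urandom" spelling is needed
# because Java special-cases a plain "file:/dev/urandom" value and ignores it.
export HADOOP_OPTS="$HADOOP_OPTS -Djava.security.egd=file:/dev/../dev/urandom"
```

After editing, restart the DataNode daemons so the new JVM argument takes effect.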
my hadoop job failed sometimes
Hi,
My job runs once every day, but it fails sometimes. I checked the log in the job tracker. It seems to be an HDFS error?
Thanks a lot!

2011-08-16 21:07:13,247 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201106021431_1719_r_000498_1: org.apache.hadoop.ipc.RemoteException: java.io.IOException: Could not complete write to file /xxx/yyy/110816_210016/_temporary/_attempt_201106021431_1719_r_000498_1/part-r-00498 by DFSClient_attempt_201106021431_1719_r_000498_1
    at org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:449)
    at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)

    at org.apache.hadoop.ipc.Client.call(Client.java:740)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
    at $Proxy1.complete(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
    at $Proxy1.complete(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:3264)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:3188)
    at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:61)
    at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:86)
    at org.apache.hadoop.io.SequenceFile$Writer.close(SequenceFile.java:959)
    at org.apache.hadoop.io.SequenceFile$BlockCompressWriter.close(SequenceFile.java:1290)
    at org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat$1.close(SequenceFileOutputFormat.java:78)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:567)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)
Re: Why hadoop should be built on JAVA?
This should explain it: http://jz10.java.no/java-4-ever-trailer.html

On Tue, Aug 16, 2011 at 1:17 PM, Adi wrote:
> On Mon, Aug 15, 2011 at 9:00 PM, Chris Song wrote:
>> Why hadoop should be built in JAVA?
>>
>> For integrity and stability, it is good for hadoop to be implemented in
>> Java
>>
>> But, when it comes to speed issue, I have a question...
>>
>> How will it be if HADOOP is implemented in C or Python?
>
> I haven't used anything besides hadoop, but in case you are interested in
> alternate (some of them non-Java) M/R frameworks, this list is a decent
> compilation of those:
>
> https://sites.google.com/site/cloudcomputingsystem/research/programming-model
>
> Erlang/Python - http://discoproject.org/
> Ruby - http://skynet.rubyforge.org/
>
> -Adi
Re: Hadoop Meetup in Sept in Shanghai
Hi, I'm interested in it, any more details? Best, Nan On Wed, Aug 17, 2011 at 7:42 AM, Michael Lv wrote: > Hi, > > We plan to organize a developer meetup to talk about Hadoop and big data > during the week of Sept 12 in Shanghai. We'll have presenters from U.S and > the topic looks very interesting. Suggestions and presentation by guest are > welcome. > > If you are interested to attend, please reply to this thread or contact me > directly. > > Regards, > Michael > > > -- Nan Zhu School of Electronic, Information and Electrical Engineering,229 Shanghai Jiao Tong University 800,Dongchuan Road,Shanghai,China E-Mail: zhunans...@gmail.com
Re: Hadoop Meetup in Sept in Shanghai
I'd like to attend, like to hear more about hive At 2011-08-17 07:42:07,"Michael Lv" wrote: >Hi, > >We plan to organize a developer meetup to talk about Hadoop and big data >during the week of Sept 12 in Shanghai. We'll have presenters from U.S and >the topic looks very interesting. Suggestions and presentation by guest are >welcome. > >If you are interested to attend, please reply to this thread or contact me >directly. > >Regards, >Michael > >
Hadoop Meetup in Sept in Shanghai
Hi,

We plan to organize a developer meetup to talk about Hadoop and big data during the week of Sept 12 in Shanghai. We'll have presenters from the U.S., and the topics look very interesting. Suggestions and presentations by guests are welcome.

If you are interested in attending, please reply to this thread or contact me directly.

Regards,
Michael
Re: WritableComparable
Can you copy the contents of your parent Writable's readFields and write methods (not the ones you've already posted)?

Another thing you could try: if you know you have two identical keys, write a unit test that examines the result of compareTo for two instances to confirm the correct behavior (even going as far as serializing and deserializing before the comparison).

Finally, just to confirm: you don't have any grouping or ordering comparators registered?
NameNode not registering DataNodes for unknown reason
Hi all,

I've been seeing an HDFS issue I don't understand, and I'm hoping someone else has seen this before.

I'm currently attempting to set up a simple-stupid Hadoop 0.20.203.0 test cluster on two Dell PE1950s running a minimal installation of RHEL 5.6. The master node, wd0031, is running a NameNode, DataNode and SecondaryNameNode. A single slave node, wd0032, is running a DataNode. The Hadoop processes are starting up fine and I'm not seeing any errors in the log files, but the DataNodes never join the filesystem. There are never any log entries in the NameNode about their registration, doing a "hadoop fsck /" lists zero data-nodes, and I can't write files. The config and log files, and some ngrep traces, are up on https://gist.github.com/1149869 .

What's weird is that exactly the same configuration works on a two-node EC2 cluster running CentOS 5.6: the filesystem works, fsck lists the datanodes, and the logs show the right entries. See https://gist.github.com/1149823 . As far as I can tell there should be no difference between these cases.

Jstack traces on a DataNode and NameNode for both cases, local and EC2, are here: https://gist.github.com/1149843

I'm a relative newbie to Hadoop, and I cannot figure out why I'm having this problem on local hardware but not EC2. There's nothing in the logs or the jstacks which is obvious to me, but hopefully someone who knows Hadoop better can let me know.

Please feel free to let me know if you need more information.

Thanks,
Adam

--
Adam DeConinck | Applications Specialist | adeconi...@rsystemsinc.com

Enabling Innovation Through Fast and Flexible HPC Resources

R Systems NA, inc. | 1902 Fox Drive, Champaign, IL 61820 | 217.954.1056 | www.rsystemsinc.com
Re: How do I add Hadoop dependency to a Maven project?
If you're talking about the org.apache.hadoop.mapreduce.* API, that was introduced in 0.20.0. There should be no need to use the 0.21 version.

-Joey

On Tue, Aug 16, 2011 at 1:22 PM, W.P. McNeill wrote:
> Here is my specific problem:
>
> I have a sample word count Hadoop program up on github (
> https://github.com/wpm/WordCountTestAdapter ) that illustrates unit testing
> techniques for Hadoop. This code uses the new API. (On my development
> machine I'm using version 0.20.2.) I want to use Maven for its build
> framework because that seems like the way the Java world is going. Currently
> the pom.xml for this project makes no mention of Hadoop. If you try to do a
> "mvn install" you get the errors I described earlier. I want to change this
> project so that "mvn install" builds it.
>
> I can find the pre-0.21 (old API) hadoop-core JARs on
> http://mvnrepository.com, but I can't find the post-0.21 (new API)
> hadoop-mapred here. Do I need to add another Maven repository server to get
> the new API JARs?

--
Joseph Echeverria
Cloudera, Inc.
443.305.9434
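[Editor's note: to make Joey's answer concrete, the dependency stanza below pulls in the 0.20.x hadoop-core artifact, which bundles both the old (mapred) and new (mapreduce) APIs. The version shown matches the poster's 0.20.2; adjust to taste.]

```xml
<!-- pom.xml: hadoop-core 0.20.x contains org.apache.hadoop.mapreduce.* -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-core</artifactId>
  <version>0.20.2</version>
</dependency>
```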
Re: How do I add Hadoop dependency to a Maven project?
Here is my specific problem: I have a sample word count Hadoop program up on github ( https://github.com/wpm/WordCountTestAdapter) that illustrates unit testing techniques for Hadoop. This code uses the new API. (On my development machine I'm using version 0.20.2) I want to use Maven for its build framework because that seems like the way the Java world is going. Currently the pom.xml for this project makes no mention of Hadoop. If you try to do a "mvn install" you get the errors I described earlier. I want to change this project so that "mvn install" builds it. I can find the pre-0.21 (old API) hadoop-core JARs on http://mvnrepository.com, but I can't find the post-0.21 (new API) hadoop-mapred here. Do I need to add another Maven repository server to get the new API JARs?
Re: Why hadoop should be built on JAVA?
On Mon, Aug 15, 2011 at 9:00 PM, Chris Song wrote:
> Why hadoop should be built in JAVA?
>
> For integrity and stability, it is good for hadoop to be implemented in
> Java
>
> But, when it comes to speed issue, I have a question...
>
> How will it be if HADOOP is implemented in C or Python?

I haven't used anything besides hadoop, but in case you are interested in alternate (some of them non-Java) M/R frameworks, this list is a decent compilation of those:

https://sites.google.com/site/cloudcomputingsystem/research/programming-model

Erlang/Python - http://discoproject.org/
Ruby - http://skynet.rubyforge.org/

-Adi
Re: Why hadoop should be built on JAVA?
Java's features such as garbage collection, run-time array index checking, and cleaner syntax (no pointers) make it a good language for Hadoop. One can develop MapReduce apps faster and maintain code more easily than in C/C++, allowing clients to focus on their business logic/use cases.

For a fairly high-level implementation of MapReduce which uses clusters of COTS hardware as compute nodes, the main bottleneck in most applications will be network I/O. In such cases, the speed advantage of C/C++ over Java seems less attractive. You will be spending most of your time shuffling packets around anyway.

C/C++ applications are difficult to port and are too system specific. Let's say you are trying to optimize a certain portion of your mapper code by pointer manipulation. Such operations are inherently error prone because of their proximity to the hardware. The JVM alleviates most of these issues: you don't have to think about the number of bytes in a double, and your code will be portable across 32-bit and 64-bit architectures and across all endian systems, etc.

Even with Java's safety and comfort, debugging distributed Hadoop MapReduce apps is a pain in the butt. Just imagine what would happen with C/C++, where you would be buried in seg faults.

I would say that you could use C/C++ to implement MapReduce if you were using multicore/GPUs as your underlying platform, where you know the hardware intimately and are free from network I/O latency.

-Dhruv Kumar

On Tue, Aug 16, 2011 at 12:05 PM, Bill Graham wrote:
> There was a fairly long discussion on this topic at the beginning of the
> year FYI:
>
> http://search-hadoop.com/m/JvSQe2wNlY11
>
> On Mon, Aug 15, 2011 at 9:00 PM, Chris Song wrote:
>> Why hadoop should be built in JAVA?
>>
>> For integrity and stability, it is good for hadoop to be implemented in
>> Java
>>
>> But, when it comes to speed issue, I have a question...
>>
>> How will it be if HADOOP is implemented in C or Python?
Re: Why hadoop should be built on JAVA?
There was a fairly long discussion on this topic at the beginning of the year FYI:

http://search-hadoop.com/m/JvSQe2wNlY11

On Mon, Aug 15, 2011 at 9:00 PM, Chris Song wrote:
> Why hadoop should be built in JAVA?
>
> For integrity and stability, it is good for hadoop to be implemented in
> Java
>
> But, when it comes to speed issue, I have a question...
>
> How will it be if HADOOP is implemented in C or Python?
Re: How do I add Hadoop dependency to a Maven project?
Just to make sure I understand: the drop at smartfrog.svn.sourceforge.net is just a build of the latest Hadoop JARs, right? I can't use it as a Maven repository (because it's POM-less).
Re: hadoop cluster mode not starting up
See inline:

> From: shanmuganathan.r
> To: common-user@hadoop.apache.org
> Sent: Tuesday, 16 August 2011, 13:35
> Subject: Re: hadoop cluster mode not starting up
>
> Hi Df,
>
> Are you using the IP addresses instead of names in conf/masters and
> conf/slaves? For running the secondary namenode on a separate machine,
> refer to the following link.

= Yes, I use the names in those files, but the IP addresses are mapped to the names in the /extras/hosts file. Does this cause problems?

> http://www.hadoop-blog.com/2010/12/secondarynamenode-process-is-starting.html

= I don't want to make too many changes, so I will stick to having the master be both namenode and secondarynamenode. I tried starting up HDFS and MapReduce, but the jobtracker is not running on the master, and there are still errors regarding the datanodes because only 5 of 7 datanodes have a tasktracker. I ran both commands to start HDFS and MapReduce, so why is the jobtracker missing?

> Regards,
> Shanmuganathan
>
> On Tue, 16 Aug 2011 17:06:04 +0530 A Df wrote:
>
>> I already used a few tutorials as follows:
>> * Hadoop Tutorial on Yahoo Developer network, which uses an old hadoop and thus older conf files.
>> * http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/ which only has two nodes and the master acts as namenode and secondary namenode. I need one with more than that.
>>
>> Is there a way to prevent the node from using the central file system? I don't have root permission, and my user folder is on a central file system which is replicated on all the nodes.
>>
>> See inline too for my responses
>>
>>> From: Steve Loughran
>>> To: common-user@hadoop.apache.org
>>> Sent: Tuesday, 16 August 2011, 12:08
>>> Subject: Re: hadoop cluster mode not starting up
>>>
>>> On 16/08/11 11:19, A Df wrote:
>>>> See inline
>>>>
>>>>> From: Steve Loughran
>>>>> To: common-user@hadoop.apache.org
>>>>> Sent: Tuesday, 16 August 2011, 11:08
>>>>> Subject: Re: hadoop cluster mode not starting up
>>>>>
>>>>> On 16/08/11 11:02, A Df wrote:
>>>>>> Hello All:
>>>>>>
>>>>>> I used a combination of tutorials to set up hadoop, but most seem to use either an old version of hadoop or only 2 machines for the cluster, which isn't really a cluster. Does anyone know of a good tutorial which sets up multiple nodes for a cluster? I already looked at the Apache website, but it does not give sample values for the conf files. Also, each set of tutorials seems to have a different set of parameters which they indicate should be changed, so now it's a bit confusing. For example, my configuration sets a dedicated namenode, secondary namenode and 8 slave nodes, but when I run the start command it gives an error. Should I install hadoop to my user directory or on the root? I have it in my directory, but all the nodes have a central file system as opposed to distributed storage, so whatever I do on one node in my user folder affects all the others. How do I set the paths to ensure that it uses a distributed system?
>>>>>>
>>>>>> For the errors below, I checked the directories and the files are there. I am not sure what went wrong or how to set the conf to not use the central file system. Thank you.
>>>>>>
>>>>>> Error message
>>>>>> CODE
>>>>>> w1153435@n51:~/hadoop-0.20.2_cluster> bin/start-dfs.sh
>>>>>> bin/start-dfs.sh: line 28: /w1153435/hadoop-0.20.2_cluster/bin/hadoop-config.sh: No such file or directory
>>>>>> bin/start-dfs.sh: line 50: /w1153435/hadoop-0.20.2_cluster/bin/hadoop-daemon.sh: No such file or directory
>>>>>> bin/start-dfs.sh: line 51: /w1153435/hadoop-0.20.2_cluster/bin/hadoop-daemons.sh: No such file or directory
>>>>>> bin/start-dfs.sh: line 52: /w1153435/hadoop-0.20.2_cluster/bin/hadoop-daemons.sh: No such file or directory
>>>>>> CODE
>>>>>
>>>>> there's No such file or directory as /w1153435/hadoop-0.20.2_cluster/bin/hadoop-daemons.sh
>>>>
>>>> There is, I checked as shown:
>>>>
>>>> w1153435@n51:~/hadoop-0.20.2_cluster> ls bin
>>>> hadoop             rcc                start-dfs.sh     stop-dfs.sh
>>>> hadoop-config.sh   slaves.sh          start-mapred.sh  stop-mapred.sh
>>>> hadoop-daemon.sh   start-all.sh       stop-all.sh
>>>> hadoop-daemons.sh  start-balancer.sh  stop-balancer.sh
>>>
>>> try "pwd" to print out where the OS thinks you are, as it doesn't seem
>>> to be where you think you are
>>
>> w1153435@ngs:~/hadoop-0.20.2_cluster> pwd
>> /home/w1153435/hadoop-0.20.2_cluster
>>
>> w1153435@ngs:~/hadoop-0.20.2_cluster/bin> pwd
>> /home/w1153435/hadoop-0.20.2_cluster/bin
>>
>>>> I had tried running this command below earlier but also got problems:
>>>> CODE
>>>> w1153435@ngs:~/hadoop-0.20.2_cluster> export HADOOP_CONF_DIR=${HADOOP_HOME}/conf
>>>> w1
Re: WritableComparable
On Tue, Aug 16, 2011 at 6:14 AM, Chris White wrote:
> Are you using a hash partitioner? If so, make sure the hash value of the
> writable is not calculated using the hashCode value of the enum - use the
> ordinal value instead. The hashCode value of an enum is different for each
> JVM.

Thanks for the tip. I am using a hash partitioner (it's the default), but my hash value is not based on an enum value. In any case, the keys in question get hashed to the same reducer.

Best,
stan
Re: hadoop cluster mode not starting up
Hi Df,

Are you using the IP addresses instead of names in conf/masters and conf/slaves? For running the secondary namenode on a separate machine, refer to the following link:

http://www.hadoop-blog.com/2010/12/secondarynamenode-process-is-starting.html

Regards,
Shanmuganathan

On Tue, 16 Aug 2011 17:06:04 +0530 A Df wrote:

> I already used a few tutorials as follows:
> * Hadoop Tutorial on Yahoo Developer network, which uses an old hadoop and thus older conf files.
> * http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/ which only has two nodes and the master acts as namenode and secondary namenode. I need one with more than that.
>
> Is there a way to prevent the node from using the central file system? I don't have root permission, and my user folder is on a central file system which is replicated on all the nodes.
>
> See inline too for my responses
>
>> From: Steve Loughran
>> To: common-user@hadoop.apache.org
>> Sent: Tuesday, 16 August 2011, 12:08
>> Subject: Re: hadoop cluster mode not starting up
>>
>> On 16/08/11 11:19, A Df wrote:
>>> See inline
>>>
>>>> From: Steve Loughran
>>>> To: common-user@hadoop.apache.org
>>>> Sent: Tuesday, 16 August 2011, 11:08
>>>> Subject: Re: hadoop cluster mode not starting up
>>>>
>>>> On 16/08/11 11:02, A Df wrote:
>>>>> Hello All:
>>>>>
>>>>> I used a combination of tutorials to set up hadoop, but most seem to use either an old version of hadoop or only 2 machines for the cluster, which isn't really a cluster. Does anyone know of a good tutorial which sets up multiple nodes for a cluster? I already looked at the Apache website, but it does not give sample values for the conf files. Also, each set of tutorials seems to have a different set of parameters which they indicate should be changed, so now it's a bit confusing. For example, my configuration sets a dedicated namenode, secondary namenode and 8 slave nodes, but when I run the start command it gives an error. Should I install hadoop to my user directory or on the root? I have it in my directory, but all the nodes have a central file system as opposed to distributed storage, so whatever I do on one node in my user folder affects all the others. How do I set the paths to ensure that it uses a distributed system?
>>>>>
>>>>> For the errors below, I checked the directories and the files are there. Am not sure what went wrong and how to set the conf to not have a central file system. Thank you.
>>>>>
>>>>> Error message
>>>>> CODE
>>>>> w1153435@n51:~/hadoop-0.20.2_cluster> bin/start-dfs.sh
>>>>> bin/start-dfs.sh: line 28: /w1153435/hadoop-0.20.2_cluster/bin/hadoop-config.sh: No such file or directory
>>>>> bin/start-dfs.sh: line 50: /w1153435/hadoop-0.20.2_cluster/bin/hadoop-daemon.sh: No such file or directory
>>>>> bin/start-dfs.sh: line 51: /w1153435/hadoop-0.20.2_cluster/bin/hadoop-daemons.sh: No such file or directory
>>>>> bin/start-dfs.sh: line 52: /w1153435/hadoop-0.20.2_cluster/bin/hadoop-daemons.sh: No such file or directory
>>>>> CODE
>>>>
>>>> there's No such file or directory as
>>>> /w1153435/hadoop-0.20.2_cluster/bin/hadoop-daemons.sh
>>>
>>> There is, I checked as shown:
>>>
>>> w1153435@n51:~/hadoop-0.20.2_cluster> ls bin
>>> hadoop             rcc                start-dfs.sh     stop-dfs.sh
>>> hadoop-config.sh   slaves.sh          start-mapred.sh  stop-mapred.sh
>>> hadoop-daemon.sh   start-all.sh       stop-all.sh
>>> hadoop-daemons.sh  start-balancer.sh  stop-balancer.sh
>>
>> try "pwd" to print out where the OS thinks you are, as it doesn't seem
>> to be where you think you are
>>
>> w1153435@ngs:~/hadoop-0.20.2_cluster> pwd
>> /home/w1153435/hadoop-0.20.2_cluster
>>
>> w1153435@ngs:~/hadoop-0.20.2_cluster/bin> pwd
>> /home/w1153435/hadoop-0.20.2_cluster/bin
>>
>>> I had tried running this command below earlier but also got problems:
>>> CODE
>>> w1153435@ngs:~/hadoop-0.20.2_cluster> export HADOOP_CONF_DIR=${HADOOP_HOME}/conf
>>> w1153435@ngs:~/hadoop-0.20.2_cluster> export HADOOP_SLAVES=${HADOOP_CONF_DIR}/slaves
>>> w1153435@ngs:~/hadoop-0.20.2_cluster> ${HADOOP_HOME}/bin/slaves.sh "mkdir -p /home/w1153435/hadoop-0.20.2_cluster/tmp/hadoop"
>>> -bash: /bin/slaves.sh: No such file or directory
>>> w1153435@ngs:~/hadoop-0.20.2_cluster> export HADOOP_HOME=/home/w1153435/hadoop-0.20.2_cluster
>>> w1153435@ngs:~/hadoop-0.20.2_cluster> ${HADOOP_HOME}/bin/slaves.sh "mkdir -p /home/w1153435/hadoop-0.20.2_cluster/tmp/hadoop"
>>> cat: /conf/slaves: No such file or directory
>>> CODE
>>
>> there's No such file or directory as /conf/slaves because you set
>> HADOOP_HOME after setting the other env variables, which are expanded at
>> set-time, not run-time.
>
> I redid the command but still have errors on the slaves
>
> w1153435@n51:~/hadoop-0.20.2_cluster> export
Why should Hadoop be built in Java?
Why should Hadoop be built in Java? For integrity and stability, it is good for Hadoop to be implemented in Java. But when it comes to the question of speed, I wonder: how would it perform if Hadoop were implemented in C or Python?
Re: WritableComparable
Are you using a hash partitioner? If so, make sure the hash value of the writable is not calculated from the hashCode of the enum; use its ordinal value instead. The hashCode of an enum can be different in each JVM.
Re: hadoop cluster mode not starting up
I already used a few tutorials, as follows: * the Hadoop Tutorial on the Yahoo Developer Network, which uses an old Hadoop release and thus older conf files; * http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/, which only has two nodes, with the master acting as both namenode and secondary namenode. I need one with more nodes than that. Is there a way to prevent a node from using the central file system? I don't have root permission, and my user folder is on a central file system which is replicated on all the nodes. See inline too for my responses > >From: Steve Loughran >To: common-user@hadoop.apache.org >Sent: Tuesday, 16 August 2011, 12:08 >Subject: Re: hadoop cluster mode not starting up > >On 16/08/11 11:19, A Df wrote: >> See inline >> >> >> >>> >>> From: Steve Loughran >>> To: common-user@hadoop.apache.org >>> Sent: Tuesday, 16 August 2011, 11:08 >>> Subject: Re: hadoop cluster mode not starting up >>> >>> On 16/08/11 11:02, A Df wrote: Hello All: I used a combination of tutorials to setup hadoop but most seems to be using either an old version of hadoop or only using 2 machines for the cluster which isn't really a cluster. Does anyone know of a good tutorial which setups multiple nodes for a cluster?? I already looked at the Apache website but it does not give sample values for the conf files. Also each set of tutorials seem to have a different set of parameters which they indicate should be changed so now its a bit confusing. For example, my configuration sets a dedicate namenode, secondary namenode and 8 slave nodes but when I run the start command it gives an error. Should I install hadoop to my user directory or on the root? I have it in my directory but all the nodes have a central file system as opposed to distributed so whatever I do on one node in my user folder it affect all the others so how do i set the paths to ensure that it uses a distributed system? For the errors below, I checked the directories and the files are there. 
Am I not sure what went wrong and how to set the conf to not have central file system. Thank you. Error message CODE w1153435@n51:~/hadoop-0.20.2_cluster> bin/start-dfs.sh bin/start-dfs.sh: line 28: /w1153435/hadoop-0.20.2_cluster/bin/hadoop-config.sh: No such file or directory bin/start-dfs.sh: line 50: /w1153435/hadoop-0.20.2_cluster/bin/hadoop-daemon.sh: No such file or directory bin/start-dfs.sh: line 51: /w1153435/hadoop-0.20.2_cluster/bin/hadoop-daemons.sh: No such file or directory bin/start-dfs.sh: line 52: /w1153435/hadoop-0.20.2_cluster/bin/hadoop-daemons.sh: No such file or directory CODE >>> >>> there's No such file or directory as >>> /w1153435/hadoop-0.20.2_cluster/bin/hadoop-daemons.sh >>> >>> >>> There is, I checked as shown >>> w1153435@n51:~/hadoop-0.20.2_cluster> ls bin >>> hadoop rcc start-dfs.sh stop-dfs.sh >>> hadoop-config.sh slaves.sh start-mapred.sh stop-mapred.sh >>> hadoop-daemon.sh start-all.sh stop-all.sh >>> hadoop-daemons.sh start-balancer.sh stop-balancer.sh > >try "pwd" to print out where the OS thinks you are, as it doesn't seem >to be where you think you are > > >w1153435@ngs:~/hadoop-0.20.2_cluster> pwd >/home/w1153435/hadoop-0.20.2_cluster > > >w1153435@ngs:~/hadoop-0.20.2_cluster/bin> pwd >/home/w1153435/hadoop-0.20.2_cluster/bin > >>> >>> >>> >>> I had tried running this command below earlier but also got problems: CODE w1153435@ngs:~/hadoop-0.20.2_cluster> export HADOOP_CONF_DIR=${HADOOP_HOME}/conf w1153435@ngs:~/hadoop-0.20.2_cluster> export HADOOP_SLAVES=${HADOOP_CONF_DIR}/slaves w1153435@ngs:~/hadoop-0.20.2_cluster> ${HADOOP_HOME}/bin/slaves.sh "mkdir -p /home/w1153435/hadoop-0.20.2_cluster/tmp/hadoop" -bash: /bin/slaves.sh: No such file or directory w1153435@ngs:~/hadoop-0.20.2_cluster> export HADOOP_HOME=/home/w1153435/hadoop-0.20.2_cluster w1153435@ngs:~/hadoop-0.20.2_cluster> ${HADOOP_HOME}/bin/slaves.sh "mkdir -p /home/w1153435/hadoop-0.20.2_cluster/tmp/hadoop" cat: /conf/slaves: No such file or directory CODE >>> 
there's No such file or directory as /conf/slaves because you set >>> HADOOP_HOME after setting the other env variables, which are expanded at >>> set-time, not run-time. >>> >>> I redid the command but still have errors on the slaves >>> >>> >>> w1153435@n51:~/hadoop-0.20.2_cluster> export >>> HADOOP_HOME=/home/w1153435/hadoop-0.20.2_cluster >>> w1153435@n51:~/hadoop-0.20.2_cluster> export >>> HADOOP_CONF_DIR=${HADOOP_HOME}/conf >>> w1153435@n51:~/hadoop-0.20.2_cluster> export >>> HADOOP_SLAVES=${HADOOP_CONF_DIR}/slaves >>> w1153435@n51:~
Re: hadoop cluster mode not starting up
On 16/08/11 11:19, A Df wrote: See inline From: Steve Loughran To: common-user@hadoop.apache.org Sent: Tuesday, 16 August 2011, 11:08 Subject: Re: hadoop cluster mode not starting up On 16/08/11 11:02, A Df wrote: Hello All: I used a combination of tutorials to setup hadoop but most seems to be using either an old version of hadoop or only using 2 machines for the cluster which isn't really a cluster. Does anyone know of a good tutorial which setups multiple nodes for a cluster?? I already looked at the Apache website but it does not give sample values for the conf files. Also each set of tutorials seem to have a different set of parameters which they indicate should be changed so now its a bit confusing. For example, my configuration sets a dedicate namenode, secondary namenode and 8 slave nodes but when I run the start command it gives an error. Should I install hadoop to my user directory or on the root? I have it in my directory but all the nodes have a central file system as opposed to distributed so whatever I do on one node in my user folder it affect all the others so how do i set the paths to ensure that it uses a distributed system? For the errors below, I checked the directories and the files are there. Am I not sure what went wrong and how to set the conf to not have central file system. Thank you. 
Error message CODE w1153435@n51:~/hadoop-0.20.2_cluster> bin/start-dfs.sh bin/start-dfs.sh: line 28: /w1153435/hadoop-0.20.2_cluster/bin/hadoop-config.sh: No such file or directory bin/start-dfs.sh: line 50: /w1153435/hadoop-0.20.2_cluster/bin/hadoop-daemon.sh: No such file or directory bin/start-dfs.sh: line 51: /w1153435/hadoop-0.20.2_cluster/bin/hadoop-daemons.sh: No such file or directory bin/start-dfs.sh: line 52: /w1153435/hadoop-0.20.2_cluster/bin/hadoop-daemons.sh: No such file or directory CODE there's No such file or directory as /w1153435/hadoop-0.20.2_cluster/bin/hadoop-daemons.sh There is, I checked as shown w1153435@n51:~/hadoop-0.20.2_cluster> ls bin hadoop rcc start-dfs.sh stop-dfs.sh hadoop-config.sh slaves.sh start-mapred.sh stop-mapred.sh hadoop-daemon.sh start-all.sh stop-all.sh hadoop-daemons.sh start-balancer.sh stop-balancer.sh try "pwd" to print out where the OS thinks you are, as it doesn't seem to be where you think you are I had tried running this command below earlier but also got problems: CODE w1153435@ngs:~/hadoop-0.20.2_cluster> export HADOOP_CONF_DIR=${HADOOP_HOME}/conf w1153435@ngs:~/hadoop-0.20.2_cluster> export HADOOP_SLAVES=${HADOOP_CONF_DIR}/slaves w1153435@ngs:~/hadoop-0.20.2_cluster> ${HADOOP_HOME}/bin/slaves.sh "mkdir -p /home/w1153435/hadoop-0.20.2_cluster/tmp/hadoop" -bash: /bin/slaves.sh: No such file or directory w1153435@ngs:~/hadoop-0.20.2_cluster> export HADOOP_HOME=/home/w1153435/hadoop-0.20.2_cluster w1153435@ngs:~/hadoop-0.20.2_cluster> ${HADOOP_HOME}/bin/slaves.sh "mkdir -p /home/w1153435/hadoop-0.20.2_cluster/tmp/hadoop" cat: /conf/slaves: No such file or directory CODE there's No such file or directory as /conf/slaves because you set HADOOP_HOME after setting the other env variables, which are expanded at set-time, not run-time. 
I redid the command but still have errors on the slaves w1153435@n51:~/hadoop-0.20.2_cluster> export HADOOP_HOME=/home/w1153435/hadoop-0.20.2_cluster w1153435@n51:~/hadoop-0.20.2_cluster> export HADOOP_CONF_DIR=${HADOOP_HOME}/conf w1153435@n51:~/hadoop-0.20.2_cluster> export HADOOP_SLAVES=${HADOOP_CONF_DIR}/slaves w1153435@n51:~/hadoop-0.20.2_cluster> ${HADOOP_HOME}/bin/slaves.sh "mkdir -p /home/w1153435/hadoop-0.20.2_cluster/tmp/hadoop" privn51: bash: mkdir -p /home/w1153435/hadoop-0.20.2_cluster/tmp/hadoop: No such file or directory privn58: bash: mkdir -p /home/w1153435/hadoop-0.20.2_cluster/tmp/hadoop: No such file or directory privn52: bash: mkdir -p /home/w1153435/hadoop-0.20.2_cluster/tmp/hadoop: No such file or directory privn55: bash: mkdir -p /home/w1153435/hadoop-0.20.2_cluster/tmp/hadoop: No such file or directory privn57: bash: mkdir -p /home/w1153435/hadoop-0.20.2_cluster/tmp/hadoop: No such file or directory privn54: bash: mkdir -p /home/w1153435/hadoop-0.20.2_cluster/tmp/hadoop: No such file or directory privn53: bash: mkdir -p /home/w1153435/hadoop-0.20.2_cluster/tmp/hadoop: No such file or directory privn56: bash: mkdir -p /home/w1153435/hadoop-0.20.2_cluster/tmp/hadoop: No such file or directory try ssh-ing in, do it by hand, make sure you have the right permissions etc
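A note on the per-slave errors above: a message of the form "bash: mkdir -p /home/...: No such file or directory", with the whole command inside it, suggests the remote shell received the quoted string as a single command name; because that name contains a slash, bash skips the PATH lookup and tries to exec the literal file. That is only a guess from the error text, but the failure mode is easy to reproduce locally (the /tmp path is just an illustration):

```shell
# A command NAME containing spaces and a slash: bash attempts to exec
# the literal file "mkdir -p /tmp/quoting_demo", which does not exist,
# and fails with "...: No such file or directory" -- the same shape as
# the per-slave errors above.
bash -c '"mkdir -p /tmp/quoting_demo"' 2>&1 || true

# The same words passed as a normal command line work as expected,
# which is why ssh-ing in and running mkdir by hand succeeds.
bash -c 'mkdir -p /tmp/quoting_demo' && echo "created /tmp/quoting_demo"
```

If that is what is happening here, creating the directories by hand over ssh on each slave (as suggested above) is a reliable workaround, and it also confirms the permissions are right.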
Re: hadoop cluster mode not starting up
Hi Df, Can you get : echo $HADOOP_HOME On Tue, Aug 16, 2011 at 3:49 PM, A Df wrote: > See inline > > > > > > >From: Steve Loughran > >To: common-user@hadoop.apache.org > >Sent: Tuesday, 16 August 2011, 11:08 > >Subject: Re: hadoop cluster mode not starting up > > > >On 16/08/11 11:02, A Df wrote: > >> Hello All: > >> > >> I used a combination of tutorials to setup hadoop but most seems to be > using either an old version of hadoop or only using 2 machines for the > cluster which isn't really a cluster. Does anyone know of a good tutorial > which setups multiple nodes for a cluster?? I already looked at the Apache > website but it does not give sample values for the conf files. Also each set > of tutorials seem to have a different set of parameters which they indicate > should be changed so now its a bit confusing. For example, my configuration > sets a dedicate namenode, secondary namenode and 8 slave nodes but when I > run the start command it gives an error. Should I install hadoop to my user > directory or on the root? I have it in my directory but all the nodes have a > central file system as opposed to distributed so whatever I do on one node > in my user folder it affect all the others so how do i set the paths to > ensure that it uses a distributed system? > >> > >> For the errors below, I checked the directories and the files are there. > Am I not sure what went wrong and how to set the conf to not have central > file system. Thank you. 
> >> > >> Error message > >> CODE > >> w1153435@n51:~/hadoop-0.20.2_cluster> bin/start-dfs.sh > >> bin/start-dfs.sh: line 28: > /w1153435/hadoop-0.20.2_cluster/bin/hadoop-config.sh: No such file or > directory > >> bin/start-dfs.sh: line 50: > /w1153435/hadoop-0.20.2_cluster/bin/hadoop-daemon.sh: No such file or > directory > >> bin/start-dfs.sh: line 51: > /w1153435/hadoop-0.20.2_cluster/bin/hadoop-daemons.sh: No such file or > directory > >> bin/start-dfs.sh: line 52: > /w1153435/hadoop-0.20.2_cluster/bin/hadoop-daemons.sh: No such file or > directory > >> CODE > > > >there's No such file or directory as > >/w1153435/hadoop-0.20.2_cluster/bin/hadoop-daemons.sh > > > > > >There is, I checked as shown > >w1153435@n51:~/hadoop-0.20.2_cluster> ls bin > >hadoop rccstart-dfs.sh stop-dfs.sh > >hadoop-config.sh slaves.sh start-mapred.sh stop-mapred.sh > >hadoop-daemon.sh start-all.sh stop-all.sh > >hadoop-daemons.sh start-balancer.sh stop-balancer.sh > > > > > > > > > >> > >> I had tried running this command below earlier but also got problems: > >> CODE > >> w1153435@ngs:~/hadoop-0.20.2_cluster> export > HADOOP_CONF_DIR=${HADOOP_HOME}/conf > >> w1153435@ngs:~/hadoop-0.20.2_cluster> export > HADOOP_SLAVES=${HADOOP_CONF_DIR}/slaves > >> w1153435@ngs:~/hadoop-0.20.2_cluster> ${HADOOP_HOME}/bin/slaves.sh > "mkdir -p /home/w1153435/hadoop-0.20.2_cluster/tmp/hadoop" > >> -bash: /bin/slaves.sh: No such file or directory > >> w1153435@ngs:~/hadoop-0.20.2_cluster> export > HADOOP_HOME=/home/w1153435/hadoop-0.20.2_cluster > >> w1153435@ngs:~/hadoop-0.20.2_cluster> ${HADOOP_HOME}/bin/slaves.sh > "mkdir -p /home/w1153435/hadoop-0.20.2_cluster/tmp/hadoop" > >> cat: /conf/slaves: No such file or directory > >> CODE > >> > >there's No such file or directory as /conf/slaves because you set > >HADOOP_HOME after setting the other env variables, which are expanded at > >set-time, not run-time. 
> > > >I redid the command but still have errors on the slaves > > > > > >w1153435@n51:~/hadoop-0.20.2_cluster> export > HADOOP_HOME=/home/w1153435/hadoop-0.20.2_cluster > >w1153435@n51:~/hadoop-0.20.2_cluster> export > HADOOP_CONF_DIR=${HADOOP_HOME}/conf > >w1153435@n51:~/hadoop-0.20.2_cluster> export > HADOOP_SLAVES=${HADOOP_CONF_DIR}/slaves > >w1153435@n51:~/hadoop-0.20.2_cluster> ${HADOOP_HOME}/bin/slaves.sh "mkdir > -p /home/w1153435/hadoop-0.20.2_cluster/tmp/hadoop" > >privn51: bash: mkdir -p /home/w1153435/hadoop-0.20.2_cluster/tmp/hadoop: > No such file or directory > >privn58: bash: mkdir -p /home/w1153435/hadoop-0.20.2_cluster/tmp/hadoop: > No such file or directory > >privn52: bash: mkdir -p /home/w1153435/hadoop-0.20.2_cluster/tmp/hadoop: > No such file or directory > >privn55: bash: mkdir -p /home/w1153435/hadoop-0.20.2_cluster/tmp/hadoop: > No such file or directory > >privn57: bash: mkdir -p /home/w1153435/hadoop-0.20.2_cluster/tmp/hadoop: > No such file or directory > >privn54: bash: mkdir -p /home/w1153435/hadoop-0.20.2_cluster/tmp/hadoop: > No such file or directory > >privn53: bash: mkdir -p /home/w1153435/hadoop-0.20.2_cluster/tmp/hadoop: > No such file or directory > >privn56: bash: mkdir -p /home/w1153435/hadoop-0.20.2_cluster/tmp/hadoop: > No such file or directory > > > > -- Thanks, Shah
Is it possible to use HFTP as input when streaming?
Hi all, Is it possible to use HFTP input like this: $ hadoop jar hadoop-streaming.jar \ -input hftp://node1/records \ -output hdfs://node2/result ... ? I've tried several times with different versions, but failed with exceptions such as "java.io.IOException: HTTP_PARTIAL expected, received 200" and "java.io.IOException: Server returned HTTP response code: 500 for URL:...". It seems the official documentation is also lacking such examples. I am not sure if I did this correctly, so if someone has already set this up successfully, would you please point me to a guide? Thanks in advance.
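For comparison, a typical streaming invocation over HFTP has the shape sketched below. Everything in it is a placeholder: the streaming jar lives in different places in different releases, and HFTP is served from the namenode's HTTP port (50070 by default), which may need to appear in the URI. This is only an illustration of the command's shape under those assumptions, not a verified fix for the exceptions above.

```shell
# Sketch only: needs a live cluster; hosts, ports, and paths are placeholders.
hadoop jar ${HADOOP_HOME}/contrib/streaming/hadoop-*streaming*.jar \
    -input  hftp://node1:50070/records \
    -output hdfs://node2/result \
    -mapper /bin/cat \
    -reducer /usr/bin/wc
```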
Re: hadoop cluster mode not starting up
See inline > >From: Steve Loughran >To: common-user@hadoop.apache.org >Sent: Tuesday, 16 August 2011, 11:08 >Subject: Re: hadoop cluster mode not starting up > >On 16/08/11 11:02, A Df wrote: >> Hello All: >> >> I used a combination of tutorials to setup hadoop but most seems to be using >> either an old version of hadoop or only using 2 machines for the cluster >> which isn't really a cluster. Does anyone know of a good tutorial which >> setups multiple nodes for a cluster?? I already looked at the Apache website >> but it does not give sample values for the conf files. Also each set of >> tutorials seem to have a different set of parameters which they indicate >> should be changed so now its a bit confusing. For example, my configuration >> sets a dedicate namenode, secondary namenode and 8 slave nodes but when I >> run the start command it gives an error. Should I install hadoop to my user >> directory or on the root? I have it in my directory but all the nodes have a >> central file system as opposed to distributed so whatever I do on one node >> in my user folder it affect all the others so how do i set the paths to >> ensure that it uses a distributed system? >> >> For the errors below, I checked the directories and the files are there. Am >> I not sure what went wrong and how to set the conf to not have central file >> system. Thank you. 
>> >> Error message >> CODE >> w1153435@n51:~/hadoop-0.20.2_cluster> bin/start-dfs.sh >> bin/start-dfs.sh: line 28: >> /w1153435/hadoop-0.20.2_cluster/bin/hadoop-config.sh: No such file or >> directory >> bin/start-dfs.sh: line 50: >> /w1153435/hadoop-0.20.2_cluster/bin/hadoop-daemon.sh: No such file or >> directory >> bin/start-dfs.sh: line 51: >> /w1153435/hadoop-0.20.2_cluster/bin/hadoop-daemons.sh: No such file or >> directory >> bin/start-dfs.sh: line 52: >> /w1153435/hadoop-0.20.2_cluster/bin/hadoop-daemons.sh: No such file or >> directory >> CODE > >there's No such file or directory as >/w1153435/hadoop-0.20.2_cluster/bin/hadoop-daemons.sh > > >There is, I checked as shown >w1153435@n51:~/hadoop-0.20.2_cluster> ls bin >hadoop rcc start-dfs.sh stop-dfs.sh >hadoop-config.sh slaves.sh start-mapred.sh stop-mapred.sh >hadoop-daemon.sh start-all.sh stop-all.sh >hadoop-daemons.sh start-balancer.sh stop-balancer.sh > > > > >> >> I had tried running this command below earlier but also got problems: >> CODE >> w1153435@ngs:~/hadoop-0.20.2_cluster> export >> HADOOP_CONF_DIR=${HADOOP_HOME}/conf >> w1153435@ngs:~/hadoop-0.20.2_cluster> export >> HADOOP_SLAVES=${HADOOP_CONF_DIR}/slaves >> w1153435@ngs:~/hadoop-0.20.2_cluster> ${HADOOP_HOME}/bin/slaves.sh "mkdir >> -p /home/w1153435/hadoop-0.20.2_cluster/tmp/hadoop" >> -bash: /bin/slaves.sh: No such file or directory >> w1153435@ngs:~/hadoop-0.20.2_cluster> export >> HADOOP_HOME=/home/w1153435/hadoop-0.20.2_cluster >> w1153435@ngs:~/hadoop-0.20.2_cluster> ${HADOOP_HOME}/bin/slaves.sh "mkdir >> -p /home/w1153435/hadoop-0.20.2_cluster/tmp/hadoop" >> cat: /conf/slaves: No such file or directory >> CODE >> >there's No such file or directory as /conf/slaves because you set >HADOOP_HOME after setting the other env variables, which are expanded at >set-time, not run-time. 
> >I redid the command but still have errors on the slaves > > >w1153435@n51:~/hadoop-0.20.2_cluster> export >HADOOP_HOME=/home/w1153435/hadoop-0.20.2_cluster >w1153435@n51:~/hadoop-0.20.2_cluster> export >HADOOP_CONF_DIR=${HADOOP_HOME}/conf >w1153435@n51:~/hadoop-0.20.2_cluster> export >HADOOP_SLAVES=${HADOOP_CONF_DIR}/slaves >w1153435@n51:~/hadoop-0.20.2_cluster> ${HADOOP_HOME}/bin/slaves.sh "mkdir -p >/home/w1153435/hadoop-0.20.2_cluster/tmp/hadoop" >privn51: bash: mkdir -p /home/w1153435/hadoop-0.20.2_cluster/tmp/hadoop: No >such file or directory >privn58: bash: mkdir -p /home/w1153435/hadoop-0.20.2_cluster/tmp/hadoop: No >such file or directory >privn52: bash: mkdir -p /home/w1153435/hadoop-0.20.2_cluster/tmp/hadoop: No >such file or directory >privn55: bash: mkdir -p /home/w1153435/hadoop-0.20.2_cluster/tmp/hadoop: No >such file or directory >privn57: bash: mkdir -p /home/w1153435/hadoop-0.20.2_cluster/tmp/hadoop: No >such file or directory >privn54: bash: mkdir -p /home/w1153435/hadoop-0.20.2_cluster/tmp/hadoop: No >such file or directory >privn53: bash: mkdir -p /home/w1153435/hadoop-0.20.2_cluster/tmp/hadoop: No >such file or directory >privn56: bash: mkdir -p /home/w1153435/hadoop-0.20.2_cluster/tmp/hadoop: No >such file or directory > >
Re: hadoop cluster mode not starting up
Hi Df, I think you didn't set the conf/slaves file in Hadoop, and the bin/* files you specified are not present. Verify these files in the bin directory. The following link is very useful for configuring Hadoop on multiple nodes: http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/ Regards, Shanmuganathan On Tue, 16 Aug 2011 15:32:57 +0530, A Df wrote: Hello All: I used a combination of tutorials to setup hadoop but most seems to be using either an old version of hadoop or only using 2 machines for the cluster which isn't really a cluster. Does anyone know of a good tutorial which setups multiple nodes for a cluster?? I already looked at the Apache website but it does not give sample values for the conf files. Also each set of tutorials seem to have a different set of parameters which they indicate should be changed so now its a bit confusing. For example, my configuration sets a dedicate namenode, secondary namenode and 8 slave nodes but when I run the start command it gives an error. Should I install hadoop to my user directory or on the root? I have it in my directory but all the nodes have a central file system as opposed to distributed so whatever I do on one node in my user folder it affect all the others so how do i set the paths to ensure that it uses a distributed system? For the errors below, I checked the directories and the files are there. Am I not sure what went wrong and how to set the conf to not have central file system. Thank you. 
Error message CODE w1153435@n51:~/hadoop-0.20.2_cluster> bin/start-dfs.sh bin/start-dfs.sh: line 28: /w1153435/hadoop-0.20.2_cluster/bin/hadoop-config.sh: No such file or directory bin/start-dfs.sh: line 50: /w1153435/hadoop-0.20.2_cluster/bin/hadoop-daemon.sh: No such file or directory bin/start-dfs.sh: line 51: /w1153435/hadoop-0.20.2_cluster/bin/hadoop-daemons.sh: No such file or directory bin/start-dfs.sh: line 52: /w1153435/hadoop-0.20.2_cluster/bin/hadoop-daemons.sh: No such file or directory CODE I had tried running this command below earlier but also got problems: CODE w1153435@ngs:~/hadoop-0.20.2_cluster> export HADOOP_CONF_DIR=${HADOOP_HOME}/conf w1153435@ngs:~/hadoop-0.20.2_cluster> export HADOOP_SLAVES=${HADOOP_CONF_DIR}/slaves w1153435@ngs:~/hadoop-0.20.2_cluster> ${HADOOP_HOME}/bin/slaves.sh "mkdir -p /home/w1153435/hadoop-0.20.2_cluster/tmp/hadoop" -bash: /bin/slaves.sh: No such file or directory w1153435@ngs:~/hadoop-0.20.2_cluster> export HADOOP_HOME=/home/w1153435/hadoop-0.20.2_cluster w1153435@ngs:~/hadoop-0.20.2_cluster> ${HADOOP_HOME}/bin/slaves.sh "mkdir -p /home/w1153435/hadoop-0.20.2_cluster/tmp/hadoop" cat: /conf/slaves: No such file or directory CODE Cheers, A Df
Re: hadoop cluster mode not starting up
On 16/08/11 11:02, A Df wrote: Hello All: I used a combination of tutorials to setup hadoop but most seems to be using either an old version of hadoop or only using 2 machines for the cluster which isn't really a cluster. Does anyone know of a good tutorial which setups multiple nodes for a cluster?? I already looked at the Apache website but it does not give sample values for the conf files. Also each set of tutorials seem to have a different set of parameters which they indicate should be changed so now its a bit confusing. For example, my configuration sets a dedicate namenode, secondary namenode and 8 slave nodes but when I run the start command it gives an error. Should I install hadoop to my user directory or on the root? I have it in my directory but all the nodes have a central file system as opposed to distributed so whatever I do on one node in my user folder it affect all the others so how do i set the paths to ensure that it uses a distributed system? For the errors below, I checked the directories and the files are there. Am I not sure what went wrong and how to set the conf to not have central file system. Thank you. 
Error message CODE w1153435@n51:~/hadoop-0.20.2_cluster> bin/start-dfs.sh bin/start-dfs.sh: line 28: /w1153435/hadoop-0.20.2_cluster/bin/hadoop-config.sh: No such file or directory bin/start-dfs.sh: line 50: /w1153435/hadoop-0.20.2_cluster/bin/hadoop-daemon.sh: No such file or directory bin/start-dfs.sh: line 51: /w1153435/hadoop-0.20.2_cluster/bin/hadoop-daemons.sh: No such file or directory bin/start-dfs.sh: line 52: /w1153435/hadoop-0.20.2_cluster/bin/hadoop-daemons.sh: No such file or directory CODE there's No such file or directory as /w1153435/hadoop-0.20.2_cluster/bin/hadoop-daemons.sh I had tried running this command below earlier but also got problems: CODE w1153435@ngs:~/hadoop-0.20.2_cluster> export HADOOP_CONF_DIR=${HADOOP_HOME}/conf w1153435@ngs:~/hadoop-0.20.2_cluster> export HADOOP_SLAVES=${HADOOP_CONF_DIR}/slaves w1153435@ngs:~/hadoop-0.20.2_cluster> ${HADOOP_HOME}/bin/slaves.sh "mkdir -p /home/w1153435/hadoop-0.20.2_cluster/tmp/hadoop" -bash: /bin/slaves.sh: No such file or directory w1153435@ngs:~/hadoop-0.20.2_cluster> export HADOOP_HOME=/home/w1153435/hadoop-0.20.2_cluster w1153435@ngs:~/hadoop-0.20.2_cluster> ${HADOOP_HOME}/bin/slaves.sh "mkdir -p /home/w1153435/hadoop-0.20.2_cluster/tmp/hadoop" cat: /conf/slaves: No such file or directory CODE there's No such file or directory as /conf/slaves because you set HADOOP_HOME after setting the other env variables, which are expanded at set-time, not run-time.
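The set-time expansion point can be seen in any bash shell; here is a minimal sketch (the hadoop path is just the one from this thread, and nothing Hadoop-specific is needed):

```shell
# ${HADOOP_HOME} is expanded at the moment of assignment, not when
# HADOOP_CONF_DIR is later used, so the export order matters.
unset HADOOP_HOME
export HADOOP_CONF_DIR=${HADOOP_HOME}/conf   # HADOOP_HOME is empty here
echo "$HADOOP_CONF_DIR"                      # prints: /conf  (wrong)

export HADOOP_HOME=/home/w1153435/hadoop-0.20.2_cluster
export HADOOP_CONF_DIR=${HADOOP_HOME}/conf   # re-export AFTER setting HADOOP_HOME
echo "$HADOOP_CONF_DIR"                      # prints: /home/w1153435/hadoop-0.20.2_cluster/conf
export HADOOP_SLAVES=${HADOOP_CONF_DIR}/slaves
```

That is consistent with the earlier session's `cat: /conf/slaves: No such file or directory`: HADOOP_SLAVES had already expanded to `/conf/slaves` before HADOOP_HOME was set.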
Re: How do I add Hadoop dependency to a Maven project?
On 13/08/11 00:08, W.P. McNeill wrote: I want the latest version of Hadoop (with the new API). I guess that's the trunk version, but I don't see the hadoop-mapreduce artifact listed on https://repository.apache.org/index.html#nexus-search;quick~hadoop I have a setup elsewhere, POM-less: http://smartfrog.svn.sourceforge.net/viewvc/smartfrog/trunk/core/antbuild/repository/org.apache.hadoop/ Some are tagged .nolog to indicate that the log4j.properties file has been stripped out.
hadoop cluster mode not starting up
Hello All: I used a combination of tutorials to set up Hadoop, but most seem to use either an old version of Hadoop or only 2 machines for the cluster, which isn't really a cluster. Does anyone know of a good tutorial that sets up multiple nodes for a cluster? I already looked at the Apache website, but it does not give sample values for the conf files. Also, each set of tutorials seems to have a different set of parameters which they indicate should be changed, so now it's a bit confusing. For example, my configuration sets a dedicated namenode, secondary namenode and 8 slave nodes, but when I run the start command it gives an error. Should I install Hadoop in my user directory or at the root? I have it in my directory, but all the nodes share a central file system as opposed to a distributed one, so whatever I do in my user folder on one node affects all the others. How do I set the paths to ensure that it uses a distributed system? For the errors below, I checked the directories and the files are there. I am not sure what went wrong, or how to configure it not to use the central file system. Thank you. 
Error message CODE w1153435@n51:~/hadoop-0.20.2_cluster> bin/start-dfs.sh bin/start-dfs.sh: line 28: /w1153435/hadoop-0.20.2_cluster/bin/hadoop-config.sh: No such file or directory bin/start-dfs.sh: line 50: /w1153435/hadoop-0.20.2_cluster/bin/hadoop-daemon.sh: No such file or directory bin/start-dfs.sh: line 51: /w1153435/hadoop-0.20.2_cluster/bin/hadoop-daemons.sh: No such file or directory bin/start-dfs.sh: line 52: /w1153435/hadoop-0.20.2_cluster/bin/hadoop-daemons.sh: No such file or directory CODE I had tried running this command below earlier but also got problems: CODE w1153435@ngs:~/hadoop-0.20.2_cluster> export HADOOP_CONF_DIR=${HADOOP_HOME}/conf w1153435@ngs:~/hadoop-0.20.2_cluster> export HADOOP_SLAVES=${HADOOP_CONF_DIR}/slaves w1153435@ngs:~/hadoop-0.20.2_cluster> ${HADOOP_HOME}/bin/slaves.sh "mkdir -p /home/w1153435/hadoop-0.20.2_cluster/tmp/hadoop" -bash: /bin/slaves.sh: No such file or directory w1153435@ngs:~/hadoop-0.20.2_cluster> export HADOOP_HOME=/home/w1153435/hadoop-0.20.2_cluster w1153435@ngs:~/hadoop-0.20.2_cluster> ${HADOOP_HOME}/bin/slaves.sh "mkdir -p /home/w1153435/hadoop-0.20.2_cluster/tmp/hadoop" cat: /conf/slaves: No such file or directory CODE Cheers, A Df
Re: Hadoop Meetup
Guys, please send your views on this. Thanks, Shah On Tue, Aug 16, 2011 at 11:21 AM, Ankit Minocha wrote: > Hi > > I am planning to start a hadoop meetup every alternate Saturday/Sunday. > Idea is to discuss the latest happenings in the world of Big data, and > create more awareness. > > Place: Delhi, India > Suggestions please > > Thanks > Ankit >
Re: Hadoop Meetup
Agreed On 16 August 2011 12:52, Shahnawaz Saifi wrote: > Totally Agree > > On Tue, Aug 16, 2011 at 12:47 PM, Ted Dunning > wrote: > > > Please keep the recruiting to private email and off the list. > > > > On Monday, August 15, 2011, rishikesh > > wrote: > > > Ok thanks, Ankit will u give me some reference for Hadoop developer I > > have > > a > > > critical requirement > > > > > > Thanks & Regards > > > Rishi | Associate Recruiter > > > 080--40950825\9620245003 > > > > > > Aim Plus Staffing Solutions | Bangalore | Branches -New Delhi | Chennai > > > www.aimplusstaffing.com > > > > > > -Original Message- > > > From: Ankit Minocha [mailto:ankitminocha1...@gmail.com] > > > Sent: Tuesday, August 16, 2011 11:58 AM > > > To: common-user@hadoop.apache.org > > > Subject: Re: Hadoop Meetup > > > > > > Rishikesh > > > > > > This is regarding a casual meetup for discussing latest happenings in > > > Hadoop, and not a career opportunity, much like a BARCAMP. > > > I would get in touch with you whenever I am looking for work. > > > > > > Thanks > > > Ankit > > > > > > On 16 August 2011 11:49, rishikesh > > wrote: > > > > > >> Hi Ankit, > > >> > > >> Kindly please send me your updated resume > > >> > > >> Thanks & Regards > > >> Rishi | Associate Recruiter > > >> 080--40950825\9620245003 > > >> > > >> Aim Plus Staffing Solutions | Bangalore | Branches -New Delhi | > Chennai > > >> www.aimplusstaffing.com > > >> > > >> > > >> -Original Message- > > >> From: Ankit Minocha [mailto:ankitminocha1...@gmail.com] > > >> Sent: Tuesday, August 16, 2011 11:21 AM > > >> To: common-user > > >> Subject: Hadoop Meetup > > >> > > >> Hi > > >> > > >> I am planning to start a hadoop meetup every alternate > Saturday/Sunday. > > >> Idea is to discuss the latest happenings in the world of Big data, and > > >> create more awareness. > > >> > > >> Place: Delhi, India > > >> Suggestions please > > >> > > >> Thanks > > >> Ankit > > >> > > >> > > > > > > > > > > > > -- > Thanks, > Shah >
Re: Hadoop Meetup
Totally Agree On Tue, Aug 16, 2011 at 12:47 PM, Ted Dunning wrote: > Please keep the recruiting to private email and off the list. > > On Monday, August 15, 2011, rishikesh > wrote: > > Ok thanks, Ankit will u give me some reference for Hadoop developer I > have > a > > critical requirement > > > > Thanks & Regards > > Rishi | Associate Recruiter > > 080--40950825\9620245003 > > > > Aim Plus Staffing Solutions | Bangalore | Branches -New Delhi | Chennai > > www.aimplusstaffing.com > > > > -Original Message- > > From: Ankit Minocha [mailto:ankitminocha1...@gmail.com] > > Sent: Tuesday, August 16, 2011 11:58 AM > > To: common-user@hadoop.apache.org > > Subject: Re: Hadoop Meetup > > > > Rishikesh > > > > This is regarding a casual meetup for discussing latest happenings in > > Hadoop, and not a career opportunity, much like a BARCAMP. > > I would get in touch with you whenever I am looking for work. > > > > Thanks > > Ankit > > > > On 16 August 2011 11:49, rishikesh > wrote: > > > >> Hi Ankit, > >> > >> Kindly please send me your updated resume > >> > >> Thanks & Regards > >> Rishi | Associate Recruiter > >> 080--40950825\9620245003 > >> > >> Aim Plus Staffing Solutions | Bangalore | Branches -New Delhi | Chennai > >> www.aimplusstaffing.com > >> > >> > >> -Original Message- > >> From: Ankit Minocha [mailto:ankitminocha1...@gmail.com] > >> Sent: Tuesday, August 16, 2011 11:21 AM > >> To: common-user > >> Subject: Hadoop Meetup > >> > >> Hi > >> > >> I am planning to start a hadoop meetup every alternate Saturday/Sunday. > >> Idea is to discuss the latest happenings in the world of Big data, and > >> create more awareness. > >> > >> Place: Delhi, India > >> Suggestions please > >> > >> Thanks > >> Ankit > >> > >> > > > > > -- Thanks, Shah
Re: Hadoop Meetup
Please keep the recruiting to private email and off the list. On Monday, August 15, 2011, rishikesh wrote: > Ok thanks, Ankit will u give me some reference for Hadoop developer I have a > critical requirement > > Thanks & Regards > Rishi | Associate Recruiter > 080--40950825\9620245003 > > Aim Plus Staffing Solutions | Bangalore | Branches -New Delhi | Chennai > www.aimplusstaffing.com > > -Original Message- > From: Ankit Minocha [mailto:ankitminocha1...@gmail.com] > Sent: Tuesday, August 16, 2011 11:58 AM > To: common-user@hadoop.apache.org > Subject: Re: Hadoop Meetup > > Rishikesh > > This is regarding a casual meetup for discussing latest happenings in > Hadoop, and not a career opportunity, much like a BARCAMP. > I would get in touch with you whenever I am looking for work. > > Thanks > Ankit > > On 16 August 2011 11:49, rishikesh wrote: > >> Hi Ankit, >> >> Kindly please send me your updated resume >> >> Thanks & Regards >> Rishi | Associate Recruiter >> 080--40950825\9620245003 >> >> Aim Plus Staffing Solutions | Bangalore | Branches -New Delhi | Chennai >> www.aimplusstaffing.com >> >> >> -Original Message- >> From: Ankit Minocha [mailto:ankitminocha1...@gmail.com] >> Sent: Tuesday, August 16, 2011 11:21 AM >> To: common-user >> Subject: Hadoop Meetup >> >> Hi >> >> I am planning to start a hadoop meetup every alternate Saturday/Sunday. >> Idea is to discuss the latest happenings in the world of Big data, and >> create more awareness. >> >> Place: Delhi, India >> Suggestions please >> >> Thanks >> Ankit >> >> > >
Re: Hadoop Meetup
Nice to see the initiative; please go ahead and plan it, I am in for it. -- Thanks, Shah On Tue, Aug 16, 2011 at 11:21 AM, Ankit Minocha wrote: > Hi > > I am planning to start a hadoop meetup every alternate Saturday/Sunday. > Idea is to discuss the latest happenings in the world of Big data, and > create more awareness. > > Place: Delhi, India > Suggestions please > > Thanks > Ankit >