openjdk warning, the vm will try to fix the stack guard
hi, i have hadoop v2.3.0 installed on CentOS 6.5 64-bit, with OpenJDK 64-bit v1.7 as my java version. when i attempt to start hadoop, i keep seeing the message below.

OpenJDK 64-Bit Server VM warning: You have loaded library /usr/local/hadoop-2.3.0/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now. It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.

i followed the instructions. here were my steps.

1. sudo yum install -y prelink
2. execstack -c /usr/local/hadoop-2.3.0/lib/native/libhadoop.so.1.0.0

however, the message still keeps popping up. i did some more searching on the internet, and one user says that libhadoop.so.1.0.0 is 32-bit, and that to get rid of this message i will need to recompile it as 64-bit. is that correct? is there not a 64-bit version of libhadoop.so.1.0.0 available for download? thanks,
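(For what it's worth, one way to check whether the shipped native library really is 32-bit is the `file` utility; the path below is the one from the message above.)

```shell
# Inspect the native library's architecture (path taken from the message above).
# A 32-bit build reports something like "ELF 32-bit LSB shared object, Intel 80386";
# a 64-bit build reports "ELF 64-bit LSB shared object, x86-64". If it is 32-bit,
# rebuilding the native code on a 64-bit box is the usual fix.
file /usr/local/hadoop-2.3.0/lib/native/libhadoop.so.1.0.0
```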
are the job and task tracker monitor webpages gone now in hadoop v2.3.0
i recently made the switch from hadoop 0.20.x to hadoop 2.3.0 (yes, big leap). i was wondering if there is a way to view my jobs now via a web UI? i used to be able to do this by accessing the following URL:

http://hadoop-cluster:50030/jobtracker.jsp

however, there is no job tracker monitoring page there anymore.

furthermore, i am confused about MapReduce as an application running on top of YARN. the documentation says MapReduce is just an application running on YARN. if that is true, how come i do not see MapReduce as an application on the ResourceManager web UI?

http://hadoop-cluster:8088/cluster/apps

is this because MapReduce is NOT a long-running app? meaning, a MapReduce job will only show up as an app in YARN while it is running? (please bear with me, i'm still adjusting to this new design). any help/pointer is appreciated.
Re: are the job and task tracker monitor webpages gone now in hadoop v2.3.0
Yes. JobTracker and TaskTracker are gone from all the 2.x release lines.

MapReduce is an application on top of YARN. It is per job - each job launches, runs, and finishes once it is done with its work. Once it is done, you can go look at it in the MapReduce-specific JobHistoryServer.

+Vinod

On Mar 6, 2014, at 1:11 PM, Jane Wayne jane.wayne2...@gmail.com wrote: i recently made the switch from hadoop 0.20.x to hadoop 2.3.0 (yes, big leap). i was wondering if there is a way to view my jobs now via a web UI? [...]

-- CONFIDENTIALITY NOTICE NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
Re: are the job and task tracker monitor webpages gone now in hadoop v2.3.0
when i go to the job history server at http://hadoop-cluster:19888/jobhistory i see no mapreduce jobs there. i ran 3 simple mr jobs successfully; i verified via the console output and the hdfs output directory. all i see on the UI is: No data available in table. any ideas? unless there is a JobHistoryServer just for MapReduce.. where is that?

On Thu, Mar 6, 2014 at 7:14 PM, Vinod Kumar Vavilapalli vino...@apache.org wrote: Yes. JobTracker and TaskTracker are gone from all the 2.x release lines. MapReduce is an application on top of YARN. [...]
Re: are the job and task tracker monitor webpages gone now in hadoop v2.3.0
ok, the reason why my hadoop jobs were not showing up was that i had not enabled mapreduce to run as a yarn application.

On Thu, Mar 6, 2014 at 11:45 PM, Jane Wayne jane.wayne2...@gmail.com wrote: when i go to the job history server at http://hadoop-cluster:19888/jobhistory i see no mapreduce jobs there. [...]
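(For readers hitting the same symptom: the setting involved is the MapReduce framework name in mapred-site.xml; a minimal sketch, assuming a stock 2.3.0 install:)

```xml
<!-- mapred-site.xml: run MapReduce jobs on YARN so they appear in the
     ResourceManager UI while running and, after completion, in the
     JobHistoryServer -->
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
```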
hdfs permission is still being checked after being disabled
i am using hadoop v2.3.0. in my hdfs-site.xml, i have the following property set:

<property>
  <name>dfs.permissions.enabled</name>
  <value>false</value>
</property>

however, when i try to run a hadoop job, i see the following AccessControlException:

org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=hadoopuser, access=EXECUTE, inode="/tmp":root:supergroup:drwxrwx---

to me, it seems that i have already disabled permission checking, so i shouldn't get that AccessControlException. any ideas?
MapReduce: How to output multiple Avro files?
our input is a line of text which may be parsed into, e.g., an A or a B object. We want all A objects written to A.avro files, and all B objects written to B.avro. I looked into the AvroMultipleOutputs class: http://avro.apache.org/docs/1.7.4/api/java/org/apache/avro/mapreduce/AvroMultipleOutputs.html

There is an example, however, it's not quite clear. For job submission, it uses AvroMultipleOutputs.addNamedOutput to add schemas for A and B. My program looks like:

AvroMultipleOutputs.addNamedOutput(job, "A", AvroKeyOutputFormat.class, aSchema, null);
AvroMultipleOutputs.addNamedOutput(job, "B", AvroKeyOutputFormat.class, bSchema, null);

I believe this is for the Reducer output files. *My question is* what the Mapper output should be, specifically what job.setMapOutputValueClass should be, since the Mapper output could be an A or a B object, with schema aSchema or bSchema. In my program, I simply set it to GenericData, but get the error below:

14/03/06 15:55:34 INFO mapreduce.Job: Task Id : attempt_1393817780522_0012_m_10_2, Status : FAILED
Error: java.lang.NullPointerException
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:989)
at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:390)
at org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:79)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:674)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:746)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:165)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:160)

I have no idea what this means.
Re: Impact of Tez/Spark to MapReduce
I think it is necessary to look at the question from multiple angles: First there is MapReduce as a computing paradigm. Second there is the MapReduce API. And third you have an implementation.

My belief is that the computing paradigm is not going away anytime soon. It's a fundamental approach for distributed computing - not the only one, though. The API should also be quite stable, so our applications will continue to work. I think it's also a safe bet that there will be more high-level APIs making developers more productive but calling MapReduce internally. And then there is the implementation, which will indeed call/use Tez to execute the map and reduce tasks sooner rather than later. This is transparent to the developer; his app just executes faster.

Functional programming will certainly play an important role, but I doubt it will be the only style; e.g. Scala is big, but it has not eliminated Java, JavaEE or Spring over the last 10 years. And that's great, isn't it? Java/JVM has always been about developer freedom: platform, language, APIs, frameworks, implementations. You pick what makes you most productive. Just my few cents...

Emil

Am Mar 6, 2014 um 2:57 AM schrieb Anthony Mattas anth...@mattas.net: Unfortunately I'm not super familiar with Spark - I guess my curiosity stems from a deep-seated belief that big-iron EDW-type appliances are slowly going to fade out, so I'm trying to really get my head around what that's going to look like in the next few years. Hive(Stinger)+Tez+YARN seems very promising; Impala does as well, but I'm not sure if the more open Hive solution will be preferred longer term. Does Map-Reduce still exist at that time, or does it slowly fade away? (I would assume it's still around because there are a lot of unique things you can do with MR today that aren't easily accomplished in other frameworks.)

On Mar 5, 2014, at 8:48 PM, Jeff Zhang jezh...@gopivotal.com wrote: I believe in the future the spark functional-style api will dominate the big data world. Very few people will use the native mapreduce API. Even now users usually use a third-party mapreduce library such as cascading, scalding or scoobi, or a script language such as hive or pig, rather than the native mapreduce api. And this functional style of api is compatible both with hadoop's mapreduce and spark's RDD. The underlying execution engine will be transparent to users. So I guess, or I hope, that in the future the api will be unified while the underlying execution engine will be chosen intelligently according to the resources you have and the metadata of the data you operate on.

On Thu, Mar 6, 2014 at 9:02 AM, Edward Capriolo edlinuxg...@gmail.com wrote: The thing about yarn is you choose what is right for the workload. For example: Spark may not be the right choice if, for example, join tables do not fit in memory.

On Wednesday, March 5, 2014, Anthony Mattas anth...@mattas.net wrote: With Tez and Spark becoming mainstream, what does Map Reduce look like longer term? Will it become a component that sits on top of Tez, or will they continue to live side by side utilizing YARN? I'm struggling a little bit to understand what the roadmap looks like for the technologies that sit on top of YARN. Anthony Mattas anth...@mattas.net

-- Sorry this was sent from mobile. Will do less grammar and spell check than usual.

Emil Andreas Siemes Sr. Solution Engineer Hortonworks Inc. esie...@hortonworks.com +49 176 72590764
Re: MapReduce: How to output multiple Avro files?
adding the avro user mailing list.

2014-03-06 16:09 GMT+08:00 Fengyun RAO raofeng...@gmail.com: our input is a line of text which may be parsed into, e.g., an A or a B object. We want all A objects written to A.avro files, and all B objects written to B.avro. [...]
Re:
Maybe your console and browser are using different settings. Would you please try wget http://repo.maven.apache.org/maven2/org/apache/felix/maven-bundle-plugin/2.4.0/maven-bundle-plugin-2.4.0.pom ?

Regards, *Stanley Shi,*

On Wed, Mar 5, 2014 at 6:59 PM, Avinash Kujur avin...@gmail.com wrote: yes ming.

On Wed, Mar 5, 2014 at 2:56 AM, Mingjiang Shi m...@gopivotal.com wrote: Can you access this link? http://repo.maven.apache.org/maven2/org/apache/felix/maven-bundle-plugin/2.4.0/maven-bundle-plugin-2.4.0.pom

On Wed, Mar 5, 2014 at 6:54 PM, Avinash Kujur avin...@gmail.com wrote: if i follow the repo.maven.apache.org link in my browser, it shows this message: Browsing for this directory has been disabled. View this directory's contents on http://search.maven.org instead. so how can i change the link from repo.maven.apache.org to http://search.maven.org ?

On Wed, Mar 5, 2014 at 2:49 AM, Avinash Kujur avin...@gmail.com wrote: yes. it has internet access.

On Wed, Mar 5, 2014 at 2:47 AM, Mingjiang Shi m...@gopivotal.com wrote: see the error message: Unknown host repo.maven.apache.org - [Help 2] Does your machine have internet access?

On Wed, Mar 5, 2014 at 6:42 PM, Avinash Kujur avin...@gmail.com wrote: home/cloudera/ contains hadoop files.

On Wed, Mar 5, 2014 at 2:40 AM, Avinash Kujur avin...@gmail.com wrote:

[cloudera@localhost hadoop-common-trunk]$ mvn clean install -DskipTests -Pdist
[INFO] Scanning for projects...
Downloading: http://repo.maven.apache.org/maven2/org/apache/felix/maven-bundle-plugin/2.4.0/maven-bundle-plugin-2.4.0.pom
[ERROR] The build could not read 1 project - [Help 1]
[ERROR]
[ERROR] The project org.apache.hadoop:hadoop-main:3.0.0-SNAPSHOT (/home/cloudera/hadoop-common-trunk/pom.xml) has 1 error
[ERROR] Unresolveable build extension: Plugin org.apache.felix:maven-bundle-plugin:2.4.0 or one of its dependencies could not be resolved: Failed to read artifact descriptor for org.apache.felix:maven-bundle-plugin:jar:2.4.0: Could not transfer artifact org.apache.felix:maven-bundle-plugin:pom:2.4.0 from/to central (http://repo.maven.apache.org/maven2): repo.maven.apache.org: Unknown host repo.maven.apache.org - [Help 2]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/ProjectBuildingException
[ERROR] [Help 2] http://cwiki.apache.org/confluence/display/MAVEN/PluginResolutionException

when i execute this command from the hadoop directory, this is the error i am getting.

On Wed, Mar 5, 2014 at 2:33 AM, Mingjiang Shi m...@gopivotal.com wrote: Did you execute the command from /home/cloudera? Does it contain the hadoop source code? You need to execute the command from the source code directory.

On Wed, Mar 5, 2014 at 6:28 PM, Avinash Kujur avin...@gmail.com wrote: when i am using this command mvn clean install -DskipTests -Pdist it's giving this error:

[cloudera@localhost ~]$ mvn clean install -DskipTests -Pdist
[INFO] Scanning for projects...
[INFO]
[INFO] BUILD FAILURE
[INFO]
[INFO] Total time: 0.170 s
[INFO] Finished at: 2014-03-05T02:25:52-08:00
[INFO] Final Memory: 2M/43M
[INFO]
[WARNING] The requested profile dist could not be activated because it does not exist.
[ERROR] The goal you specified requires a project to execute but there is no POM in this directory (/home/cloudera). Please verify you invoked Maven from the correct directory. - [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MissingProjectException

help me out. Thanks in advance. :)

-- Cheers -MJ
[no subject]
while importing jar files using mvn clean install -DskipTests -Pdist, i am getting this error:

[ERROR] The goal you specified requires a project to execute but there is no POM in this directory (/home/cloudera). Please verify you invoked Maven from the correct directory. - [Help 1]

help me out
Re:
please start writing subject lines for your emails. also, look at the error message:

[ERROR] The goal you specified requires a project to execute but there is no POM in this directory (/home/cloudera)

do ls -l pom.xml inside the /home/cloudera directory. change directory to where your codebase is, and then run the command again after making sure there is a pom.xml present in that directory.

On Thu, Mar 6, 2014 at 3:17 PM, Avinash Kujur avin...@gmail.com wrote: while importing jar files using mvn clean install -DskipTests -Pdist, i am getting this error: [ERROR] The goal you specified requires a project to execute but there is no POM in this directory (/home/cloudera). [...]

-- Nitin Pawar
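(The steps above can be sketched as a quick check; the checkout path is the one from the error output earlier in the thread:)

```shell
# Maven needs a pom.xml in the working directory; run it from the source tree.
cd /home/cloudera/hadoop-common-trunk   # checkout path from the thread
ls -l pom.xml                           # should list the file, not error out
mvn clean install -DskipTests
```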
MR2 Job over LZO data
Running on Hadoop 2.2.0. The Java MR2 job works as expected on an uncompressed data source using TextInputFormat.class, but when using the LZO format the job fails:

import com.hadoop.mapreduce.LzoTextInputFormat;
job.setInputFormatClass(LzoTextInputFormat.class);

Dependencies are from the maven repository: http://maven.twttr.com/com/hadoop/gplcompression/hadoop-lzo/0.4.19/ I also tried with elephant-bird-core 4.4. The same data can be queried fine from within Hive (0.12) on the same cluster. The exception:

Exception in thread "main" java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
at com.hadoop.mapreduce.LzoTextInputFormat.listStatus(LzoTextInputFormat.java:62)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:340)
at com.hadoop.mapreduce.LzoTextInputFormat.getSplits(LzoTextInputFormat.java:101)
at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:491)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:508)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:392)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1268)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1265)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1265)
at com.cloudreach.DataQuality.Main.main(Main.java:42)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)

I believe the issue is related to the changes in Hadoop 2, but where can I find a Hadoop 2-compatible version? Thanks
Warning in secondary namenode log
Hi, I am setting up a 2-node hadoop cluster (1.2.1). After formatting the FS and starting the namenode, datanode and secondarynamenode, i am getting the below warning in the SecondaryNameNode logs:

*WARN org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Checkpoint Period :3600 secs (60 min)*

Please help me debug this. -- Thanks and Regards, Vimal Jain
Re: Warning in secondary namenode log
you can ignore this on a 2-node cluster. This value is the time it waits between two periodic checkpoints on the secondary namenode.

On Thu, Mar 6, 2014 at 4:10 PM, Vimal Jain vkj...@gmail.com wrote: Hi, I am setting up a 2-node hadoop cluster (1.2.1). After formatting the FS and starting the namenode, datanode and secondarynamenode, i am getting the below warning in the SecondaryNameNode logs: *WARN org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Checkpoint Period :3600 secs (60 min)* [...]

-- Nitin Pawar
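(If anyone wants to change the interval rather than ignore the message: a hedged sketch for the 1.x line, assuming the 1.x property name fs.checkpoint.period in core-site.xml:)

```xml
<!-- core-site.xml (Hadoop 1.x): seconds between secondary namenode
     checkpoints; 3600 is the default the warning above reports -->
<property>
  <name>fs.checkpoint.period</name>
  <value>3600</value>
</property>
```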
Assertion error while building hadoop 2.3.0
Hi, I have downloaded hadoop-2.3.0-src and followed the guide at http://hadoop.apache.org/docs/r2.3.0/hadoop-project-dist/hadoop-common/SingleCluster.html

The first command, mvn clean install -DskipTests, was successful. However, when I run

cd hadoop-mapreduce-project
mvn clean install assembly:assembly -Pnative

at some point I get an error. Searching for the error message on the web shows some Q&A in the mailing list archive, but it seems to be related to developers. Please see the full messages below (sorry for the long post).

Running org.apache.hadoop.mapreduce.v2.app.TestStagingCleanup
Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 27.59 sec - in org.apache.hadoop.mapreduce.v2.app.TestStagingCleanup
Running org.apache.hadoop.mapreduce.v2.app.commit.TestCommitterEventHandler
Tests run: 3, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 4.003 sec FAILURE! - in org.apache.hadoop.mapreduce.v2.app.commit.TestCommitterEventHandler
testFailure(org.apache.hadoop.mapreduce.v2.app.commit.TestCommitterEventHandler) Time elapsed: 2.978 sec FAILURE!
java.lang.AssertionError: null
at org.junit.Assert.fail(Assert.java:92)
at org.junit.Assert.assertTrue(Assert.java:43)
at org.junit.Assert.assertNotNull(Assert.java:526)
at org.junit.Assert.assertNotNull(Assert.java:537)
at org.apache.hadoop.mapreduce.v2.app.commit.TestCommitterEventHandler.testFailure(TestCommitterEventHandler.java:314)
testBasic(org.apache.hadoop.mapreduce.v2.app.commit.TestCommitterEventHandler) Time elapsed: 0.255 sec FAILURE!
java.lang.AssertionError: null
at org.junit.Assert.fail(Assert.java:92)
at org.junit.Assert.assertTrue(Assert.java:43)
at org.junit.Assert.assertNotNull(Assert.java:526)
at org.junit.Assert.assertNotNull(Assert.java:537)
at org.apache.hadoop.mapreduce.v2.app.commit.TestCommitterEventHandler.testBasic(TestCommitterEventHandler.java:263)
Running org.apache.hadoop.mapreduce.v2.app.TestRecovery
Tests run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 62.458 sec - in org.apache.hadoop.mapreduce.v2.app.TestRecovery
Running org.apache.hadoop.mapreduce.v2.app.TestMRAppMaster
Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 6.306 sec - in org.apache.hadoop.mapreduce.v2.app.TestMRAppMaster
Running org.apache.hadoop.mapreduce.v2.app.TestMRAppComponentDependencies
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 13 sec - in org.apache.hadoop.mapreduce.v2.app.TestMRAppComponentDependencies
Running org.apache.hadoop.mapreduce.v2.app.metrics.TestMRAppMetrics
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.273 sec - in org.apache.hadoop.mapreduce.v2.app.metrics.TestMRAppMetrics
Running org.apache.hadoop.mapreduce.v2.app.TestFetchFailure
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 22.859 sec - in org.apache.hadoop.mapreduce.v2.app.TestFetchFailure
Running org.apache.hadoop.mapreduce.v2.app.TestFail
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 26.483 sec - in org.apache.hadoop.mapreduce.v2.app.TestFail
Running org.apache.hadoop.mapreduce.v2.app.TestMRApp
Tests run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 38.554 sec - in org.apache.hadoop.mapreduce.v2.app.TestMRApp
Running org.apache.hadoop.mapreduce.v2.app.TestKill
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 18.046 sec - in org.apache.hadoop.mapreduce.v2.app.TestKill
Running org.apache.hadoop.mapreduce.v2.app.job.impl.TestTaskAttempt
Tests run: 12, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 27.314 sec - in org.apache.hadoop.mapreduce.v2.app.job.impl.TestTaskAttempt
Running org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl
Tests run: 15, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 13.272 sec FAILURE! - in org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl
testKilledDuringKillAbort(org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl) Time elapsed: 5.28 sec FAILURE!
java.lang.AssertionError: expected:<SETUP> but was:<RUNNING>
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.failNotEquals(Assert.java:647)
at org.junit.Assert.assertEquals(Assert.java:128)
at org.junit.Assert.assertEquals(Assert.java:147)
at org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl.assertJobState(TestJobImpl.java:816)
at org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl.testKilledDuringKillAbort(TestJobImpl.java:499)
Running org.apache.hadoop.mapreduce.v2.app.job.impl.TestTaskAttemptContainerRequest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3.051 sec - in org.apache.hadoop.mapreduce.v2.app.job.impl.TestTaskAttemptContainerRequest
Running org.apache.hadoop.mapreduce.v2.app.job.impl.TestShuffleProvider
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3.112 sec - in org.apache.hadoop.mapreduce.v2.app.job.impl.TestShuffleProvider
Running
Fetching configuration values from cluster
How would I go about fetching configuration values (e.g. from yarn-site.xml) from the cluster via the API, from an application that is not running on a cluster node? Thanks, John
HDFS java client vs the Command Line
All, I'm running the 2.3.0 distribution as a single node on OSX 10.7. I want to create a directory. From the command line it works; from java it doesn't. I have Googled and read bits and pieces suggesting this is an issue with the case insensitivity of the OSX file system. Can anyone confirm this? If so, can anyone advise a workaround? Such a simple thing to get hung up on, go figure. Thanks -- There are ways and there are ways, Geoffry Roberts
Re: Fw: Hadoop at ApacheCon Denver
Wow... blast from the past ;)!! How the hell are you? Cheers Oleg

On Wed, Mar 5, 2014 at 10:18 AM, Melissa Warnkin missywarn...@yahoo.com wrote: Hello Hadoop enthusiasts, As you are no doubt aware, ApacheCon North America will be held in Denver, Colorado starting on April 7th. Hadoop has 25 talks and two tutorials!! Check it out here: http://apacheconnorthamerica2014.sched.org/?s=hadoop. We would love to see you in Denver next month. Register soon, as prices go up on March 14th: http://na.apachecon.com/

Best regards, Melissa
ApacheCon Planning Team
Partitions in Hive
Hi, I have a table with 3 columns in hive. I want that table to be partitioned based on the first letter of column 1. How do we define such a partition condition in hive? Regards, Nagarjuna K
Re: HDFS java client vs the Command Line
I've never faced an issue trying to run hadoop and related programs on my OSX. What is your error, exactly? Have you ensured your Java classpath carries the configuration directory as well, if you aren't running the program via hadoop jar ... but via java -cp ... instead?

On Thu, Mar 6, 2014 at 9:50 AM, Geoffry Roberts threadedb...@gmail.com wrote: All, I'm running the 2.3.0 distribution as a single node on OSX 10.7. I want to create a directory. From the command line it works; from java it doesn't. [...]

-- Harsh J
Re: Partitions in Hive
Partitioning in Hive is done on the column value, not on a sub-portion of the column value. If you want to separate data based on the first character, then create another column to store that value. -- Nitin Pawar
Re: Assertion error while building hadoop 2.3.0
Stuck at this step. Hope to receive any idea... Regards, Mahmood On Thursday, March 6, 2014 6:48 PM, Mahmood Naderan nt_mahm...@yahoo.com wrote: Hi I have downloaded hadoop-2.3.0-src and followed the guide from http://hadoop.apache.org/docs/r2.3.0/hadoop-project-dist/hadoop-common/SingleCluster.html The first command mvn clean install -DskipTests was successful. However wen I run cd hadoop-mapreduce-project mvn clean install assembly:assembly -Pnative At some point I get an error. Searching the error message on the web shows some QA on the mailing list archive but seems that they are related to developers. Please see the full messages below (sorry for long post). Running org.apache.hadoop.mapreduce.v2.app.TestStagingCleanup Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 27.59 sec - in org.apache.hadoop.mapreduce.v2.app.TestStagingCleanup Running org.apache.hadoop.mapreduce.v2.app.commit.TestCommitterEventHandler Tests run: 3, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 4.003 sec FAILURE! - in org.apache.hadoop.mapreduce.v2.app.commit.TestCommitterEventHandler testFailure(org.apache.hadoop.mapreduce.v2.app.commit.TestCommitterEventHandler) Time elapsed: 2.978 sec FAILURE! java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:92) at org.junit.Assert.assertTrue(Assert.java:43) at org.junit.Assert.assertNotNull(Assert.java:526) at org.junit.Assert.assertNotNull(Assert.java:537) at org.apache.hadoop.mapreduce.v2.app.commit.TestCommitterEventHandler.testFailure(TestCommitterEventHandler.java:314) testBasic(org.apache.hadoop.mapreduce.v2.app.commit.TestCommitterEventHandler) Time elapsed: 0.255 sec FAILURE! 
java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:92) at org.junit.Assert.assertTrue(Assert.java:43) at org.junit.Assert.assertNotNull(Assert.java:526) at org.junit.Assert.assertNotNull(Assert.java:537) at org.apache.hadoop.mapreduce.v2.app.commit.TestCommitterEventHandler.testBasic(TestCommitterEventHandler.java:263) Running org.apache.hadoop.mapreduce.v2.app.TestRecovery Tests run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 62.458 sec - in org.apache.hadoop.mapreduce.v2.app.TestRecovery Running org.apache.hadoop.mapreduce.v2.app.TestMRAppMaster Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 6.306 sec - in org.apache.hadoop.mapreduce.v2.app.TestMRAppMaster Running org.apache.hadoop.mapreduce.v2.app.TestMRAppComponentDependencies Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 13 sec - in org.apache.hadoop.mapreduce.v2.app.TestMRAppComponentDependencies Running org.apache.hadoop.mapreduce.v2.app.metrics.TestMRAppMetrics Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.273 sec - in org.apache.hadoop.mapreduce.v2.app.metrics.TestMRAppMetrics Running org.apache.hadoop.mapreduce.v2.app.TestFetchFailure Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 22.859 sec - in org.apache.hadoop.mapreduce.v2.app.TestFetchFailure Running org.apache.hadoop.mapreduce.v2.app.TestFail Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 26.483 sec - in org.apache.hadoop.mapreduce.v2.app.TestFail Running org.apache.hadoop.mapreduce.v2.app.TestMRApp Tests run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 38.554 sec - in org.apache.hadoop.mapreduce.v2.app.TestMRApp Running org.apache.hadoop.mapreduce.v2.app.TestKill Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 18.046 sec - in org.apache.hadoop.mapreduce.v2.app.TestKill Running org.apache.hadoop.mapreduce.v2.app.job.impl.TestTaskAttempt Tests run: 12, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 27.314 
sec - in org.apache.hadoop.mapreduce.v2.app.job.impl.TestTaskAttempt Running org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl Tests run: 15, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 13.272 sec FAILURE! - in org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl testKilledDuringKillAbort(org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl) Time elapsed: 5.28 sec FAILURE! java.lang.AssertionError: expected:SETUP but was:RUNNING at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:147) at org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl.assertJobState(TestJobImpl.java:816) at org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl.testKilledDuringKillAbort(TestJobImpl.java:499) Running org.apache.hadoop.mapreduce.v2.app.job.impl.TestTaskAttemptContainerRequest Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3.051 sec - in org.apache.hadoop.mapreduce.v2.app.job.impl.TestTaskAttemptContainerRequest Running org.apache.hadoop.mapreduce.v2.app.job.impl.TestShuffleProvider Tests run: 1, Failures: 0, Errors: 0,
Running a Job in a Local Job Runner:Windows 7 64-bit
Hi All, I'm trying to get some hands-on on the Map Reduce programming. I downloaded the code examples from Hadoop: The Definitive Guide, 3rd edition, and built them using Maven: mvn package -DskipTests -Dhadoop.distro=apache-2 Next I imported the maven projects into Eclipse. Using Eclipse now I can develop my own Map Reduce jobs. But how do I test/run the job locally using the Local Job Runner? The book excerpt says: Now we can run this application against some local files. Hadoop comes with a local job runner, a cut-down version of the MapReduce execution engine for running MapReduce jobs in a single JVM. It's designed for testing, and is very convenient for use in an IDE, since you can run it in a debugger to step through the code in your mapper and reducer. Do I also need to install Hadoop locally on Windows for that? Thanks, -RR
Re: HDFS java client vs the Command Line
Thanks for the response. I figured out what was wrong. I was doing this: Configuration conf = new Configuration(); conf.addResource(new Path(F.CFG_PATH + "/core-site.xml")); conf.addResource(new Path(F.CFG_PATH + "/hdfs-site.xml")); conf.addResource(new Path(F.CFG_PATH + "/mapred-site.xml")); F.CFG_PATH was close but not correct. Fixed it and all is well. Thanks -- There are ways and there are ways, Geoffry Roberts
Re: HDFS java client vs the Command Line
You could avoid all that code by simply placing the configuration directory on the classpath - it will auto-load the necessary properties. -- Harsh J
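Harsh's suggestion works because Hadoop's Configuration resolves core-site.xml and friends as classpath resources. A quick way to check whether your config directory is actually visible, using only the standard library (no Hadoop dependency; the file names below are the usual ones, adjust if yours differ):

```java
public class ClasspathCheck {
    public static void main(String[] args) {
        // Configuration looks these names up via the context class loader,
        // so if getResource() returns null here, the auto-load (and an
        // addResource() by name) will not find them either.
        String[] names = {"core-site.xml", "hdfs-site.xml", "mapred-site.xml"};
        for (String name : names) {
            java.net.URL url = Thread.currentThread()
                    .getContextClassLoader().getResource(name);
            System.out.println(name + " -> "
                    + (url == null ? "NOT on classpath" : url));
        }
    }
}
```

Run it with the same classpath as your application, e.g. `java -cp /path/to/hadoop/conf:myapp.jar ClasspathCheck` (the conf path here is an example); if a file reports NOT on classpath, Configuration will silently fall back to defaults.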
Re: MapReduce: How to output multiple Avro files?
If you have a reducer involved, you'll likely need a common map output data type that both A and B can fit into. On Thu, Mar 6, 2014 at 12:09 AM, Fengyun RAO raofeng...@gmail.com wrote: our input is a line of text which may be parsed to e.g. an A or B object. We want all A objects written to A.avro files, while all B objects written to B.avro. I looked into the AvroMultipleOutputs class: http://avro.apache.org/docs/1.7.4/api/java/org/apache/avro/mapreduce/AvroMultipleOutputs.html There is an example, however, it's not quite clear. For job submission, it uses AvroMultipleOutputs.addNamedOutput to add schemas for A and B. In my program it looks like: AvroMultipleOutputs.addNamedOutput(job, "A", AvroKeyOutputFormat.class, aSchema, null); AvroMultipleOutputs.addNamedOutput(job, "B", AvroKeyOutputFormat.class, bSchema, null); I believe this is for the Reducer output files. My question is what the Mapper output should be, specifically what job.setMapOutputValueClass should be, since the Mapper output could be an A or B object, with schema aSchema or bSchema. 
In my program, I simply set it to GenericData, but get the error below: 14/03/06 15:55:34 INFO mapreduce.Job: Task Id : attempt_1393817780522_0012_m_10_2, Status : FAILED Error: java.lang.NullPointerException at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:989) at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:390) at org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:79) at org.apache.hadoop.mapred.MapTask$NewOutputCollector.init(MapTask.java:674) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:746) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:165) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:160) I have no idea what this means. -- Harsh J
Re: MapReduce: How to output multiple Avro files?
thanks, Harsh. any idea on how to build a common map output data type? The only way I can think of is toString(), which would be very inefficient, since A and B are big objects and may change with time, which is also the reason we want to use Avro serialization. 2014-03-07 9:55 GMT+08:00 Harsh J ha...@cloudera.com: If you have a reducer involved, you'll likely need a common map output data type that both A and B can fit into.
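The "common map output data type" Harsh mentions does not have to go through toString(); a plain tagged wrapper is enough. Below is a minimal, Avro-free sketch of the idea (the class names A-or-B and the Kind tag are hypothetical, not from the thread):

```java
// A tagged wrapper that can carry either an A or a B record, so the
// mapper has a single output value type and the reducer can route
// each datum to the right named output (A.avro or B.avro).
public class AorB {
    public enum Kind { A, B }

    public final Kind kind;
    public final Object datum; // would be your A or B record object

    private AorB(Kind kind, Object datum) {
        this.kind = kind;
        this.datum = datum;
    }

    public static AorB ofA(Object a) { return new AorB(Kind.A, a); }
    public static AorB ofB(Object b) { return new AorB(Kind.B, b); }

    public static void main(String[] args) {
        AorB v = AorB.ofA("some A record");
        // In the reducer you would dispatch on v.kind and write v.datum
        // to the matching AvroMultipleOutputs named output.
        System.out.println(v.kind + ": " + v.datum);
    }
}
```

In Avro terms this is essentially what a union of [aSchema, bSchema] gives you: each datum carries enough information for the reducer to tell the two record types apart, without any lossy string round-trip.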
Re: Assertion error while building hadoop 2.3.0
Hi Mahmood, I have downloaded hadoop-2.3.0-src and followed the guide from http://hadoop.apache.org/docs/r2.3.0/hadoop-project-dist/hadoop-common/SingleCluster.html The documentation is still old, and you don't need to compile the source code to build a cluster. I built the latest document and uploaded it to github.io. Please follow this page: http://aajisaka.github.io/hadoop-project/hadoop-project-dist/hadoop-common/SingleCluster.html Thanks, Akira
Re: MR2 Job over LZO data
Maybe you can try downloading the LZO class and rebuilding it against Hadoop 2.2.0; if the build succeeds, you should be good to go; if it fails, then maybe you need to wait for the LZO maintainers to update their code. Regards, Stanley Shi On Thu, Mar 6, 2014 at 6:29 PM, KingDavies kingdav...@gmail.com wrote: Running on Hadoop 2.2.0 The Java MR2 job works as expected on an uncompressed data source using the TextInputFormat.class. But when using the LZO format the job fails: import com.hadoop.mapreduce.LzoTextInputFormat; job.setInputFormatClass(LzoTextInputFormat.class); Dependencies from the maven repository: http://maven.twttr.com/com/hadoop/gplcompression/hadoop-lzo/0.4.19/ Also tried with elephant-bird-core 4.4 The same data can be queried fine from within Hive (0.12) on the same cluster. The exception: Exception in thread main java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected at com.hadoop.mapreduce.LzoTextInputFormat.listStatus(LzoTextInputFormat.java:62) at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:340) at com.hadoop.mapreduce.LzoTextInputFormat.getSplits(LzoTextInputFormat.java:101) at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:491) at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:508) at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:392) at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1268) at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1265) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491) at org.apache.hadoop.mapreduce.Job.submit(Job.java:1265) at com.cloudreach.DataQuality.Main.main(Main.java:42) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) I believe the issue is related to the changes in Hadoop 2, but where can I find a H2 compatible version? Thanks
Re: Fetching configuration values from cluster
You can read from http://resource-manager.host.ip:8088/conf This is an XML-format file you can use directly. Regards, Stanley Shi On Fri, Mar 7, 2014 at 1:46 AM, John Lilley john.lil...@redpoint.net wrote: How would I go about fetching configuration values (e.g. yarn-site.xml) from the cluster via the API from an application not running on a cluster node? Thanks John
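The body served at the ResourceManager's /conf endpoint is the familiar Hadoop configuration XML (property elements with name/value children). A stdlib-only sketch of turning that document into a map — the HTTP fetch itself is left out and replaced by a stand-in string, and the property name/value in the sample are illustrative, not real cluster values:

```java
import java.io.ByteArrayInputStream;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class ConfParser {
    // Parse Hadoop-style <configuration><property><name/><value/></property>... XML
    // into a simple map of property name to value.
    static Map<String, String> parse(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
        Map<String, String> props = new HashMap<>();
        NodeList nodes = doc.getElementsByTagName("property");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element p = (Element) nodes.item(i);
            props.put(p.getElementsByTagName("name").item(0).getTextContent(),
                      p.getElementsByTagName("value").item(0).getTextContent());
        }
        return props;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the response body of http://<rm-host>:8088/conf; in a
        // real client you would fetch it with java.net.URL / HttpURLConnection.
        String sample = "<configuration><property>"
                + "<name>yarn.resourcemanager.address</name>"
                + "<value>rm-host:8032</value></property></configuration>";
        System.out.println(parse(sample).get("yarn.resourcemanager.address"));
    }
}
```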
Re: Running a Job in a Local Job Runner:Windows 7 64-bit
Hi RR, You don't need to have the actual Hadoop daemons running on a Windows machine. Just install Cygwin and ensure that you have all the required Hadoop jars on the classpath of your program. You can test/debug directly from the IDE itself just by saying Run As - Java Application on the driver class. This will run the program in Local Job Runner mode. You can use this to verify the basic logic of your MR code. When you start using advanced features of MR, the behavior/output on the Local Job Runner may be different from when running the program on a real distributed cluster. Regards, Rakesh
RE: MapReduce: How to output multiple Avro files?
Hi Fengyun, Here's what I've done in the past when facing a similar issue: 1) Set the map output schema to a UNION of both of your target schemas, A and B. 2) Serialize the data in the mappers, using the avro datum as the value. 3) Figure out what the avro schema is for each datum and write out the data in the reducer. Thanks, Alan From: Fengyun RAO [mailto:raofeng...@gmail.com] Sent: Thursday, March 06, 2014 2:14 AM To: user@hadoop.apache.org; u...@avro.apache.org Subject: Re: MapReduce: How to output multiple Avro files? add avro user mail-list
Re: Running a Job in a Local Job Runner:Windows 7 64-bit
Running your Driver class from Eclipse should automatically run it in the local runner mode (as that's the default mode). You shouldn't need a local Hadoop install for this. -- Harsh J
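For completeness, local mode is just configuration: when mapreduce.framework.name is unset or set to local, the job runs in a single JVM. If you want to force it explicitly rather than rely on the default, a sketch of the relevant 2.x settings (put them in mapred-site.xml on your classpath, or set the same keys on the job's Configuration object):

```xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>local</value> <!-- run in the LocalJobRunner, single JVM -->
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>file:///</value> <!-- read/write the local filesystem, no HDFS -->
  </property>
</configuration>
```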
Re: Assertion error while building hadoop 2.3.0
Thanks for the update. Let me ask a question before continuing the installation. It has been stated: To get a Hadoop distribution, download a recent stable release from one of the Apache Download Mirrors. Do you mean the source package or the other? hadoop-2.3.0-src.tar.gz (14MB) hadoop-2.3.0.tar.gz (127MB) Regards, Mahmood
how to import the hadoop code into eclipse.
hi, i have downloaded the hadoop code and executed the maven command successfully. how do i import the hadoop source code cleanly? it's showing a red exclamation mark on some of the modules while i am importing them. help me out. thanks in advance.
Re: how to import the hadoop code into eclipse.
mvn eclipse:eclipse, and then import the existing projects in eclipse. - Zhijie -- Zhijie Shen Hortonworks Inc. http://hortonworks.com/
Re: how to import the hadoop code into eclipse.
i did that, but i have some doubts while importing the code, because it shows some warnings and errors on the imported modules. i was wondering if u could give me a link to the proper procedure. On Thu, Mar 6, 2014 at 9:21 PM, Zhijie Shen zs...@hortonworks.com wrote: mvn eclipse:eclipse, and then import the existing projects in eclipse. - Zhijie On Thu, Mar 6, 2014 at 9:00 PM, Avinash Kujur avin...@gmail.com wrote: hi, i have downloaded the hadoop code. And executed maven command successfully. how to import hadoop source code cleanly. because its showing red exclamation mark on some of the modules while i am importing it. help me out. thanks in advance. -- Zhijie Shen Hortonworks Inc. http://hortonworks.com/
why can FSDataInputStream.read() only read 2^17 bytes in hadoop2.0?
Hi~ First, i use FileSystem to open a file in hdfs: FSDataInputStream m_dis = fs.open(...); Second, i read the data in m_dis into a byte array: byte[] inputdata = new byte[m_dis.available()]; // m_dis.available() == 47185920 m_dis.read(inputdata, 0, 20 * 1024 * 768 * 3); The value returned by m_dis.read() is 131072 (2^17), so the data after 131072 is missing. It seems that FSDataInputStream uses a short to manage its data, which confuses me a lot. The same code runs well in hadoop 1.2.1. thank you~
Re: Assertion error while building hadoop 2.3.0
If you just want to install a cluster to play with, download the hadoop-2.3.0.tar.gz (127MB). On Fri, Mar 7, 2014 at 12:32 PM, Mahmood Naderan nt_mahm...@yahoo.comwrote: hadoop-2.3.0.tar.gz (127MB) -- Cheers -MJ
Re: why can FSDataInputStream.read() only read 2^17 bytes in hadoop2.0?
the semantics of read() do not guarantee that it reads as much as possible. you need to call read() multiple times or use readFully(). On Fri, Mar 7, 2014 at 1:32 PM, hequn cheng chenghe...@gmail.com wrote: Hi~ First, i use FileSystem to open a file in hdfs. FSDataInputStream m_dis = fs.open(...); Second, read the data in m_dis to a byte array. byte[] inputdata = new byte[m_dis.available()]; //m_dis.available = 47185920 m_dis.read(inputdata, 0, 20 * 1024 * 768 * 3); the value returned by m_dis.read() is 131072(2^17), so the data after 131072 is missing. It seems that FSDataInputStream use short to manage it's data which confused me a lot. The same code run well in hadoop1.2.1. thank you~
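FSDataInputStream inherits this partial-read contract from java.io.InputStream, so the fix generalizes beyond HDFS. Here is a minimal stdlib-only sketch of the loop the answer describes (the class name and helper are illustrative, not Hadoop code):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ReadFullyDemo {
    // read() may return fewer bytes than requested, so loop until
    // the requested length is filled or the stream ends.
    public static int readFully(InputStream in, byte[] buf, int off, int len)
            throws IOException {
        int total = 0;
        while (total < len) {
            int n = in.read(buf, off + total, len - total);
            if (n < 0) {
                break; // end of stream before len bytes were available
            }
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[1 << 20]; // 1 MiB of zeros
        byte[] buf = new byte[data.length];
        int got = readFully(new ByteArrayInputStream(data), buf, 0, buf.length);
        System.out.println(got); // prints 1048576
    }
}
```

A single read() returning 131072 is legal under this contract; only the loop (or DataInputStream.readFully / FSDataInputStream.readFully) guarantees the full count.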
Re: MR2 Job over LZO data
You can get the source code from https://github.com/twitter/hadoop-lzo and then compile it against hadoop 2.2.0. As I remember, as long as you rebuild it, lzo should work with hadoop 2.2.0. On Thu, Mar 6, 2014 at 6:29 PM, KingDavies kingdav...@gmail.com wrote: Running on Hadoop 2.2.0. The Java MR2 job works as expected on an uncompressed data source using the TextInputFormat.class. But when using the LZO format the job fails: import com.hadoop.mapreduce.LzoTextInputFormat; job.setInputFormatClass(LzoTextInputFormat.class); Dependencies from the maven repository: http://maven.twttr.com/com/hadoop/gplcompression/hadoop-lzo/0.4.19/ Also tried with elephant-bird-core 4.4. The same data can be queried fine from within Hive(0.12) on the same cluster. The exception:
Exception in thread "main" java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
    at com.hadoop.mapreduce.LzoTextInputFormat.listStatus(LzoTextInputFormat.java:62)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:340)
    at com.hadoop.mapreduce.LzoTextInputFormat.getSplits(LzoTextInputFormat.java:101)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:491)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:508)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:392)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1268)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1265)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1265)
    at com.cloudreach.DataQuality.Main.main(Main.java:42)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
I believe the issue is related to the changes in Hadoop 2, but where can I find a H2 compatible version? Thanks -- Regards Gordon Wang
Re: why can FSDataInputStream.read() only read 2^17 bytes in hadoop2.0?
yep, that did the job :) i used readFully instead and it works well~~ thank you~ 2014-03-07 13:48 GMT+08:00 Binglin Chang decst...@gmail.com: the semantic of read does not guarantee read as much as possible. you need to call read() many times or use readFully On Fri, Mar 7, 2014 at 1:32 PM, hequn cheng chenghe...@gmail.com wrote: Hi~ First, i use FileSystem to open a file in hdfs. FSDataInputStream m_dis = fs.open(...); Second, read the data in m_dis to a byte array. byte[] inputdata = new byte[m_dis.available()]; //m_dis.available = 47185920 m_dis.read(inputdata, 0, 20 * 1024 * 768 * 3); the value returned by m_dis.read() is 131072(2^17), so the data after 131072 is missing. It seems that FSDataInputStream use short to manage it's data which confused me a lot. The same code run well in hadoop1.2.1. thank you~
Re: how to import the hadoop code into eclipse.
ah, yes, I was experiencing some errors on the imported modules, but I fixed them myself manually. Not sure whether other people have encountered the same problem. Here's a link: http://wiki.apache.org/hadoop/EclipseEnvironment On Thu, Mar 6, 2014 at 9:30 PM, Avinash Kujur avin...@gmail.com wrote: i did that. but i have some doubt while importing code. because its showing some warning and error on imported modules. i was wondering if u could give me any proper procedure link. On Thu, Mar 6, 2014 at 9:21 PM, Zhijie Shen zs...@hortonworks.com wrote: mvn eclipse:eclipse, and then import the existing projects in eclipse. - Zhijie On Thu, Mar 6, 2014 at 9:00 PM, Avinash Kujur avin...@gmail.com wrote: hi, i have downloaded the hadoop code. And executed maven command successfully. how to import hadoop source code cleanly. because its showing red exclamation mark on some of the modules while i am importing it. help me out. thanks in advance. -- Zhijie Shen Hortonworks Inc. http://hortonworks.com/
Re: App Master issue.
Hi MJ, Extremely sorry for the late response... had some infrastructure issues here... I am using Hadoop 2.3.0. Actually, while trying to solve this AppMaster issue, i came up with a strange observation: the app-master gets a STICKY SLOT on the data node at the master node if i set the following parameters, along with yarn.resourcemanager.hostname, across all the slave nodes: yarn.resourcemanager.address to master:8034, yarn.resourcemanager.scheduler.address to master:8030, yarn.resourcemanager.resource-tracker.address to master:8025. The default values can be found here: https://hadoop.apache.org/docs/r2.2.0/hadoop-yarn/hadoop-yarn-common/yarn-default.xml If i don't set these 3 values described above, then whenever the appmaster is launched on a slave node it tries connecting to the resource manager at the default 0.0.0.0 and not the specified one. But with these values set, the app-master is always launched on the master and everything seems fine... So i brought the datanode on the master down and checked what was happening... Strangely, the jobs are not even assigned to any appmaster... but i think the appmaster doesn't have any property of getting sticky... the resource manager looks for a free container and goes ahead launching it... So now these things need to be resolved: 1) Why does it work fine only if the 3 mentioned values are set on the slave nodes, with the app master launched only on the master node? 2) If no data node is running on the master, then the application doesn't get assigned at all to any app master. I have attached my config files for your reference... [renamed for better reading and understanding] Thanks for your response !! On Thu, Mar 6, 2014 at 7:34 AM, Mingjiang Shi m...@gopivotal.com wrote: Sorry, it should be accessing http://node_manager_ip:8042/conf to check the value of yarn.resourcemanager.scheduler.address on the node manager.
On Thu, Mar 6, 2014 at 9:36 AM, Mingjiang Shi m...@gopivotal.com wrote: Hi Sai, A few questions: 1. which version of hadoop are you using? yarn.resourcemanager.hostname is a new configuration which is not available in old versions. 2. Does your yarn-site.xml contain yarn.resourcemanager.scheduler.address? If yes, what's the value? 3. or you could access http://resource_mgr:8088/conf to check the value of yarn.resourcemanager.scheduler.address. On Thu, Mar 6, 2014 at 3:29 AM, Sai Prasanna ansaiprasa...@gmail.com wrote: Hi, I have a five node cluster: one master and 4 slaves. In fact, the master also has a data node running. Whenever the app master is launched on the master node, the simple wordcount program runs fine. But if it is launched on some slave node, the progress of the application gets hung. The problem is, though i have set yarn.resourcemanager.hostname to the ip-address of the master, the slave connects only to the default, 0.0.0.0:8030. What could be the reason ??? I get the following message in the logs of the app master in the web-UI:
...Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
2014-03-05 20:15:50,597 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
2014-03-05 20:15:50,603 INFO [main] org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8030
2014-03-05 20:15:56,632 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
<?xml version="1.0"?>
<!-- Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. See accompanying LICENSE file. -->
<configuration>
  <property>
    <name>mapreduce.job.reduces</name>
    <value>3</value>
    <description>I changed it so that multiple reduce tasks can be launched</description>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>128</value>
    <description>Minimum limit of memory to allocate to each container request at the Resource Manager.</description>
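For reference, the three resource-manager addresses discussed in this thread would look like this in yarn-site.xml on every node. This is a sketch assembled from the hostname and port values the poster mentions (host `master`, ports 8034/8030/8025), not a verified fix:

```xml
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>master</value>
</property>
<property>
  <name>yarn.resourcemanager.address</name>
  <value>master:8034</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>master:8030</value>
</property>
<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>master:8025</value>
</property>
```

When these are absent on a slave, the AM falls back to the yarn-default.xml value 0.0.0.0:8030, which matches the "Connecting to ResourceManager at /0.0.0.0:8030" retries in the logs above.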
GC overhead limit exceeded
Hi: i have a problem when run Hibench with hadoop-2.2.0, the wrong message list as below 14/03/07 13:54:53 INFO mapreduce.Job: map 19% reduce 0% 14/03/07 13:54:54 INFO mapreduce.Job: map 21% reduce 0% 14/03/07 14:00:26 INFO mapreduce.Job: Task Id : attempt_1394160253524_0010_m_20_0, Status : FAILED Error: GC overhead limit exceeded 14/03/07 14:00:27 INFO mapreduce.Job: map 20% reduce 0% 14/03/07 14:00:40 INFO mapreduce.Job: Task Id : attempt_1394160253524_0010_m_08_0, Status : FAILED Error: GC overhead limit exceeded 14/03/07 14:00:41 INFO mapreduce.Job: map 19% reduce 0% 14/03/07 14:00:59 INFO mapreduce.Job: map 20% reduce 0% 14/03/07 14:00:59 INFO mapreduce.Job: Task Id : attempt_1394160253524_0010_m_15_0, Status : FAILED Error: GC overhead limit exceeded 14/03/07 14:01:00 INFO mapreduce.Job: map 19% reduce 0% 14/03/07 14:01:03 INFO mapreduce.Job: Task Id : attempt_1394160253524_0010_m_23_0, Status : FAILED Error: GC overhead limit exceeded 14/03/07 14:01:11 INFO mapreduce.Job: Task Id : attempt_1394160253524_0010_m_26_0, Status : FAILED Error: GC overhead limit exceeded 14/03/07 14:01:35 INFO mapreduce.Job: map 20% reduce 0% 14/03/07 14:01:35 INFO mapreduce.Job: Task Id : attempt_1394160253524_0010_m_19_0, Status : FAILED Error: GC overhead limit exceeded 14/03/07 14:01:36 INFO mapreduce.Job: map 19% reduce 0% 14/03/07 14:01:43 INFO mapreduce.Job: Task Id : attempt_1394160253524_0010_m_07_0, Status : FAILED Error: GC overhead limit exceeded 14/03/07 14:02:00 INFO mapreduce.Job: Task Id : attempt_1394160253524_0010_m_00_0, Status : FAILED Error: GC overhead limit exceeded 14/03/07 14:02:01 INFO mapreduce.Job: map 18% reduce 0% 14/03/07 14:02:23 INFO mapreduce.Job: Task Id : attempt_1394160253524_0010_m_21_0, Status : FAILED Error: Java heap space 14/03/07 14:02:24 INFO mapreduce.Job: map 17% reduce 0% 14/03/07 14:02:31 INFO mapreduce.Job: map 18% reduce 0% 14/03/07 14:02:33 INFO mapreduce.Job: Task Id : attempt_1394160253524_0010_m_29_0, Status : FAILED Error: GC 
overhead limit exceeded 14/03/07 14:02:34 INFO mapreduce.Job: map 17% reduce 0% 14/03/07 14:02:38 INFO mapreduce.Job: Task Id : attempt_1394160253524_0010_m_10_0, Status : FAILED Error: GC overhead limit exceeded 14/03/07 14:02:41 INFO mapreduce.Job: Task Id : attempt_1394160253524_0010_m_18_0, Status : FAILED Error: GC overhead limit exceeded 14/03/07 14:02:43 INFO mapreduce.Job: Task Id : attempt_1394160253524_0010_m_14_0, Status : FAILED Error: GC overhead limit exceeded 14/03/07 14:02:47 INFO mapreduce.Job: Task Id : attempt_1394160253524_0010_m_28_0, Status : FAILED Error: Java heap space 14/03/07 14:02:50 INFO mapreduce.Job: Task Id : attempt_1394160253524_0010_m_02_0, Status : FAILED Error: GC overhead limit exceeded 14/03/07 14:02:51 INFO mapreduce.Job: map 16% reduce 0% 14/03/07 14:02:51 INFO mapreduce.Job: Task Id : attempt_1394160253524_0010_m_05_0, Status : FAILED Error: GC overhead limit exceeded 14/03/07 14:02:52 INFO mapreduce.Job: map 15% reduce 0% 14/03/07 14:02:55 INFO mapreduce.Job: Task Id : attempt_1394160253524_0010_m_06_0, Status : FAILED Error: GC overhead limit exceeded 14/03/07 14:02:57 INFO mapreduce.Job: Task Id : attempt_1394160253524_0010_m_27_0, Status : FAILED Error: GC overhead limit exceeded 14/03/07 14:02:58 INFO mapreduce.Job: map 14% reduce 0% 14/03/07 14:03:04 INFO mapreduce.Job: Task Id : attempt_1394160253524_0010_m_09_0, Status : FAILED Error: GC overhead limit exceeded 14/03/07 14:03:05 INFO mapreduce.Job: Task Id : attempt_1394160253524_0010_m_17_0, Status : FAILED Error: GC overhead limit exceeded 14/03/07 14:03:05 INFO mapreduce.Job: Task Id : attempt_1394160253524_0010_m_22_0, Status : FAILED Error: GC overhead limit exceeded 14/03/07 14:03:06 INFO mapreduce.Job: map 12% reduce 0% 14/03/07 14:03:10 INFO mapreduce.Job: Task Id : attempt_1394160253524_0010_m_01_0, Status : FAILED Error: GC overhead limit exceeded 14/03/07 14:03:11 INFO mapreduce.Job: map 13% reduce 0% 14/03/07 14:03:11 INFO mapreduce.Job: Task Id : 
attempt_1394160253524_0010_m_24_0, Status : FAILED then i added the parameter mapred.child.java.opts to the file mapred-site.xml: <property> <name>mapred.child.java.opts</name> <value>-Xmx1024m</value> </property> then another error occurs as below 14/03/07 11:21:51 INFO mapreduce.Job: map 0% reduce 0% 14/03/07 11:21:59 INFO mapreduce.Job: Task Id : attempt_1394160253524_0003_m_02_0, Status : FAILED Container [pid=5592,containerID=container_1394160253524_0003_01_04] is running beyond virtual memory limits. Current usage: 112.6 MB of 1 GB physical memory used; 2.7 GB of 2.1 GB virtual memory used. Killing container. Dump of the process-tree for container_1394160253524_0003_01_04 : |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE |- 5598 5592 5592 5592 (java) 563 14
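In Hadoop 2.x the heap and the container size are set separately, and the "running beyond virtual memory limits" kill comes from the NodeManager's vmem check. A hedged sketch of the relevant settings (the values 2048/-Xmx1536m/4 are illustrative, not tuned for this cluster): make the -Xmx smaller than the container, and raise the vmem-to-pmem ratio if the virtual-memory check still fires.

```xml
<!-- mapred-site.xml: container size for map tasks, with a heap that fits inside it -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>2048</value>
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx1536m</value>
</property>
<!-- yarn-site.xml: relax the virtual-memory check that killed the container -->
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>4</value>
</property>
```

In the log above the container was 1 GB with a 2.1 GB vmem limit (the default ratio), so a 1024m heap plus JVM overhead could trip the check even while physical usage stayed low.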
Re: MapReduce: How to output multiple Avro files?
thanks, Alan, it works! 2014-03-07 11:21 GMT+08:00 Alan Paulsen phe...@gmail.com: Hi Fengyun, Here's what I've done in the past when facing a similar issue: 1) Set the map output schema to a UNION of both of your target schemas, A and B. 2) Serialize the data in the mappers, using the avro datum as the value. 3) Figure out what the avro schema is for each datum and write out the data in the reducer. Thanks, Alan *From:* Fengyun RAO [mailto:raofeng...@gmail.com] *Sent:* Thursday, March 06, 2014 2:14 AM *To:* user@hadoop.apache.org; u...@avro.apache.org *Subject:* Re: MapReduce: How to output multiple Avro files? add avro user mail-list 2014-03-06 16:09 GMT+08:00 Fengyun RAO raofeng...@gmail.com: our input is a line of text which may be parsed to e.g. an A or B object. We want all A objects written to A.avro files, and all B objects written to B.avro. I looked into the AvroMultipleOutputs class: http://avro.apache.org/docs/1.7.4/api/java/org/apache/avro/mapreduce/AvroMultipleOutputs.html There is an example, however, it's not quite clear. For job submission, it uses AvroMultipleOutputs.addNamedOutput to add schemas for A and B. In my program it looks like: AvroMultipleOutputs.addNamedOutput(job, "A", AvroKeyOutputFormat.class, aSchema, null); AvroMultipleOutputs.addNamedOutput(job, "B", AvroKeyOutputFormat.class, bSchema, null); I believe this is for the Reducer output files. *My question is* what the Mapper output should be, specifically what job.setMapOutputValueClass should be, since the Mapper output could be an A or B object, with schema aSchema or bSchema.
In my program, I simply set it to GenericData, but get the error below:
14/03/06 15:55:34 INFO mapreduce.Job: Task Id : attempt_1393817780522_0012_m_10_2, Status : FAILED
Error: java.lang.NullPointerException
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:989)
    at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:390)
    at org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:79)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:674)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:746)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:165)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:160)
I have no idea what this means.