Hadoop streaming: need help using a custom key/value separator
When I use more than one reducer in Hadoop streaming with a custom separator rather than the tab character, it looks like the Hadoop shuffling process is not happening as it should.

This is the reducer output when I use '\t' to separate the key/value pair that is output from the mapper:

output from reducer 1:
10321,22
23644,37
41231,42
23448,20
12325,39
71234,20

output from reducer 2:
24123,43
33213,46
11321,29
21232,32

The above output is as expected: the first column is the key and the second is the count. There are 10 unique keys; 6 of them are in the output of the first reducer and the remaining 4 are in the output of the second.

But now, when I use a custom separator for the key/value pair output from my mapper (here '*' as the separator):

-D stream.mapred.output.field.separator=*
-D mapred.reduce.tasks=2

output from reducer 1:
10321,5
21232,19
24123,16
33213,28
23644,21
41231,12
23448,18
11321,29
12325,24
71234,9

output from reducer 2:
10321,17
21232,13
33213,18
23644,16
41231,30
23448,2
24123,27
12325,15
71234,11

Now both reducers get all the keys, and part of the values for each key go to reducer 1 and part to reducer 2. Why does it behave like this when I use a custom separator? Shouldn't each reducer get a unique set of keys after shuffling?

I am using Hadoop 0.20.205.0, and below is the command I use to run Hadoop streaming. Are there more options I should specify for Hadoop streaming to work properly with a custom separator?

hadoop jar $HADOOP_PREFIX/contrib/streaming/hadoop-streaming-0.20.205.0.jar -D stream.mapred.output.field.separator=* -D mapred.reduce.tasks=2 -mapper ./map.py -reducer ./reducer.py -file ./map.py -file ./reducer.py -input /user/inputdata -output /user/outputdata -verbose

Any help is much appreciated.

Thanks,
Austin
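A hedged note on the likely cause, not verified on 0.20.205: stream.mapred.output.field.separator does not appear among the documented streaming options. The property streaming reads to split a mapper's output line into the key (which is what gets partitioned) and the value is stream.map.output.field.separator; if it is not recognized, the default tab is assumed, the whole "key*value" line becomes the key, and the same original key with different values then hashes to different reducers, which matches the output above. A sketch of the command with the documented property names (note the quoting so the shell does not expand the asterisk):

hadoop jar $HADOOP_PREFIX/contrib/streaming/hadoop-streaming-0.20.205.0.jar \
    -D stream.map.output.field.separator='*' \
    -D stream.num.map.output.key.fields=1 \
    -D mapred.reduce.tasks=2 \
    -mapper ./map.py -reducer ./reducer.py \
    -file ./map.py -file ./reducer.py \
    -input /user/inputdata -output /user/outputdata -verbose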
Need help with the Hadoop Eclipse plugin
Hi all,

I am trying to use the Hadoop Eclipse plugin on my Windows machine to connect to my remote Hadoop cluster. I currently log in to the cluster with PuTTY, so SSH is enabled and my Windows machine can reach the Hadoop cluster.

I am using Hadoop 0.20.205, hadoop-eclipse-plugin-0.20.205.jar, Eclipse Helios (Version 3.6.2), and Oracle JDK 1.7.

If I use the original eclipse-plugin jar by putting it inside my $ECLIPSE_HOME/dropins or /plugins folder, I am able to see the Hadoop Map/Reduce perspective. But after specifying the Hadoop NN/JT connections, I see the following error whenever I try to access HDFS:

An internal error occurred during: "Connecting to DFS lxe9700".
org/apache/commons/configuration/Configuration

"Connecting to DFS lxe9700" has encountered a problem.
An internal error occurred during "Connecting to DFS".

Looking at the .log file, I see the following lines:

!MESSAGE An internal error occurred during: "Connecting to DFS lxe9700".
!STACK 0
java.lang.NoClassDefFoundError: org/apache/commons/configuration/Configuration
    at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.(DefaultMetricsSystem.java:37)
    at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.(DefaultMetricsSystem.java:34)
    at org.apache.hadoop.security.UgiInstrumentation.create(UgiInstrumentation.java:51)
    at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:196)
    at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:159)
    at org.apache.hadoop.security.UserGroupInformation.isSecurityEnabled(UserGroupInformation.java:216)
    at org.apache.hadoop.security.KerberosName.(KerberosName.java:83)
    at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:189)
    at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:159)
    at org.apache.hadoop.security.UserGroupInformation.isSecurityEnabled(UserGroupInformation.java:216)
    at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:409)
    at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:395)
    at org.apache.hadoop.fs.FileSystem$Cache$Key.(FileSystem.java:1436)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1337)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:244)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:122)
    at org.apache.hadoop.eclipse.server.HadoopServer.getDFS(HadoopServer.java:469)
    at org.apache.hadoop.eclipse.dfs.DFSPath.getDFS(DFSPath.java:146)
    at org.apache.hadoop.eclipse.dfs.DFSFolder.loadDFSFolderChildren(DFSFolder.java:61)
    at org.apache.hadoop.eclipse.dfs.DFSFolder$1.run(DFSFolder.java:178)
    at org.eclipse.core.internal.jobs.Worker.run(Worker.java:54)
Caused by: java.lang.ClassNotFoundException: org.apache.commons.configuration.Configuration
    at org.eclipse.osgi.internal.loader.BundleLoader.findClassInternal(BundleLoader.java:506)
    at org.eclipse.osgi.internal.loader.BundleLoader.findClass(BundleLoader.java:422)
    at org.eclipse.osgi.internal.loader.BundleLoader.findClass(BundleLoader.java:410)
    at org.eclipse.osgi.internal.baseadaptor.DefaultClassLoader.loadClass(DefaultClassLoader.java:107)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    ... 21 more

!ENTRY org.eclipse.jface 4 0 2012-01-03 02:47:50.812
!MESSAGE The command ("dfs.browser.action.download") is undefined
!STACK 0
java.lang.Exception
    at org.eclipse.jface.action.ExternalActionManager$CommandCallback.isActive(ExternalActionManager.java:370)
    at org.eclipse.jface.action.ActionContributionItem.isCommandActive(ActionContributionItem.java:647)
    at org.eclipse.jface.action.ActionContributionItem.isVisible(ActionContributionItem.java:703)
    at org.eclipse.jface.action.MenuManager.isChildVisible(MenuManager.java:985)
    at org.eclipse.jface.action.MenuManager.update(MenuManager.java:759)
    at org.eclipse.jface.action.MenuManager.handleAboutToShow(MenuManager.java:470)
    at org.eclipse.jface.action.MenuManager.access$1(MenuManager.java:465)
    at org.eclipse.jface.action.MenuManager$2.menuShown(MenuManager.java:491)
    at org.eclipse.swt.widgets.TypedListener.handleEvent(TypedListener.java:241)
    at org.eclipse.swt.widgets.EventTable.sendEvent(EventTable.java:84)
    at org.eclipse.swt.widgets.Widget.sendEvent(Widget.java:1053)
    at org.eclipse.swt.widgets.Widget.sendEvent(Widget.java:1077)
    at org.eclipse.swt.widgets.Widget.sendEvent(Widget.java:1058)
    at org.eclipse.swt.widgets.Control.WM_INITMENUPOPUP(Control.java:4487)
    at org.eclipse.swt.widgets.Control.windowProc(Control.java:4190)
    at org.eclipse.swt.widgets.Canvas.windowProc(Canvas.java:341)
    at org.eclipse.swt.widgets.Decorations.windowProc(Decorations.java:1598)
    at org.eclipse.swt.widgets.Shell.windowProc(Shell.java:2038)
    at org.eclipse.sw
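The NoClassDefFoundError suggests the plugin bundle does not carry the commons-configuration classes that Hadoop 0.20.205's security/metrics code links against. A commonly suggested workaround, sketched here with assumed jar versions and paths (check your own $HADOOP_PREFIX/lib), is to repack the plugin with the missing libraries and list them on its Bundle-ClassPath:

# Sketch only: repack the plugin jar with the libraries it is missing.
mkdir /tmp/plugin && cd /tmp/plugin
unzip $HADOOP_PREFIX/contrib/eclipse-plugin/hadoop-eclipse-plugin-0.20.205.0.jar
mkdir -p lib
cp $HADOOP_PREFIX/lib/commons-configuration-1.6.jar lib/
cp $HADOOP_PREFIX/lib/commons-lang-2.4.jar lib/
# edit META-INF/MANIFEST.MF so that Bundle-ClassPath also lists:
#   lib/commons-configuration-1.6.jar,lib/commons-lang-2.4.jar
zip -r hadoop-eclipse-plugin-0.20.205.0-patched.jar .
# then drop the patched jar into $ECLIPSE_HOME/plugins and restart Eclipse with -clean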
Re: Handling bad records
Thanks, that's helpful. In that example, what are "A" and "B" referring to? Are they the output file names?

mos.getCollector("seq", "A", reporter).collect(key, new Text("Bye"));
mos.getCollector("seq", "B", reporter).collect(key, new Text("Chau"));

On Mon, Feb 27, 2012 at 9:53 PM, Harsh J wrote:
> Mohit,
>
> Use the MultipleOutputs API:
> http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/lib/MultipleOutputs.html
> to have a named output for bad records. There is an example of its use
> detailed on the link.
>
> On Tue, Feb 28, 2012 at 3:48 AM, Mohit Anchlia wrote:
> > What's the best way to write records to a different file? I am doing xml
> > processing and during processing I might come across invalid xml format.
> > Currently I have it under a try/catch block and write to log4j. But I think
> > it would be better to just write it to an output file that just contains
> > errors.
>
> --
> Harsh J
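Going by the MultipleOutputs javadoc that example comes from, "seq" is a multi named output registered with addMultiNamedOutput(), and "A"/"B" are multi-names: they become part of the output file names, so records collected with "A" and "B" land in different files (something like seq_A-r-00000 and seq_B-r-00000). For the bad-records case a plain named output is enough. A rough sketch with the old mapred API; the "errors" output name and the extractId() parse step are placeholders:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.mapred.lib.MultipleOutputs;

public class XmlMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  // Driver side, once:
  //   MultipleOutputs.addNamedOutput(conf, "errors",
  //       TextOutputFormat.class, NullWritable.class, Text.class);

  private MultipleOutputs mos;

  public void configure(JobConf job) {
    mos = new MultipleOutputs(job);
  }

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, Text> out, Reporter reporter)
      throws IOException {
    try {
      out.collect(new Text(extractId(value)), value);  // normal records
    } catch (RuntimeException badXml) {
      // Invalid record: route it to the "errors" named output
      // (written as errors-m-00000 etc.) instead of log4j.
      mos.getCollector("errors", reporter).collect(NullWritable.get(), value);
    }
  }

  public void close() throws IOException {
    mos.close();  // flushes the named outputs
  }

  // Placeholder parse step that throws on malformed input.
  private static String extractId(Text xml) {
    return xml.toString().substring(0, 8);
  }
}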
Re: Handling bad records
Mohit,

Use the MultipleOutputs API:

http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/lib/MultipleOutputs.html

to have a named output for bad records. There is an example of its use detailed on the link.

On Tue, Feb 28, 2012 at 3:48 AM, Mohit Anchlia wrote:
> What's the best way to write records to a different file? I am doing xml
> processing and during processing I might come across invalid xml format.
> Currently I have it under a try/catch block and write to log4j. But I think
> it would be better to just write it to an output file that just contains
> errors.

--
Harsh J
Re: Invocation exception
On Mon, Feb 27, 2012 at 8:58 PM, Prashant Kommireddi wrote:
> Tom White's Definitive Guide book is a great reference. Answers to
> most of your questions can be found there.

I've been through that book but haven't come across how to debug this exception. Can you point me to the topic in that book where I'll find this information?
Re: Invocation exception
Tom White's Definitive Guide book is a great reference. Answers to most of your questions can be found there.

Sent from my iPhone

On Feb 27, 2012, at 8:54 PM, Mohit Anchlia wrote:
> Does it matter if the reducer is set even if the number of reducers is 0?
> Is there a way to get a clearer reason?
Re: Invocation exception
Does it matter if the reducer is set even if the number of reducers is 0? Is there a way to get a clearer reason?

On Mon, Feb 27, 2012 at 8:23 PM, Subir S wrote:
> Why would you set the Reducer when the number of reducers is set to zero?
> Not sure if this is the real cause.
Re: Invocation exception
On Tue, Feb 28, 2012 at 4:30 AM, Mohit Anchlia wrote:

> For some reason I am getting an invocation exception and I don't see any more
> details other than this exception:
>
> My job is configured as:
>
> JobConf conf = new JobConf(FormMLProcessor.class);
> conf.addResource("hdfs-site.xml");
> conf.addResource("core-site.xml");
> conf.addResource("mapred-site.xml");
> conf.set("mapred.reduce.tasks", "0");
> conf.setJobName("mlprocessor");
> DistributedCache.addFileToClassPath(new Path("/jars/analytics.jar"), conf);
> DistributedCache.addFileToClassPath(new Path("/jars/common.jar"), conf);
> conf.setOutputKeyClass(Text.class);
> conf.setOutputValueClass(Text.class);
> conf.setMapperClass(Map.class);
> conf.setCombinerClass(Reduce.class);
> conf.setReducerClass(IdentityReducer.class);

Why would you set the Reducer when the number of reducers is set to zero? Not sure if this is the real cause.

> conf.setInputFormat(SequenceFileAsTextInputFormat.class);
> conf.setOutputFormat(TextOutputFormat.class);
> FileInputFormat.setInputPaths(conf, new Path(args[0]));
> FileOutputFormat.setOutputPath(conf, new Path(args[1]));
> JobClient.runJob(conf);
>
> java.lang.RuntimeException: Error in configuring object
>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
>     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
>     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:387)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
>     at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:396)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
>     at org.apache.hadoop.mapred.Child.main(Child.java:264)
> Caused by: java.lang.reflect.InvocationTargetException
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.jav
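Following on from that comment: with zero reduces there is no shuffle, so the combiner and reducer settings serve no purpose. A minimal map-only sketch of the same driver, keeping the class names from the quoted code (whether this clears the InvocationTargetException is uncertain, since the trace points at the mapper's own configure step):

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class FormMLProcessor {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(FormMLProcessor.class);
    conf.setJobName("mlprocessor");
    conf.setNumReduceTasks(0);  // map-only: no shuffle, so no combiner/reducer needed
    DistributedCache.addFileToClassPath(new Path("/jars/analytics.jar"), conf);
    DistributedCache.addFileToClassPath(new Path("/jars/common.jar"), conf);
    conf.setMapperClass(Map.class);  // Map is the mapper class from the quoted code
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(Text.class);
    conf.setInputFormat(SequenceFileAsTextInputFormat.class);
    conf.setOutputFormat(TextOutputFormat.class);
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    JobClient.runJob(conf);
  }
}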
RE: jobtracker always says 'tip is null'
Hi Harsh,

I have tried to install Hadoop 1.0 on HP-UX but failed to run it, because the shell syntax of HP-UX and Linux is slightly different.

Best Regards
Yonggang Li

-----Original Message-----
From: Harsh J [mailto:ha...@cloudera.com]
Sent: Monday, February 27, 2012 8:01 PM
To: common-user@hadoop.apache.org
Subject: Re: jobtracker always says 'tip is null'

Hi Yonggang,

Unfortunately you're using a very old version, so it's hard to tell what went wrong. Could you please try upgrading to the most recent stable release (1.0.x)? We've not seen this issue come up in the last couple of years, so it may have been a bug fixed quite some time ago.

On Mon, Feb 27, 2012 at 1:47 PM, Li, Yonggang wrote:
> Hi All,
> I am running hadoop0.19.1 in hp-ux and now encounter a problem. Jobtracker
> always say:
> Tip is null
> Serious problem. While updating status, cannot find tasked
> [...]
Re: Bypassing reducer
Try setting the number of reducers to 0.

On 2/27/12 2:34 PM, "Mohit Anchlia" wrote:
>Is there a way to completely bypass the reduce step? Pig is able to do it but
>it doesn't work for me in my map reduce program even though I've commented
>out setReducerClass.
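In the Java API that is a one-liner (a sketch, old mapred API; conf is the job's JobConf):

conf.setNumReduceTasks(0);  // map output then goes straight to the output format;
                            // any reducer class that is still set is never run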
Handling bad records
What's the best way to write records to a different file? I am doing xml processing, and during processing I might come across an invalid xml format. Currently I have it under a try/catch block and write to log4j. But I think it would be better to just write it to an output file that just contains errors.
Re: dfs.block.size
"hadoop fsck -blocks" is something that I think of quickly. http://hadoop.apache.org/common/docs/current/commands_manual.html#fsck has more details Kai Am 28.02.2012 um 02:30 schrieb Mohit Anchlia: > How do I verify the block size of a given file? Is there a command? > > On Mon, Feb 27, 2012 at 7:59 AM, Joey Echeverria wrote: > >> dfs.block.size can be set per job. >> >> mapred.tasktracker.map.tasks.maximum is per tasktracker. >> >> -Joey >> >> On Mon, Feb 27, 2012 at 10:19 AM, Mohit Anchlia >> wrote: >>> Can someone please suggest if parameters like dfs.block.size, >>> mapred.tasktracker.map.tasks.maximum are only cluster wide settings or >> can >>> these be set per client job configuration? >>> >>> On Sat, Feb 25, 2012 at 5:43 PM, Mohit Anchlia >> wrote: >>> If I want to change the block size then can I use Configuration in mapreduce job and set it when writing to the sequence file or does it >> need to be cluster wide setting in .xml files? Also, is there a way to check the block of a given file? >> >> >> >> -- >> Joseph Echeverria >> Cloudera, Inc. >> 443.305.9434 >> -- Kai Voigt k...@123.org
Re: dfs.block.size
How do I verify the block size of a given file? Is there a command? On Mon, Feb 27, 2012 at 7:59 AM, Joey Echeverria wrote: > dfs.block.size can be set per job. > > mapred.tasktracker.map.tasks.maximum is per tasktracker. > > -Joey > > On Mon, Feb 27, 2012 at 10:19 AM, Mohit Anchlia > wrote: > > Can someone please suggest if parameters like dfs.block.size, > > mapred.tasktracker.map.tasks.maximum are only cluster wide settings or > can > > these be set per client job configuration? > > > > On Sat, Feb 25, 2012 at 5:43 PM, Mohit Anchlia >wrote: > > > >> If I want to change the block size then can I use Configuration in > >> mapreduce job and set it when writing to the sequence file or does it > need > >> to be cluster wide setting in .xml files? > >> > >> Also, is there a way to check the block of a given file? > >> > > > > -- > Joseph Echeverria > Cloudera, Inc. > 443.305.9434 >
Re: Task Killed but no errors
On 2/27/2012 1:55 PM, Mohit Anchlia wrote:
> I submitted a map reduce job that had 9 tasks killed out of 139. But I
> don't see any errors in the admin page. The entire job however has
> SUCCEEDED. How can I track down the reason?
>
> Also, how do I determine if this is something to worry about?

Hi,

You should go to the data nodes and check the /logs/userlogs/ directories there. Though I am not very clear about one point in this case either: if you are working on an administered cluster and you don't have access to the data nodes, how do you check the error logs? Sometimes I have to ask the administrator to forward me the error log. The logs on the jobtracker and namenode are very limited, and the datanode info in the admin web page is blocked because of security reasons? I haven't followed the new releases on this issue -- maybe in 1.0.x they have improved this.

Shi
Re: Task Killed but no errors
You probably have speculative execution enabled. It's normal for the jobtracker to launch multiple attempts of the same task and take the result of the one that completes first; the others are killed.

Regards,
Serge

On 2/27/12 11:55 AM, "Mohit Anchlia" wrote:
>I submitted a map reduce job that had 9 tasks killed out of 139. But I
>don't see any errors in the admin page. The entire job however has
>SUCCEEDED. How can I track down the reason?
>
>Also, how do I determine if this is something to worry about?
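If the killed duplicate attempts are a concern, speculative execution can also be switched off per job. A sketch with the old-API property names (worth double-checking for your version; conf is the job's JobConf):

conf.setBoolean("mapred.map.tasks.speculative.execution", false);
conf.setBoolean("mapred.reduce.tasks.speculative.execution", false);
// JobConf also offers setMapSpeculativeExecution(false) / setReduceSpeculativeExecution(false)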
Task Killed but no errors
I submitted a map reduce job that had 9 tasks killed out of 139, but I don't see any errors in the admin page. The entire job, however, has SUCCEEDED. How can I track down the reason?

Also, how do I determine if this is something to worry about?
Re: Setting up Hadoop single node setup on Mac OS X
Seconded. I've set up and run Hadoop CDH3 on a recent 10.7(.2) Mac. Works like a charm.

Sent from my phone, please excuse my brevity.

Keith Wiley, kwi...@keithwiley.com, http://keithwiley.com

Serge Blazhievsky wrote:
> Hi
>
> I have detailed instructions online here:
>
> http://hadoopway.blogspot.com/
>
> It works on MAC and all software is open source.
>
> Serge
> [...]
Re: Can't build hadoop-1.0.1 -- Break building fuse-dfs
Hello,

I found a workaround for this problem -- the libhdfs files were elsewhere in the build, in $HADOOP_HOME/build/c++/Linux-amd64-64/lib/, and not in the $HADOOP_HOME/build/libhdfs directory that the Makefile in fuse-dfs was pointing to.

Regards,
Kumar

Kumar Ravi

From: Kumar Ravi/Austin/IBM@IBMUS
To: common-user@hadoop.apache.org
Date: 02/27/2012 10:22 AM
Subject: Can't build hadoop-1.0.1 -- Break building fuse-dfs
[...]
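A sketch of that workaround (the Linux-amd64-64 directory name comes from my build and may differ on other platforms):

# Make the libhdfs artifacts visible where the fuse-dfs Makefile expects them.
mkdir -p $HADOOP_HOME/build/libhdfs
cp $HADOOP_HOME/build/c++/Linux-amd64-64/lib/libhdfs.* $HADOOP_HOME/build/libhdfs/
# (a symlink works as well, or adjust the -L path in the fuse-dfs link line)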
Re: Setting up Hadoop single node setup on Mac OS X
You could also use VMware Fusion on a Mac... I do this when I'm creating a distributed Hadoop cluster with a few data nodes, but for just a single node you can install it on Mac OS X directly, no need for virtualization.

Peter J

On 2/26/12 8:28 PM, "Sriram Ganesan" wrote:
>Hello All,
>
>I am a beginning hadoop user. I am trying to install hadoop as part of a
>single-node setup. [...]
Re: Setting up Hadoop single node setup on Mac OS X
Good to know about the VirtualBox instructions. Here are a couple of other links that might help with a single-node setup:

Single Node Setup
http://hadoop.apache.org/common/docs/stable/single_node_setup.html

Running_Hadoop_On_OS_X_10.5_64-bit_(Single-Node_Cluster)
http://wiki.apache.org/hadoop/Running_Hadoop_On_OS_X_10.5_64-bit_(Single-Node_Cluster)

Art Ignacio
hortonworks.com

On Mon, Feb 27, 2012 at 8:49 AM, Serge Blazhievsky <serge.blazhiyevs...@nice.com> wrote:
> Hi
>
> I have detailed instructions online here:
>
> http://hadoopway.blogspot.com/
>
> It works on MAC and all software is open source.
>
> Serge
> [...]
Re: Setting up Hadoop single node setup on Mac OS X
You don't need any virtualization. Mac OS X is a Unix and runs Hadoop as-is.
Re: Setting up Hadoop single node setup on Mac OS X
Hi

I have detailed instructions online here:

http://hadoopway.blogspot.com/

It works on MAC and all software is open source.

Serge

On 2/26/12 8:28 PM, "Sriram Ganesan" wrote:
>Hello All,
>
>I am a beginning hadoop user. I am trying to install hadoop as part of a
>single-node setup. [...]
Setting up Hadoop single node setup on Mac OS X
Hello All, I am a beginning hadoop user. I am trying to install hadoop as part of a single-node setup. I read in the documentation that the supported platforms are GNU/Linux and Win32. I have a Mac OS X and wish to run the single-node setup. I am guessing I need to use some virtualization solution like VirtualBox to run Linux. If anyone has a better way of running hadoop on a mac, please kindly share your experiences. If this question is not appropriate for this mailing list, I apologize and please kindly let me know what is the best mailing list to post this question. Thanks Sriram
Can't build hadoop-1.0.1 -- Break building fuse-dfs
Hello,

I am running into the following problem building hadoop-1.0.1:

[exec] make[1]: Entering directory `/home/kumar/hadoop-1.0.1/src/contrib/fuse-dfs'
[exec] make[1]: Nothing to be done for `all-am'.
[exec] make[1]: Leaving directory `/home/kumar/hadoop-1.0.1/src/contrib/fuse-dfs'
[exec] Making all in src
[exec] make[1]: Entering directory `/home/kumar/hadoop-1.0.1/src/contrib/fuse-dfs/src'
[exec] gcc -Wall -O3 -L/home/kumar/hadoop-1.0.1/build/libhdfs -lhdfs -L/lib -lfuse -L/usr/java/jdk1.6.0_27//jre/lib/amd64/server -ljvm -o fuse_dfs fuse_dfs.o fuse_options.o fuse_trash.o fuse_stat_struct.o fuse_users.o fuse_init.o fuse_connect.o fuse_impls_access.o fuse_impls_chmod.o fuse_impls_chown.o fuse_impls_create.o fuse_impls_flush.o fuse_impls_getattr.o fuse_impls_mkdir.o fuse_impls_mknod.o fuse_impls_open.o fuse_impls_read.o fuse_impls_release.o fuse_impls_readdir.o fuse_impls_rename.o fuse_impls_rmdir.o fuse_impls_statfs.o fuse_impls_symlink.o fuse_impls_truncate.o fuse_impls_utimens.o fuse_impls_unlink.o fuse_impls_write.o
[exec] /usr/bin/ld: cannot find -lhdfs
[exec] collect2: ld returned 1 exit status

The source was downloaded from http://svn.apache.org/repos/asf/hadoop/common/tags/release-1.0.1/ using svn, and the ant command with the targets used was:

ant -Dlibhdfs=true -Dcompile.native=true -Dfusedfs=true -Dcompile.c++=true -Dforrest.home=/apache-forrest-0.8/ compile-core-native compile-c++ compile-c++-examples task-controller tar record-parser compile-hdfs-classes package -Djava5.home=/opt/sun/jdk1.5.0_22/

I am using Sun Java JDK 1.6.0_31:

java version "1.6.0_31"
Java(TM) SE Runtime Environment (build 1.6.0_31-b04)
Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01, mixed mode)

I would appreciate any pointers to getting past this problem.

Kumar Ravi
Re: dfs.block.size
dfs.block.size can be set per job. mapred.tasktracker.map.tasks.maximum is per tasktracker. -Joey On Mon, Feb 27, 2012 at 10:19 AM, Mohit Anchlia wrote: > Can someone please suggest if parameters like dfs.block.size, > mapred.tasktracker.map.tasks.maximum are only cluster wide settings or can > these be set per client job configuration? > > On Sat, Feb 25, 2012 at 5:43 PM, Mohit Anchlia wrote: > >> If I want to change the block size then can I use Configuration in >> mapreduce job and set it when writing to the sequence file or does it need >> to be cluster wide setting in .xml files? >> >> Also, is there a way to check the block of a given file? >> -- Joseph Echeverria Cloudera, Inc. 443.305.9434
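Per job, that would look something like this (a sketch; dfs.block.size is the pre-0.21 property name, the value is in bytes, and it must stay a multiple of the checksum chunk size, io.bytes.per.checksum):

JobConf conf = new JobConf(MyJob.class);             // MyJob is a placeholder driver class
conf.setLong("dfs.block.size", 128L * 1024 * 1024);  // 128 MB blocks for files this job creates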
Re: dfs.block.size
Can someone please suggest if parameters like dfs.block.size, mapred.tasktracker.map.tasks.maximum are only cluster wide settings or can these be set per client job configuration? On Sat, Feb 25, 2012 at 5:43 PM, Mohit Anchlia wrote: > If I want to change the block size then can I use Configuration in > mapreduce job and set it when writing to the sequence file or does it need > to be cluster wide setting in .xml files? > > Also, is there a way to check the block of a given file? >
Re: jobtracker always says 'tip is null'
Hi Yonggang,

Unfortunately you're using a very old version, so it's hard to tell what went wrong. Could you please try upgrading to the most recent stable release (1.0.x)? We've not seen this issue come up in the last couple of years, so it may have been a bug fixed quite some time ago.

On Mon, Feb 27, 2012 at 1:47 PM, Li, Yonggang wrote:
> Hi All,
> I am running hadoop0.19.1 in hp-ux and now encounter a problem. Jobtracker
> always say:
> Tip is null
> Serious problem. While updating status, cannot find tasked
> [...]
RE: BZip2 Splittable?
Thanks to everyone for their help on this. We are currently using Pig, but I don't think this is a feature we are using yet; I will pass this recommendation on!

Thanks again,
Dan.

-----Original Message-----
From: Srinivas Surasani [mailto:hivehadooplearn...@gmail.com]
Sent: 24 February 2012 21:08
To: common-user@hadoop.apache.org
Subject: Re: BZip2 Splittable?

@Daniel,

If you want to process bz2 files in parallel (more than one mapper/reducer), you can go with Pig. See below. Pig has built-in support for processing .bz2 files in parallel (.gz support is coming soon): if the input file name extension is .bz2, Pig decompresses the file on the fly and passes the decompressed input stream to your load function.

Regards,
Srinivas
srini...@cloudwick.com

On Fri, Feb 24, 2012 at 2:59 PM, Rohit wrote:
> Hi Daniel,
>
> Because your MapReduce jobs will not split bzip2 files, each entire bzip2
> file will be processed by one Map task. Thus, if your job takes multiple
> bzip2 text files as the input, then you'll have as many Map tasks as you
> have files running in parallel.
>
> The Map tasks will be run by your TaskTrackers. Usually the cluster setup
> has the DataNode and the TaskTracker processes running on the same
> machines - so with 6 data nodes, you have 6 tasktrackers.
>
> Hope that answers your question.
>
> Rohit Bakhshi
> www.hortonworks.com
>
> On Friday, February 24, 2012 at 7:59 AM, Daniel Baptista wrote:
> > Hi Rohit, thanks for the response, this is pretty much as I expected and
> > hopefully adds weight to my other thoughts...
> >
> > Could this mean that all my datanodes are being sent all of the data, or
> > that only one datanode is executing the job?
> >
> > Thanks again, Dan.
> >
> > -----Original Message-----
> > From: Rohit Bakhshi [mailto:ro...@hortonworks.com]
> > Sent: 24 February 2012 15:54
> > To: common-user@hadoop.apache.org
> > Subject: Re: BZip2 Splittable?
> >
> > Daniel,
> >
> > I just noticed your Hadoop version - 0.20.2.
> >
> > The JIRA fix below is for Hadoop 0.21.0, which is a different version.
> > So it may not be supported on your version of Hadoop.
> >
> > --
> > Rohit Bakhshi
> > www.hortonworks.com
> >
> > On Friday, February 24, 2012 at 7:49 AM, Rohit Bakhshi wrote:
> > > Hi Daniel,
> > >
> > > The Bzip2 compression codec allows for splittable files.
> > >
> > > According to this Hadoop JIRA improvement, splitting of bzip2
> > > compressed files in Hadoop jobs is supported:
> > > https://issues.apache.org/jira/browse/HADOOP-4012
> > >
> > > On Friday, February 24, 2012 at 7:43 AM, Daniel Baptista wrote:
> > > > Hi All,
> > > >
> > > > I have a cluster of 6 datanodes, all running hadoop version 0.20.2,
> > > > r911707, that take a series of bzip2 compressed text files as input.
> > > >
> > > > I have read conflicting articles regarding whether or not hadoop can
> > > > split these bzip2 files. Can anyone give me a definite answer?
> > > >
> > > > Thanks in advance, Dan.
jobtracker always says 'tip is null'
Hi All,

I am running Hadoop 0.19.1 on HP-UX and have now encountered a problem. The jobtracker always says:

Tip is null
Serious problem. While updating status, cannot find tasked

Below is the jobtracker log:

2012-02-24 19:20:41,894 INFO org.apache.hadoop.mapred.TaskInProgress: oldState is RUNNING,newState is RUNNING
2012-02-24 19:20:41,895 INFO org.apache.hadoop.mapred.JobTracker: prevStatus is 1, newStatus is 1
2012-02-24 19:20:42,259 INFO org.apache.hadoop.mapred.JobTracker: tip is org.apache.hadoop.mapred.TaskInProgress@3bf9ff
2012-02-24 19:20:42,259 INFO org.apache.hadoop.mapred.TaskInProgress: oldState is RUNNING,newState is KILLED
2012-02-24 19:20:42,259 INFO org.apache.hadoop.mapred.JobInProgress: state is KILLED
2012-02-24 19:20:42,259 INFO org.apache.hadoop.mapred.TaskInProgress: shouldFail is null
2012-02-24 19:20:42,260 INFO org.apache.hadoop.mapred.JobTracker: prevStatus is 1, newStatus is 1
2012-02-24 19:20:42,260 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_20120223171354_20120224185829_0019_r_01_3' from 'tracker_psns200n:localhost/127.0.0.1:56471'
2012-02-24 19:20:42,500 INFO org.apache.hadoop.mapred.JobTracker: tip is org.apache.hadoop.mapred.TaskInProgress@a11b29
2012-02-24 19:20:42,500 INFO org.apache.hadoop.mapred.JobInProgress: state is SUCCEEDED
2012-02-24 19:20:42,500 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_20120223171354_20120224185829_0019_m_04_0' has completed task_20120223171354_20120224185829_0019_m_04 successfully.
2012-02-24 19:20:42,536 INFO org.apache.hadoop.mapred.JobTracker: Retired job with id: 'job_20120223171354_20120224160112_0006' of user: 'ecip'
2012-02-24 19:20:42,570 INFO org.apache.hadoop.mapred.JobTracker: prevStatus is 1, newStatus is 3
2012-02-24 19:20:42,571 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_20120223171354_20120224185829_0019_m_01_0' from 'tracker_psns280n:localhost/127.0.0.1:61244'
2012-02-24 19:20:42,571 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_20120223171354_20120224185829_0019_m_02_0' from 'tracker_psns280n:localhost/127.0.0.1:61244'
2012-02-24 19:20:42,571 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_20120223171354_20120224185829_0019_m_04_0' from 'tracker_psns280n:localhost/127.0.0.1:61244'
2012-02-24 19:20:42,571 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_20120223171354_20120224185829_0019_m_05_0' from 'tracker_psns280n:localhost/127.0.0.1:61244'
2012-02-24 19:20:42,571 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_20120223171354_20120224185829_0019_r_02_0' from 'tracker_psns280n:localhost/127.0.0.1:61244'
2012-02-24 19:20:42,571 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_20120223171354_20120224185829_0019_r_03_0' from 'tracker_psns280n:localhost/127.0.0.1:61244'
2012-02-24 19:20:42,571 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_20120223171354_20120224185829_0019_r_03_1' from 'tracker_psns280n:localhost/127.0.0.1:61244'
2012-02-24 19:20:42,572 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_20120223171354_20120224185829_0019_r_03_2' from 'tracker_psns280n:localhost/127.0.0.1:61244'
2012-02-24 19:20:42,572 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_20120223171354_20120224185829_0019_r_03_3' from 'tracker_psns280n:localhost/127.0.0.1:61244'
2012-02-24 19:20:43,499 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_20120223171354_20120224185829_0019_m_00_0' from 'tracker_psns250n:localhost/127.0.0.1:59955'
2012-02-24 19:20:43,499 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_20120223171354_20120224185829_0019_r_00_0' from 'tracker_psns250n:localhost/127.0.0.1:59955'
2012-02-24 19:20:43,500 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_20120223171354_20120224185829_0019_r_04_0' from 'tracker_psns250n:localhost/127.0.0.1:59955'
2012-02-24 19:20:47,312 INFO org.apache.hadoop.mapred.JobTracker: tip is null
2012-02-24 19:20:47,313 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_20120223171354_20120224185829_0019_m_03_0' from 'tracker_psns200n:localhost/127.0.0.1:56471'
2012-02-24 19:20:47,313 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_20120223171354_20120224185829_0019_r_01_0' from 'tracker_psns200n:localhost/127.0.0.1:56471'
2012-02-24 19:20:47,313 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_20120223171354_20120224185829_0019_r_01_1' from 'tracker_psns200n:localhost/127.0.0.1:56471'
2012-02-24 19:20:47,313 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_20120223171354_20120224185829_0019_r_01_2' from 'tracker_psns200n:localhost/127.0.0.1:56471'