Hadoop streaming: need help using a custom key/value separator
When I use more than one reducer in Hadoop streaming with a custom separator rather than the tab character, it looks like the Hadoop shuffling process is not happening as it should.

This is the reducer output when I use '\t' to separate the key/value pair that is output from the mapper:

output from reducer 1:
10321,22
23644,37
41231,42
23448,20
12325,39
71234,20

output from reducer 2:
24123,43
33213,46
11321,29
21232,32

The above output is as expected: the first column is the key and the second is the count. There are 10 unique keys; 6 of them are in the output of the first reducer and the remaining 4 are in the output of the second.

But now, when I use a custom separator for the key/value pair output from my mapper (here '*' as the separator):

-D stream.mapred.output.field.separator=*
-D mapred.reduce.tasks=2

output from reducer 1:
10321,5
21232,19
24123,16
33213,28
23644,21
41231,12
23448,18
11321,29
12325,24
71234,9

output from reducer 2:
10321,17
21232,13
33213,18
23644,16
41231,30
23448,2
24123,27
12325,15
71234,11

Now both reducers get all the keys, and part of the values for each key go to reducer 1 and part to reducer 2. Why does it behave like this when I use a custom separator? Shouldn't each reducer get a unique set of keys after shuffling?

I am using Hadoop 0.20.205.0, and below is the command I use to run Hadoop streaming. Are there more options I should specify for Hadoop streaming to work properly with a custom separator?

hadoop jar $HADOOP_PREFIX/contrib/streaming/hadoop-streaming-0.20.205.0.jar -D stream.mapred.output.field.separator=* -D mapred.reduce.tasks=2 -mapper ./map.py -reducer ./reducer.py -file ./map.py -file ./reducer.py -input /user/inputdata -output /user/outputdata -verbose

Any help is much appreciated.

Thanks,
Austin
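A hedged note on the likely cause, not verified on 0.20.205: stream.mapred.output.field.separator does not appear among the documented streaming options. The property streaming reads to split a mapper's output line into the key (which is what gets partitioned) and the value is stream.map.output.field.separator; if it is not recognized, the default tab is assumed, the whole "key*value" line becomes the key, and the same original key with different values then hashes to different reducers, which matches the output above. A sketch of the command with the documented property names (note the quoting so the shell does not expand the asterisk):

hadoop jar $HADOOP_PREFIX/contrib/streaming/hadoop-streaming-0.20.205.0.jar \
    -D stream.map.output.field.separator='*' \
    -D stream.num.map.output.key.fields=1 \
    -D mapred.reduce.tasks=2 \
    -mapper ./map.py -reducer ./reducer.py \
    -file ./map.py -file ./reducer.py \
    -input /user/inputdata -output /user/outputdata -verbose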
Need help with the Hadoop Eclipse plugin
Hi all,

I am trying to use the Hadoop Eclipse plugin on my Windows machine to connect to my remote Hadoop cluster. I currently log in to the cluster with PuTTY, so SSH is enabled and my Windows machine can reach the Hadoop cluster.

I am using Hadoop 0.20.205, hadoop-eclipse-plugin-0.20.205.jar, Eclipse Helios (Version 3.6.2), and Oracle JDK 1.7.

If I use the original eclipse-plugin jar by putting it inside my $ECLIPSE_HOME/dropins or /plugins folder, I am able to see the Hadoop Map/Reduce perspective. But after specifying the Hadoop NN/JT connections, I see the following error whenever I try to access HDFS:

An internal error occurred during: "Connecting to DFS lxe9700".
org/apache/commons/configuration/Configuration

"Connecting to DFS lxe9700" has encountered a problem.
An internal error occurred during "Connecting to DFS".

Looking at the .log file, I see the following lines:

!MESSAGE An internal error occurred during: "Connecting to DFS lxe9700".
!STACK 0
java.lang.NoClassDefFoundError: org/apache/commons/configuration/Configuration
    at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.(DefaultMetricsSystem.java:37)
    at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.(DefaultMetricsSystem.java:34)
    at org.apache.hadoop.security.UgiInstrumentation.create(UgiInstrumentation.java:51)
    at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:196)
    at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:159)
    at org.apache.hadoop.security.UserGroupInformation.isSecurityEnabled(UserGroupInformation.java:216)
    at org.apache.hadoop.security.KerberosName.(KerberosName.java:83)
    at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:189)
    at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:159)
    at org.apache.hadoop.security.UserGroupInformation.isSecurityEnabled(UserGroupInformation.java:216)
    at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:409)
    at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:395)
    at org.apache.hadoop.fs.FileSystem$Cache$Key.(FileSystem.java:1436)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1337)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:244)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:122)
    at org.apache.hadoop.eclipse.server.HadoopServer.getDFS(HadoopServer.java:469)
    at org.apache.hadoop.eclipse.dfs.DFSPath.getDFS(DFSPath.java:146)
    at org.apache.hadoop.eclipse.dfs.DFSFolder.loadDFSFolderChildren(DFSFolder.java:61)
    at org.apache.hadoop.eclipse.dfs.DFSFolder$1.run(DFSFolder.java:178)
    at org.eclipse.core.internal.jobs.Worker.run(Worker.java:54)
Caused by: java.lang.ClassNotFoundException: org.apache.commons.configuration.Configuration
    at org.eclipse.osgi.internal.loader.BundleLoader.findClassInternal(BundleLoader.java:506)
    at org.eclipse.osgi.internal.loader.BundleLoader.findClass(BundleLoader.java:422)
    at org.eclipse.osgi.internal.loader.BundleLoader.findClass(BundleLoader.java:410)
    at org.eclipse.osgi.internal.baseadaptor.DefaultClassLoader.loadClass(DefaultClassLoader.java:107)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    ... 21 more

!ENTRY org.eclipse.jface 4 0 2012-01-03 02:47:50.812
!MESSAGE The command ("dfs.browser.action.download") is undefined
!STACK 0
java.lang.Exception
    at org.eclipse.jface.action.ExternalActionManager$CommandCallback.isActive(ExternalActionManager.java:370)
    at org.eclipse.jface.action.ActionContributionItem.isCommandActive(ActionContributionItem.java:647)
    at org.eclipse.jface.action.ActionContributionItem.isVisible(ActionContributionItem.java:703)
    at org.eclipse.jface.action.MenuManager.isChildVisible(MenuManager.java:985)
    at org.eclipse.jface.action.MenuManager.update(MenuManager.java:759)
    at org.eclipse.jface.action.MenuManager.handleAboutToShow(MenuManager.java:470)
    at org.eclipse.jface.action.MenuManager.access$1(MenuManager.java:465)
    at org.eclipse.jface.action.MenuManager$2.menuShown(MenuManager.java:491)
    at org.eclipse.swt.widgets.TypedListener.handleEvent(TypedListener.java:241)
    at org.eclipse.swt.widgets.EventTable.sendEvent(EventTable.java:84)
    at org.eclipse.swt.widgets.Widget.sendEvent(Widget.java:1053)
    at org.eclipse.swt.widgets.Widget.sendEvent(Widget.java:1077)
    at org.eclipse.swt.widgets.Widget.sendEvent(Widget.java:1058)
    at org.eclipse.swt.widgets.Control.WM_INITMENUPOPUP(Control.java:4487)
    at org.eclipse.swt.widgets.Control.windowProc(Control.java:4190)
    at org.eclipse.swt.widgets.Canvas.windowProc(Canvas.java:341)
    at org.eclipse.swt.widgets.Decorations.windowProc(Decorations.java:1598)
    at org.eclipse.swt.widgets.Shell.windowProc(Shell.java:2038)
    at org.eclipse.sw
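The NoClassDefFoundError suggests the plugin bundle does not carry the commons-configuration classes that Hadoop 0.20.205's security/metrics code links against. A commonly suggested workaround, sketched here with assumed jar versions and paths (check your own $HADOOP_PREFIX/lib), is to repack the plugin with the missing libraries and list them on its Bundle-ClassPath:

# Sketch only: repack the plugin jar with the libraries it is missing.
mkdir /tmp/plugin && cd /tmp/plugin
unzip $HADOOP_PREFIX/contrib/eclipse-plugin/hadoop-eclipse-plugin-0.20.205.0.jar
mkdir -p lib
cp $HADOOP_PREFIX/lib/commons-configuration-1.6.jar lib/
cp $HADOOP_PREFIX/lib/commons-lang-2.4.jar lib/
# edit META-INF/MANIFEST.MF so that Bundle-ClassPath also lists:
#   lib/commons-configuration-1.6.jar,lib/commons-lang-2.4.jar
zip -r hadoop-eclipse-plugin-0.20.205.0-patched.jar .
# then drop the patched jar into $ECLIPSE_HOME/plugins and restart Eclipse with -clean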
Re: Handling bad records
Thanks, that's helpful. In that example, what are "A" and "B" referring to? Are they the output file names?

mos.getCollector("seq", "A", reporter).collect(key, new Text("Bye"));
mos.getCollector("seq", "B", reporter).collect(key, new Text("Chau"));

On Mon, Feb 27, 2012 at 9:53 PM, Harsh J wrote:
> Mohit,
>
> Use the MultipleOutputs API:
> http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/lib/MultipleOutputs.html
> to have a named output for bad records. There is an example of its use
> detailed on the link.
>
> On Tue, Feb 28, 2012 at 3:48 AM, Mohit Anchlia wrote:
> > What's the best way to write records to a different file? I am doing xml
> > processing and during processing I might come across invalid xml format.
> > Currently I have it under a try/catch block and write to log4j. But I think
> > it would be better to just write it to an output file that just contains
> > errors.
>
> --
> Harsh J
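Going by the MultipleOutputs javadoc that example comes from, "seq" is a multi named output registered with addMultiNamedOutput(), and "A"/"B" are multi-names: they become part of the output file names, so records collected with "A" and "B" land in different files (something like seq_A-r-00000 and seq_B-r-00000). For the bad-records case a plain named output is enough. A rough sketch with the old mapred API; the "errors" output name and the extractId() parse step are placeholders:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.mapred.lib.MultipleOutputs;

public class XmlMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  // Driver side, once:
  //   MultipleOutputs.addNamedOutput(conf, "errors",
  //       TextOutputFormat.class, NullWritable.class, Text.class);

  private MultipleOutputs mos;

  public void configure(JobConf job) {
    mos = new MultipleOutputs(job);
  }

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, Text> out, Reporter reporter)
      throws IOException {
    try {
      out.collect(new Text(extractId(value)), value);  // normal records
    } catch (RuntimeException badXml) {
      // Invalid record: route it to the "errors" named output
      // (written as errors-m-00000 etc.) instead of log4j.
      mos.getCollector("errors", reporter).collect(NullWritable.get(), value);
    }
  }

  public void close() throws IOException {
    mos.close();  // flushes the named outputs
  }

  // Placeholder parse step that throws on malformed input.
  private static String extractId(Text xml) {
    return xml.toString().substring(0, 8);
  }
}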
Re: Handling bad records
Mohit,

Use the MultipleOutputs API:

http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/lib/MultipleOutputs.html

to have a named output for bad records. There is an example of its use detailed on the link.

On Tue, Feb 28, 2012 at 3:48 AM, Mohit Anchlia wrote:
> What's the best way to write records to a different file? I am doing xml
> processing and during processing I might come across invalid xml format.
> Currently I have it under a try/catch block and write to log4j. But I think
> it would be better to just write it to an output file that just contains
> errors.

--
Harsh J
Re: Invocation exception
On Mon, Feb 27, 2012 at 8:58 PM, Prashant Kommireddi wrote:
> Tom White's Definitive Guide book is a great reference. Answers to
> most of your questions can be found there.

I've been through that book but haven't come across how to debug this exception. Can you point me to the topic in that book where I'll find this information?
Re: Invocation exception
Tom White's Definitive Guide book is a great reference. Answers to most of your questions can be found there.

Sent from my iPhone

On Feb 27, 2012, at 8:54 PM, Mohit Anchlia wrote:
> Does it matter if the reducer is set even if the number of reducers is 0?
> Is there a way to get a clearer reason?
Re: Invocation exception
Does it matter if the reducer is set even if the number of reducers is 0? Is there a way to get a clearer reason?

On Mon, Feb 27, 2012 at 8:23 PM, Subir S wrote:
> Why would you set the Reducer when the number of reducers is set to zero?
> Not sure if this is the real cause.
Re: Invocation exception
On Tue, Feb 28, 2012 at 4:30 AM, Mohit Anchlia wrote:

> For some reason I am getting an invocation exception and I don't see any more
> details other than this exception:
>
> My job is configured as:
>
> JobConf conf = new JobConf(FormMLProcessor.class);
> conf.addResource("hdfs-site.xml");
> conf.addResource("core-site.xml");
> conf.addResource("mapred-site.xml");
> conf.set("mapred.reduce.tasks", "0");
> conf.setJobName("mlprocessor");
> DistributedCache.addFileToClassPath(new Path("/jars/analytics.jar"), conf);
> DistributedCache.addFileToClassPath(new Path("/jars/common.jar"), conf);
> conf.setOutputKeyClass(Text.class);
> conf.setOutputValueClass(Text.class);
> conf.setMapperClass(Map.class);
> conf.setCombinerClass(Reduce.class);
> conf.setReducerClass(IdentityReducer.class);

Why would you set the Reducer when the number of reducers is set to zero? Not sure if this is the real cause.

> conf.setInputFormat(SequenceFileAsTextInputFormat.class);
> conf.setOutputFormat(TextOutputFormat.class);
> FileInputFormat.setInputPaths(conf, new Path(args[0]));
> FileOutputFormat.setOutputPath(conf, new Path(args[1]));
> JobClient.runJob(conf);
>
> java.lang.RuntimeException: Error in configuring object
>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
>     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
>     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:387)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
>     at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:396)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
>     at org.apache.hadoop.mapred.Child.main(Child.java:264)
> Caused by: java.lang.reflect.InvocationTargetException
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.jav
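Following on from that comment: with zero reduces there is no shuffle, so the combiner and reducer settings serve no purpose. A minimal map-only sketch of the same driver, keeping the class names from the quoted code (whether this clears the InvocationTargetException is uncertain, since the trace points at the mapper's own configure step):

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class FormMLProcessor {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(FormMLProcessor.class);
    conf.setJobName("mlprocessor");
    conf.setNumReduceTasks(0);  // map-only: no shuffle, so no combiner/reducer needed
    DistributedCache.addFileToClassPath(new Path("/jars/analytics.jar"), conf);
    DistributedCache.addFileToClassPath(new Path("/jars/common.jar"), conf);
    conf.setMapperClass(Map.class);  // Map is the mapper class from the quoted code
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(Text.class);
    conf.setInputFormat(SequenceFileAsTextInputFormat.class);
    conf.setOutputFormat(TextOutputFormat.class);
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    JobClient.runJob(conf);
  }
}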
RE: jobtracker always says 'tip is null'
Hi Harsh,

I have tried to install Hadoop 1.0 on HP-UX but failed to run it, because the shell syntax of HP-UX and Linux is slightly different.

Best Regards
Yonggang Li

-----Original Message-----
From: Harsh J [mailto:ha...@cloudera.com]
Sent: Monday, February 27, 2012 8:01 PM
To: common-user@hadoop.apache.org
Subject: Re: jobtracker always says 'tip is null'

Hi Yonggang,

Unfortunately you're using a very old version, so it's hard to tell what went wrong. Could you please try upgrading to the most recent stable release (1.0.x)? We've not seen this issue come up in the last couple of years, so it may have been a bug fixed quite some time ago.

On Mon, Feb 27, 2012 at 1:47 PM, Li, Yonggang wrote:
> Hi All,
> I am running hadoop0.19.1 in hp-ux and now encounter a problem. Jobtracker
> always say:
> Tip is null
> Serious problem. While updating status, cannot find tasked
> [...]
Re: Bypassing reducer
Try setting the number of reducers to 0.

On 2/27/12 2:34 PM, "Mohit Anchlia" wrote:
>Is there a way to completely bypass the reduce step? Pig is able to do it but
>it doesn't work for me in my map reduce program even though I've commented
>out setReducerClass.
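In the Java API that is a one-liner (a sketch, old mapred API; conf is the job's JobConf):

conf.setNumReduceTasks(0);  // map output then goes straight to the output format;
                            // any reducer class that is still set is never run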
Handling bad records
What's the best way to write records to a different file? I am doing xml processing, and during processing I might come across an invalid xml format. Currently I have it under a try/catch block and write to log4j. But I think it would be better to just write it to an output file that just contains errors.
Re: dfs.block.size
"hadoop fsck -blocks" is something that I think of quickly. http://hadoop.apache.org/common/docs/current/commands_manual.html#fsck has more details Kai Am 28.02.2012 um 02:30 schrieb Mohit Anchlia: > How do I verify the block size of a given file? Is there a command? > > On Mon, Feb 27, 2012 at 7:59 AM, Joey Echeverria wrote: > >> dfs.block.size can be set per job. >> >> mapred.tasktracker.map.tasks.maximum is per tasktracker. >> >> -Joey >> >> On Mon, Feb 27, 2012 at 10:19 AM, Mohit Anchlia >> wrote: >>> Can someone please suggest if parameters like dfs.block.size, >>> mapred.tasktracker.map.tasks.maximum are only cluster wide settings or >> can >>> these be set per client job configuration? >>> >>> On Sat, Feb 25, 2012 at 5:43 PM, Mohit Anchlia >> wrote: >>> If I want to change the block size then can I use Configuration in mapreduce job and set it when writing to the sequence file or does it >> need to be cluster wide setting in .xml files? Also, is there a way to check the block of a given file? >> >> >> >> -- >> Joseph Echeverria >> Cloudera, Inc. >> 443.305.9434 >> -- Kai Voigt k...@123.org
Re: dfs.block.size
How do I verify the block size of a given file? Is there a command? On Mon, Feb 27, 2012 at 7:59 AM, Joey Echeverria wrote: > dfs.block.size can be set per job. > > mapred.tasktracker.map.tasks.maximum is per tasktracker. > > -Joey > > On Mon, Feb 27, 2012 at 10:19 AM, Mohit Anchlia > wrote: > > Can someone please suggest if parameters like dfs.block.size, > > mapred.tasktracker.map.tasks.maximum are only cluster wide settings or > can > > these be set per client job configuration? > > > > On Sat, Feb 25, 2012 at 5:43 PM, Mohit Anchlia >wrote: > > > >> If I want to change the block size then can I use Configuration in > >> mapreduce job and set it when writing to the sequence file or does it > need > >> to be cluster wide setting in .xml files? > >> > >> Also, is there a way to check the block of a given file? > >> > > > > -- > Joseph Echeverria > Cloudera, Inc. > 443.305.9434 >
Re: Task Killed but no errors
On 2/27/2012 1:55 PM, Mohit Anchlia wrote:
> I submitted a map reduce job that had 9 tasks killed out of 139. But I
> don't see any errors in the admin page. The entire job however has
> SUCCEEDED. How can I track down the reason?
>
> Also, how do I determine if this is something to worry about?

Hi,

You should go to the data nodes and check the /logs/userlogs/ directories there. Though I am not very clear about one point in this case either: if you are working on an administered cluster and you don't have access to the data nodes, how do you check the error logs? Sometimes I have to ask the administrator to forward me the error log. The logs on the jobtracker and namenode are very limited, and the datanode info in the admin web page is blocked because of security reasons? I haven't followed the new releases on this issue -- maybe in 1.0.x they have improved this.

Shi
Re: Task Killed but no errors
You probably have speculative execution enabled. It's normal for the jobtracker to launch multiple attempts of the same task and take the result of the one that completes first; the others are killed.

Regards,
Serge

On 2/27/12 11:55 AM, "Mohit Anchlia" wrote:
>I submitted a map reduce job that had 9 tasks killed out of 139. But I
>don't see any errors in the admin page. The entire job however has
>SUCCEEDED. How can I track down the reason?
>
>Also, how do I determine if this is something to worry about?
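If the killed duplicate attempts are a concern, speculative execution can also be switched off per job. A sketch with the old-API property names (worth double-checking for your version; conf is the job's JobConf):

conf.setBoolean("mapred.map.tasks.speculative.execution", false);
conf.setBoolean("mapred.reduce.tasks.speculative.execution", false);
// JobConf also offers setMapSpeculativeExecution(false) / setReduceSpeculativeExecution(false)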
Task Killed but no errors
I submitted a map reduce job that had 9 tasks killed out of 139, but I don't see any errors in the admin page. The entire job, however, has SUCCEEDED. How can I track down the reason?

Also, how do I determine if this is something to worry about?
Re: Setting up Hadoop single node setup on Mac OS X
Seconded. I've set up and run Hadoop CDH3 on a recent 10.7(.2) Mac. Works like a charm.

Sent from my phone, please excuse my brevity.

Keith Wiley, kwi...@keithwiley.com, http://keithwiley.com

Serge Blazhievsky wrote:
> Hi
>
> I have detailed instructions online here:
>
> http://hadoopway.blogspot.com/
>
> It works on MAC and all software is open source.
>
> Serge
> [...]
Re: Can't build hadoop-1.0.1 -- Break building fuse-dfs
Hello,

I found a workaround for this problem -- the libhdfs files were elsewhere in the build, in $HADOOP_HOME/build/c++/Linux-amd64-64/lib/, and not in the $HADOOP_HOME/build/libhdfs directory that the Makefile in fuse-dfs was pointing to.

Regards,
Kumar

Kumar Ravi

From: Kumar Ravi/Austin/IBM@IBMUS
To: common-user@hadoop.apache.org
Date: 02/27/2012 10:22 AM
Subject: Can't build hadoop-1.0.1 -- Break building fuse-dfs
[...]
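A sketch of that workaround (the Linux-amd64-64 directory name comes from my build and may differ on other platforms):

# Make the libhdfs artifacts visible where the fuse-dfs Makefile expects them.
mkdir -p $HADOOP_HOME/build/libhdfs
cp $HADOOP_HOME/build/c++/Linux-amd64-64/lib/libhdfs.* $HADOOP_HOME/build/libhdfs/
# (a symlink works as well, or adjust the -L path in the fuse-dfs link line)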
Re: Setting up Hadoop single node setup on Mac OS X
You could also use VMware Fusion on a Mac... I do this when I'm creating a distributed Hadoop cluster with a few data nodes, but for just a single node you can install it on Mac OS X directly, no need for virtualization.

Peter J

On 2/26/12 8:28 PM, "Sriram Ganesan" wrote:
>Hello All,
>
>I am a beginning hadoop user. I am trying to install hadoop as part of a
>single-node setup. [...]
Re: Setting up Hadoop single node setup on Mac OS X
Good to know about the VirtualBox instructions. Here are a couple of other links that might help with a single-node setup:

Single Node Setup
http://hadoop.apache.org/common/docs/stable/single_node_setup.html

Running_Hadoop_On_OS_X_10.5_64-bit_(Single-Node_Cluster)
http://wiki.apache.org/hadoop/Running_Hadoop_On_OS_X_10.5_64-bit_(Single-Node_Cluster)

Art Ignacio
hortonworks.com

On Mon, Feb 27, 2012 at 8:49 AM, Serge Blazhievsky <serge.blazhiyevs...@nice.com> wrote:
> Hi
>
> I have detailed instructions online here:
>
> http://hadoopway.blogspot.com/
>
> It works on MAC and all software is open source.
>
> Serge
> [...]
Re: Setting up Hadoop single node setup on Mac OS X
You don't need any virtualization. Mac OS X is a Unix and runs Hadoop as-is.
Re: Setting up Hadoop single node setup on Mac OS X
Hi

I have detailed instructions online here:

http://hadoopway.blogspot.com/

It works on MAC and all software is open source.

Serge

On 2/26/12 8:28 PM, "Sriram Ganesan" wrote:
>Hello All,
>
>I am a beginning hadoop user. I am trying to install hadoop as part of a
>single-node setup. [...]
Setting up Hadoop single node setup on Mac OS X
Hello All, I am a beginning hadoop user. I am trying to install hadoop as part of a single-node setup. I read in the documentation that the supported platforms are GNU/Linux and Win32. I have a Mac OS X and wish to run the single-node setup. I am guessing I need to use some virtualization solution like VirtualBox to run Linux. If anyone has a better way of running hadoop on a mac, please kindly share your experiences. If this question is not appropriate for this mailing list, I apologize and please kindly let me know what is the best mailing list to post this question. Thanks Sriram
Can't build hadoop-1.0.1 -- Break building fuse-dfs
Hello,

I am running into the following problem building hadoop-1.0.1:

[exec] make[1]: Entering directory `/home/kumar/hadoop-1.0.1/src/contrib/fuse-dfs'
[exec] make[1]: Nothing to be done for `all-am'.
[exec] make[1]: Leaving directory `/home/kumar/hadoop-1.0.1/src/contrib/fuse-dfs'
[exec] Making all in src
[exec] make[1]: Entering directory `/home/kumar/hadoop-1.0.1/src/contrib/fuse-dfs/src'
[exec] gcc -Wall -O3 -L/home/kumar/hadoop-1.0.1/build/libhdfs -lhdfs -L/lib -lfuse -L/usr/java/jdk1.6.0_27//jre/lib/amd64/server -ljvm -o fuse_dfs fuse_dfs.o fuse_options.o fuse_trash.o fuse_stat_struct.o fuse_users.o fuse_init.o fuse_connect.o fuse_impls_access.o fuse_impls_chmod.o fuse_impls_chown.o fuse_impls_create.o fuse_impls_flush.o fuse_impls_getattr.o fuse_impls_mkdir.o fuse_impls_mknod.o fuse_impls_open.o fuse_impls_read.o fuse_impls_release.o fuse_impls_readdir.o fuse_impls_rename.o fuse_impls_rmdir.o fuse_impls_statfs.o fuse_impls_symlink.o fuse_impls_truncate.o fuse_impls_utimens.o fuse_impls_unlink.o fuse_impls_write.o
[exec] /usr/bin/ld: cannot find -lhdfs
[exec] collect2: ld returned 1 exit status

The source was downloaded from http://svn.apache.org/repos/asf/hadoop/common/tags/release-1.0.1/ using svn, and the ant command with the targets used was:

ant -Dlibhdfs=true -Dcompile.native=true -Dfusedfs=true -Dcompile.c++=true -Dforrest.home=/apache-forrest-0.8/ compile-core-native compile-c++ compile-c++-examples task-controller tar record-parser compile-hdfs-classes package -Djava5.home=/opt/sun/jdk1.5.0_22/

I am using Sun Java JDK 1.6.0_31:

java version "1.6.0_31"
Java(TM) SE Runtime Environment (build 1.6.0_31-b04)
Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01, mixed mode)

I would appreciate any pointers to getting past this problem.

Kumar Ravi
Re: dfs.block.size
dfs.block.size can be set per job. mapred.tasktracker.map.tasks.maximum is per tasktracker. -Joey On Mon, Feb 27, 2012 at 10:19 AM, Mohit Anchlia wrote: > Can someone please suggest if parameters like dfs.block.size, > mapred.tasktracker.map.tasks.maximum are only cluster wide settings or can > these be set per client job configuration? > > On Sat, Feb 25, 2012 at 5:43 PM, Mohit Anchlia wrote: > >> If I want to change the block size then can I use Configuration in >> mapreduce job and set it when writing to the sequence file or does it need >> to be cluster wide setting in .xml files? >> >> Also, is there a way to check the block of a given file? >> -- Joseph Echeverria Cloudera, Inc. 443.305.9434
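Per job, that would look something like this (a sketch; dfs.block.size is the pre-0.21 property name, the value is in bytes, and it must stay a multiple of the checksum chunk size, io.bytes.per.checksum):

JobConf conf = new JobConf(MyJob.class);             // MyJob is a placeholder driver class
conf.setLong("dfs.block.size", 128L * 1024 * 1024);  // 128 MB blocks for files this job creates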
Re: dfs.block.size
Can someone please suggest if parameters like dfs.block.size, mapred.tasktracker.map.tasks.maximum are only cluster wide settings or can these be set per client job configuration? On Sat, Feb 25, 2012 at 5:43 PM, Mohit Anchlia wrote: > If I want to change the block size then can I use Configuration in > mapreduce job and set it when writing to the sequence file or does it need > to be cluster wide setting in .xml files? > > Also, is there a way to check the block of a given file? >
Re: jobtracker always says 'tip is null'
Hi Yonggang,

Unfortunately you're using a very old version, so it's hard to tell what went wrong. Could you please try upgrading to the most recent stable release (1.0.x)? We've not seen this issue come up in the last couple of years, so it may have been a bug fixed quite some time ago.

On Mon, Feb 27, 2012 at 1:47 PM, Li, Yonggang wrote:
> Hi All,
> I am running hadoop0.19.1 in hp-ux and now encounter a problem. Jobtracker
> always say:
> Tip is null
> Serious problem. While updating status, cannot find tasked
> [...]
RE: BZip2 Splittable?
Thanks to everyone for their help on this. We are currently using Pig, but I don't think this is a feature we are using yet; I will pass this recommendation on!

Thanks again,
Dan.

-----Original Message-----
From: Srinivas Surasani [mailto:hivehadooplearn...@gmail.com]
Sent: 24 February 2012 21:08
To: common-user@hadoop.apache.org
Subject: Re: BZip2 Splittable?

@Daniel,

If you want to process bz2 files in parallel (more than one mapper/reducer), you can go with Pig. See below. Pig has built-in support for processing .bz2 files in parallel (.gz support is coming soon): if the input file name extension is .bz2, Pig decompresses the file on the fly and passes the decompressed input stream to your load function.

Regards,
Srinivas
srini...@cloudwick.com

On Fri, Feb 24, 2012 at 2:59 PM, Rohit wrote:
> Hi Daniel,
>
> Because your MapReduce jobs will not split bzip2 files, each entire bzip2
> file will be processed by one Map task. Thus, if your job takes multiple
> bzip2 text files as the input, then you'll have as many Map tasks as you
> have files running in parallel.
>
> The Map tasks will be run by your TaskTrackers. Usually the cluster setup
> has the DataNode and the TaskTracker processes running on the same
> machines - so with 6 data nodes, you have 6 tasktrackers.
>
> Hope that answers your question.
>
> Rohit Bakhshi
> www.hortonworks.com
>
> On Friday, February 24, 2012 at 7:59 AM, Daniel Baptista wrote:
> > Hi Rohit, thanks for the response, this is pretty much as I expected and
> > hopefully adds weight to my other thoughts...
> >
> > Could this mean that all my datanodes are being sent all of the data, or
> > that only one datanode is executing the job?
> >
> > Thanks again, Dan.
> >
> > -----Original Message-----
> > From: Rohit Bakhshi [mailto:ro...@hortonworks.com]
> > Sent: 24 February 2012 15:54
> > To: common-user@hadoop.apache.org
> > Subject: Re: BZip2 Splittable?
> >
> > Daniel,
> >
> > I just noticed your Hadoop version - 0.20.2.
> >
> > The JIRA fix below is for Hadoop 0.21.0, which is a different version.
> > So it may not be supported on your version of Hadoop.
> >
> > --
> > Rohit Bakhshi
> > www.hortonworks.com
> >
> > On Friday, February 24, 2012 at 7:49 AM, Rohit Bakhshi wrote:
> > > Hi Daniel,
> > >
> > > The Bzip2 compression codec allows for splittable files.
> > >
> > > According to this Hadoop JIRA improvement, splitting of bzip2
> > > compressed files in Hadoop jobs is supported:
> > > https://issues.apache.org/jira/browse/HADOOP-4012
> > >
> > > On Friday, February 24, 2012 at 7:43 AM, Daniel Baptista wrote:
> > > > Hi All,
> > > >
> > > > I have a cluster of 6 datanodes, all running hadoop version 0.20.2,
> > > > r911707, that take a series of bzip2 compressed text files as input.
> > > >
> > > > I have read conflicting articles regarding whether or not hadoop can
> > > > split these bzip2 files. Can anyone give me a definite answer?
> > > >
> > > > Thanks in advance, Dan.
jobtracker always says 'tip is null'
Hi All,

I am running Hadoop 0.19.1 on HP-UX and have now encountered a problem. The jobtracker always says:

Tip is null
Serious problem. While updating status, cannot find tasked

Below is the jobtracker log:

2012-02-24 19:20:41,894 INFO org.apache.hadoop.mapred.TaskInProgress: oldState is RUNNING,newState is RUNNING
2012-02-24 19:20:41,895 INFO org.apache.hadoop.mapred.JobTracker: prevStatus is 1, newStatus is 1
2012-02-24 19:20:42,259 INFO org.apache.hadoop.mapred.JobTracker: tip is org.apache.hadoop.mapred.TaskInProgress@3bf9ff
2012-02-24 19:20:42,259 INFO org.apache.hadoop.mapred.TaskInProgress: oldState is RUNNING,newState is KILLED
2012-02-24 19:20:42,259 INFO org.apache.hadoop.mapred.JobInProgress: state is KILLED
2012-02-24 19:20:42,259 INFO org.apache.hadoop.mapred.TaskInProgress: shouldFail is null
2012-02-24 19:20:42,260 INFO org.apache.hadoop.mapred.JobTracker: prevStatus is 1, newStatus is 1
2012-02-24 19:20:42,260 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_20120223171354_20120224185829_0019_r_01_3' from 'tracker_psns200n:localhost/127.0.0.1:56471'
2012-02-24 19:20:42,500 INFO org.apache.hadoop.mapred.JobTracker: tip is org.apache.hadoop.mapred.TaskInProgress@a11b29
2012-02-24 19:20:42,500 INFO org.apache.hadoop.mapred.JobInProgress: state is SUCCEEDED
2012-02-24 19:20:42,500 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_20120223171354_20120224185829_0019_m_04_0' has completed task_20120223171354_20120224185829_0019_m_04 successfully.
2012-02-24 19:20:42,536 INFO org.apache.hadoop.mapred.JobTracker: Retired job with id: 'job_20120223171354_20120224160112_0006' of user: 'ecip'
2012-02-24 19:20:42,570 INFO org.apache.hadoop.mapred.JobTracker: prevStatus is 1, newStatus is 3
2012-02-24 19:20:42,571 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_20120223171354_20120224185829_0019_m_01_0' from 'tracker_psns280n:localhost/127.0.0.1:61244'
2012-02-24 19:20:42,571 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_20120223171354_20120224185829_0019_m_02_0' from 'tracker_psns280n:localhost/127.0.0.1:61244'
2012-02-24 19:20:42,571 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_20120223171354_20120224185829_0019_m_04_0' from 'tracker_psns280n:localhost/127.0.0.1:61244'
2012-02-24 19:20:42,571 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_20120223171354_20120224185829_0019_m_05_0' from 'tracker_psns280n:localhost/127.0.0.1:61244'
2012-02-24 19:20:42,571 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_20120223171354_20120224185829_0019_r_02_0' from 'tracker_psns280n:localhost/127.0.0.1:61244'
2012-02-24 19:20:42,571 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_20120223171354_20120224185829_0019_r_03_0' from 'tracker_psns280n:localhost/127.0.0.1:61244'
2012-02-24 19:20:42,571 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_20120223171354_20120224185829_0019_r_03_1' from 'tracker_psns280n:localhost/127.0.0.1:61244'
2012-02-24 19:20:42,572 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_20120223171354_20120224185829_0019_r_03_2' from 'tracker_psns280n:localhost/127.0.0.1:61244'
2012-02-24 19:20:42,572 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_20120223171354_20120224185829_0019_r_03_3' from 'tracker_psns280n:localhost/127.0.0.1:61244'
2012-02-24 19:20:43,499 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_20120223171354_20120224185829_0019_m_00_0' from 'tracker_psns250n:localhost/127.0.0.1:59955'
2012-02-24 19:20:43,499 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_20120223171354_20120224185829_0019_r_00_0' from 'tracker_psns250n:localhost/127.0.0.1:59955'
2012-02-24 19:20:43,500 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_20120223171354_20120224185829_0019_r_04_0' from 'tracker_psns250n:localhost/127.0.0.1:59955'
2012-02-24 19:20:47,312 INFO org.apache.hadoop.mapred.JobTracker: tip is null
2012-02-24 19:20:47,313 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_20120223171354_20120224185829_0019_m_03_0' from 'tracker_psns200n:localhost/127.0.0.1:56471'
2012-02-24 19:20:47,313 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_20120223171354_20120224185829_0019_r_01_0' from 'tracker_psns200n:localhost/127.0.0.1:56471'
2012-02-24 19:20:47,313 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_20120223171354_20120224185829_0019_r_01_1' from 'tracker_psns200n:localhost/127.0.0.1:56471'
2012-02-24 19:20:47,313 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_20120223171354_20120224185829_0019_r_01_2' from 'tracker_psns200n:localhost/127.0.0.1:56471'