RE: Right way to implement MR ?

2012-05-24 Thread Shailesh Dargude
Samir,
  
It depends on your data format and what you want to achieve. More
information would have helped.

-Shailesh.

-Original Message-
From: samir das mohapatra [mailto:samir.help...@gmail.com] 
Sent: Thursday, May 24, 2012 1:17 AM
To: common-user@hadoop.apache.org
Subject: Right way to implement MR ?

Hi All,
 How do I compare two input files in an M/R job?
 Say log file A is around 30 GB
 and log file B is around 60 GB.

  I wanted to know how I should define this inside the mapper.

 Thanks
  samir.


Re: Right way to implement MR ?

2012-05-24 Thread samir das mohapatra
Thanks, Harsh J, for your help.

On Thu, May 24, 2012 at 1:24 AM, Harsh J  wrote:

> Samir,
>
> You can use MultipleInputs for multiple forms of inputs per mapper
> (with their own input K/V types, but common output K/V types) with a
> common reduce-side join/compare.
>
> See
> http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapreduce/lib/input/MultipleInputs.html
>
> On Thu, May 24, 2012 at 1:17 AM, samir das mohapatra
>  wrote:
> > Hi All,
> > How do I compare two input files in an M/R job?
> > Say log file A is around 30 GB
> > and log file B is around 60 GB.
> >
> >  I wanted to know how I should define this inside the mapper.
> >
> >  Thanks
> >  samir.
>
>
>
> --
> Harsh J
>
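
A minimal sketch of the reduce-side join Harsh describes, using the new-API
MultipleInputs (the class names are hypothetical, and it assumes the join key
is the first tab-separated field of each log line):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CompareLogs {

  // Mapper for log A: tag each record so the reducer knows its origin.
  public static class LogAMapper extends Mapper<LongWritable, Text, Text, Text> {
    protected void map(LongWritable offset, Text line, Context ctx)
        throws IOException, InterruptedException {
      String joinKey = line.toString().split("\t", 2)[0];
      ctx.write(new Text(joinKey), new Text("A\t" + line));
    }
  }

  // Mapper for log B: its own input handling, but the same output K/V types.
  public static class LogBMapper extends Mapper<LongWritable, Text, Text, Text> {
    protected void map(LongWritable offset, Text line, Context ctx)
        throws IOException, InterruptedException {
      String joinKey = line.toString().split("\t", 2)[0];
      ctx.write(new Text(joinKey), new Text("B\t" + line));
    }
  }

  // The reducer sees records from both files grouped by key and can compare them.
  public static class JoinReducer extends Reducer<Text, Text, Text, Text> {
    protected void reduce(Text key, Iterable<Text> values, Context ctx)
        throws IOException, InterruptedException {
      for (Text v : values) {
        ctx.write(key, v);  // real compare/join logic goes here
      }
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "compare-logs");
    job.setJarByClass(CompareLogs.class);
    MultipleInputs.addInputPath(job, new Path(args[0]),
        TextInputFormat.class, LogAMapper.class);
    MultipleInputs.addInputPath(job, new Path(args[1]),
        TextInputFormat.class, LogBMapper.class);
    job.setReducerClass(JoinReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    FileOutputFormat.setOutputPath(job, new Path(args[2]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}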


Re: 3 machine cluster trouble

2012-05-24 Thread Pat Ferrel
OK, so all nodes are configured the same except for master/slave
differences. They are all running HDFS, and all daemons seem to be running
when I do a start-all.sh from the master. However, the master's Map/Reduce
Administration page shows only two live nodes; the HDFS page shows three.


Looking at the log files on the new slave node I see no outright errors,
but I do see this in the TaskTracker log. All machines have 8 GB of memory.
I think the important part below is "TaskTracker's
totalMemoryAllottedForTasks is -1". I've searched for others with this
problem but haven't found anything matching my case, which is just trying
to start up; no tasks have been run.


2012-05-24 11:20:46,786 INFO org.apache.hadoop.mapred.TaskTracker: 
Starting tracker tracker_occam3:localhost/127.0.0.1:45700
2012-05-24 11:20:46,792 INFO org.apache.hadoop.mapred.TaskTracker: 
Starting thread: Map-events fetcher for all reduce tasks on 
tracker_occam3:localhost/127.0.0.1:45700
2012-05-24 11:20:46,792 INFO org.apache.hadoop.mapred.TaskTracker:  
Using ResourceCalculatorPlugin : 
org.apache.hadoop.util.LinuxResourceCalculatorPlugin@5abd09e8
2012-05-24 11:20:46,795 WARN org.apache.hadoop.mapred.TaskTracker: 
TaskTracker's totalMemoryAllottedForTasks is -1. TaskMemoryManager is 
disabled.
2012-05-24 11:20:46,795 INFO org.apache.hadoop.mapred.IndexCache: 
IndexCache created with max memory = 10485760
2012-05-24 11:20:46,800 INFO org.apache.hadoop.mapred.TaskTracker: 
Shutting down: Map-events fetcher for all reduce tasks on 
tracker_occam3:localhost/127.0.0.1:45700
2012-05-24 11:20:46,800 INFO 
org.apache.hadoop.filecache.TrackerDistributedCacheManager: Cleanup...

java.lang.InterruptedException: sleep interrupted
at java.lang.Thread.sleep(Native Method)
at 
org.apache.hadoop.filecache.TrackerDistributedCacheManager$CleanupThread.run(TrackerDistributedCacheManager.java:926)
2012-05-24 11:20:46,900 INFO org.apache.hadoop.ipc.Server: Stopping 
server on 45700
2012-05-24 11:20:46,901 INFO org.apache.hadoop.ipc.Server: IPC Server 
handler 3 on 45700: exiting
2012-05-24 11:20:46,901 INFO org.apache.hadoop.ipc.Server: IPC Server 
handler 1 on 45700: exiting
2012-05-24 11:20:46,902 INFO org.apache.hadoop.ipc.Server: IPC Server 
handler 2 on 45700: exiting
2012-05-24 11:20:46,902 INFO org.apache.hadoop.ipc.Server: Stopping IPC 
Server listener on 45700
2012-05-24 11:20:46,901 INFO org.apache.hadoop.ipc.Server: IPC Server 
handler 0 on 45700: exiting
2012-05-24 11:20:46,904 INFO 
org.apache.hadoop.ipc.metrics.RpcInstrumentation: shut down
2012-05-24 11:20:46,904 INFO org.apache.hadoop.mapred.TaskTracker: 
Shutting down StatusHttpServer
2012-05-24 11:20:46,904 INFO org.apache.hadoop.ipc.Server: IPC Server 
handler 7 on 45700: exiting
2012-05-24 11:20:46,903 INFO org.apache.hadoop.ipc.Server: IPC Server 
handler 6 on 45700: exiting
2012-05-24 11:20:46,903 INFO org.apache.hadoop.ipc.Server: IPC Server 
handler 4 on 45700: exiting
2012-05-24 11:20:46,904 INFO org.apache.hadoop.ipc.Server: IPC Server 
handler 5 on 45700: exiting
2012-05-24 11:20:46,904 INFO org.apache.hadoop.ipc.Server: Stopping IPC 
Server Responder
2012-05-24 11:20:46,909 INFO org.mortbay.log: Stopped 
SelectChannelConnector@0.0.0.0:50060




On 5/23/12 3:55 PM, James Warren wrote:

Hi Pat -

The setting for hadoop.tmp.dir is used both locally and on HDFS and
therefore should be consistent across your cluster.

http://stackoverflow.com/questions/2354525/what-should-be-hadoop-tmp-dir

cheers,
-James

On Wed, May 23, 2012 at 3:44 PM, Pat Ferrel  wrote:


I have a two-machine cluster and am adding a new machine. The new node has
a different location for hadoop.tmp.dir than the other two nodes, and it
refuses to start the datanode when started in the cluster. When I change
the location pointed to by hadoop.tmp.dir to be the same on all machines,
it starts up fine everywhere.

Shouldn't I be able to have the master and slave1 set as:

<property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/tmp</value>
  <description>A base for other temporary directories.</description>
</property>

And slave2 set as:

<property>
  <name>hadoop.tmp.dir</name>
  <value>/media/d2/app/hadoop/tmp</value>
  <description>A base for other temporary directories.</description>
</property>

??? Slave2 runs standalone in single-node mode just fine. Using 0.20.205.
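
One approach consistent with James's advice (a sketch; the paths and the use
of dfs.data.dir for node-specific storage are illustrative): keep
hadoop.tmp.dir identical on every node, and point the DataNode at slave2's
larger disk via dfs.data.dir in hdfs-site.xml instead.

<!-- core-site.xml, identical on every node -->
<property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/tmp</value>
  <description>A base for other temporary directories.</description>
</property>

<!-- hdfs-site.xml on slave2 only -->
<property>
  <name>dfs.data.dir</name>
  <value>/media/d2/app/hadoop/data</value>
  <description>Node-specific location for DataNode block storage.</description>
</property>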



Re: While Running in cloudera version of hadoop getting error

2012-05-24 Thread Marcos Ortiz

Why don't you use the same Hadoop version in both clusters?
It would spare you minor troubles.


On 05/24/2012 02:26 PM, samir das mohapatra wrote:

Hi,
   I created an application jar and ran it on a 2-node cluster using the
Cloudera 0.20 version; it ran fine.
But when I run that same jar on the deployment server (Cloudera 0.20.x)
with a 40-node cluster, I get an error.

Could anyone please help me with this?

12/05/24 09:39:09 WARN mapred.JobClient: Use GenericOptionsParser for
parsing the arguments. Applications should implement Tool for the same.

As the warning says here, you should implement Tool in your MapReduce job.
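
A minimal sketch of the Tool pattern that warning refers to, using the old
mapred API visible in the stack trace (MyJob and the job setup are
placeholders):

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyJob extends Configured implements Tool {
  // ToolRunner applies GenericOptionsParser, so -D, -files and -libjars
  // options are consumed before run() sees the remaining arguments.
  @Override
  public int run(String[] args) throws Exception {
    JobConf conf = new JobConf(getConf(), MyJob.class);
    conf.setJobName("my-job");
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    // set mapper/reducer/format classes here
    JobClient.runJob(conf);
    return 0;
  }

  public static void main(String[] args) throws Exception {
    System.exit(ToolRunner.run(new MyJob(), args));
  }
}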


12/05/24 09:39:10 INFO mapred.FileInputFormat: Total input paths to process
: 1

12/05/24 09:39:10 INFO mapred.JobClient: Running job: job_201203231049_12426

12/05/24 09:39:11 INFO mapred.JobClient:  map 0% reduce 0%

12/05/24 09:39:20 INFO mapred.JobClient: Task Id :
attempt_201203231049_12426_m_00_0, Status : FAILED

java.lang.RuntimeException: Error in configuring object

 at
org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)

 at
org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)

 at
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)

 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:387)

 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)

 at org.apache.hadoop.mapred.Child$4.run(Child.java:270)

 at java.security.AccessController.doPrivileged(Native Method)

 at javax.security.auth.Subject.doAs(Subject.java:396)

 at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)

 at org.apache.hadoop.mapred.Child.main(Child.java:264)

Caused by: java.lang.reflect.InvocationTargetException

 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

 at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)

 at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.jav

attempt_201203231049_12426_m_00_0: getDefaultExtension()

12/05/24 09:39:20 INFO mapred.JobClient: Task Id :
attempt_201203231049_12426_m_01_0, Status : FAILED



Thanks

samir





--
Marcos Luis Ortíz Valmaseda
 Data Engineer && Sr. System Administrator at UCI
 http://marcosluis2186.posterous.com
 http://www.linkedin.com/in/marcosluis2186
 Twitter: @marcosluis2186




Re: 3 machine cluster trouble

2012-05-24 Thread Pat Ferrel
Oops, after a few trials I got an ERROR about incompatible build versions.
I copied the code from the master, reformatted, et voilà.

