Writing a simple sort application for Hadoop

2010-02-28 Thread aa225
Hello,
  I am trying to write a simple sorting application for Hadoop. This is what
I have thought of so far. Suppose I have 100 lines of data and 10 mappers;
each of the 10 mappers will sort the data given to it. What I am unable to
figure out is how to join these outputs into one big sorted array. In other
words, what should the code in the reduce be?


Best Regards from Buffalo

Abhishek Agrawal

SUNY- Buffalo
(716-435-7122)





Hadoop: Divide and Conquer Algorithms

2010-02-28 Thread aa225
Hello Everybody,
I have a small question. I want to know how one would implement divide and
conquer algorithms in Hadoop. For example, suppose I want to merge sort 100
lines in Hadoop. There will be 10 mappers, each sorting 10 lines. Now comes
the tough part.

In the traditional version of merge sort, each piece of 10 lines is combined
to form 5 pieces of 20 lines, then each piece of 20 lines is combined to form
3 pieces of 40 lines, and so on. I am unable to understand how to implement
this functionality in the reducer.

Any help would be welcome.

PS: Although the example I have given here is of merge sort, my actual problem
is something else, so I cannot use the external merge sort algorithm.


Best Regards from Buffalo

Abhishek Agrawal

SUNY- Buffalo
(716-435-7122)





Re: Re: Writing a simple sort application for Hadoop

2010-02-28 Thread aa225
Hi,
   Is there any way we can chain the reducers? As in, initially the reducers
work on some data, the output of those reducers is sent back to the same
reducers again, and so on, similar to how the conquer step takes place in
divide and conquer algorithms. I hope you got what I am trying to ask.
The problem that I am actually trying to solve is not sorting but something
that can be solved by a divide and conquer algorithm.
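
For what it is worth, there is no built-in way to chain reducers, but the usual
workaround is a driver loop that runs one complete map-reduce job per round and
feeds each round's output directory in as the next round's input. A minimal
sketch, assuming the old org.apache.hadoop.mapred (JobConf) API used elsewhere
in these threads; the Identity classes stand in for the real per-round merge
logic and the paths are illustrative:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;

public class IterativeDriver {
    public static void main(String[] args) throws Exception {
        Path current = new Path(args[0]);                 // initial input directory
        int rounds = Integer.parseInt(args[2]);           // number of "conquer" rounds
        for (int i = 0; i < rounds; i++) {
            JobConf conf = new JobConf(IterativeDriver.class);
            conf.setJobName("conquer-round-" + i);
            conf.setMapperClass(IdentityMapper.class);    // substitute real merge logic here
            conf.setReducerClass(IdentityReducer.class);
            conf.setOutputKeyClass(LongWritable.class);   // TextInputFormat keys are byte offsets
            conf.setOutputValueClass(Text.class);
            Path next = new Path(args[1], "round-" + i);  // fresh output directory per round
            FileInputFormat.setInputPaths(conf, current);
            FileOutputFormat.setOutputPath(conf, next);
            JobClient.runJob(conf);                       // blocks until this round completes
            current = next;                               // this round's output feeds the next
        }
    }
}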

Best Regards from Buffalo

Abhishek Agrawal

SUNY- Buffalo
(716-435-7122)

On Sun 02/28/10  3:24 PM , Ed Mazur ma...@cs.umass.edu sent:
 Hi Abhishek,
 
 If you use input lines as your output keys in map, Hadoop internals
 will do the work for you and the keys will appear in sorted order in
 your reduce (you can use IdentityReducer). This needs a slight
 adjustment if your input lines aren't unique.
 
 If you have R reducers, this will create R sorted files. If you want a
 single sorted file, you can merge the R files or use 1 reducer.
 Another way is to use TotalOrderPartitioner which will ensure all keys
 in reduce N come after all keys in reduce N-1.
 
 Owen O'Malley and Arun C. Murthy's paper [1] about using Hadoop to win
 a sorting competition might be of interest to you.
 
 Ed
 
 [1] http://sortbenchmark.org/Yahoo2009.pdf
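 
 For reference, a minimal sketch of the job Ed describes, assuming the old
 org.apache.hadoop.mapred (JobConf) API used elsewhere in these threads; class
 names and paths are illustrative, and for large inputs TotalOrderPartitioner
 would replace the single reducer:
 
 import java.io.IOException;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.io.LongWritable;
 import org.apache.hadoop.io.NullWritable;
 import org.apache.hadoop.io.Text;
 import org.apache.hadoop.mapred.*;
 import org.apache.hadoop.mapred.lib.IdentityReducer;
 
 public class LineSort {
 
     // Map: emit each input line as the key; the shuffle sorts the keys.
     public static class SortMapper extends MapReduceBase
             implements Mapper<LongWritable, Text, Text, NullWritable> {
         public void map(LongWritable offset, Text line,
                         OutputCollector<Text, NullWritable> out, Reporter reporter)
                 throws IOException {
             out.collect(line, NullWritable.get());   // one value per occurrence, so duplicates survive
         }
     }
 
     public static void main(String[] args) throws IOException {
         JobConf conf = new JobConf(LineSort.class);   // also records which jar to ship
         conf.setJobName("line-sort");
         conf.setMapperClass(SortMapper.class);
         conf.setReducerClass(IdentityReducer.class);  // pass the sorted keys straight through
         conf.setOutputKeyClass(Text.class);
         conf.setOutputValueClass(NullWritable.class);
         conf.setNumReduceTasks(1);                    // one reducer => a single globally sorted file
         FileInputFormat.setInputPaths(conf, new Path(args[0]));
         FileOutputFormat.setOutputPath(conf, new Path(args[1]));
         JobClient.runJob(conf);
     }
 }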
 On Sun, Feb 28, 2010 at 1:53 PM, aa...@buffalo.edu wrote:
  Hello,
       I am trying to write a simple sorting application for hadoop. This is
  what I have thought till now. Suppose I have 100 lines of data and 10
  mappers, each of the 10 mappers will sort the data given to it. But I am
  unable to figure out is how to join these outputs to one big sorted array.
  In other words what should be the code to be written in the reduce ?
 
  Best Regards from Buffalo
 
  Abhishek Agrawal
 
  SUNY- Buffalo
  (716-435-7122)
 
 
 
 
 
 
 
 
 



Some information on Hadoop Sort

2010-02-19 Thread aa225
Hello,
  I was wondering if someone could give me some information on how Hadoop does
the sorting. From what I have read there does not seem to be a map class and a
reduce class? Where and how is the sorting parallelized?


Best Regards from Buffalo

Abhishek Agrawal

SUNY- Buffalo
(716-435-7122)





Re: Re: Inverse of a matrix using Map - Reduce

2010-02-03 Thread aa225
Hi,
   Any idea how this method will scale for dense matrices? The kind of matrices
I am going to be working with are 500,000 x 500,000. Will this be a problem?
Also, have you used this patch?

Best Regards from Buffalo

Abhishek Agrawal

SUNY- Buffalo
(716-435-7122)

On Wed 02/03/10  1:41 AM , Ganesh Swami gan...@iamganesh.com sent:
 What about the Moore-Penrose inverse?
 
 http://en.wikipedia.org/wiki/Moore-Penrose_pseudoinverse
 
 The pseudo-inverse coincides with the regular inverse when the matrix
 is non-singular. Moreover, it can be computed using the SVD.
 
 Here's a patch for a MapReduce version of the SVD:
 https://issues.apache.org/jira/browse/MAHOUT-180
 Ganesh
 
 On Tue, Feb 2, 2010 at 10:11 PM, aa...@buffalo.edu wrote:
  Hello People,
       My name is Abhishek Agrawal. For the last few days I have been trying to
  figure out how to calculate the inverse of a matrix using Map Reduce. Matrix
  inversion has 2 common approaches: Gauss-Jordan and the cofactor-of-transpose
  method. But neither of them seems to be suited too well for Map-Reduce.
  Gauss-Jordan involves blocking, and cofactoring a matrix requires repeated
  calculation of determinants.

   Can someone give me any pointers as to how to solve this problem?
  Best Regards from Buffalo
 
  Abhishek Agrawal
 
  SUNY- Buffalo
  (716-435-7122)
 
 
 
 
 
 
 
 
 



Inverse of a matrix using Map - Reduce

2010-02-02 Thread aa225
Hello People,
My name is Abhishek Agrawal. For the last few days I have been trying to figure
out how to calculate the inverse of a matrix using Map Reduce. Matrix inversion
has 2 common approaches: Gauss-Jordan and the cofactor-of-transpose method. But
neither of them seems to be suited too well for Map-Reduce. Gauss-Jordan
involves blocking, and cofactoring a matrix requires repeated calculation of
determinants.

Can someone give me any pointers as to how to solve this problem?

Best Regards from Buffalo

Abhishek Agrawal

SUNY- Buffalo
(716-435-7122)





Eclipse Plugin for Hadoop

2010-01-16 Thread aa225
Hi all,
   I was just looking around and stumbled across the Eclipse plug-in for
Hadoop. Have any of you used it? Any thoughts on it?


Best Regards from Buffalo

Abhishek Agrawal

SUNY- Buffalo
(716-435-7122)





Re: Re: Re: Re: Doubt in Hadoop

2009-11-29 Thread aa225
Hi,
   Actually, I just made the change suggested by Aaron and my code worked. But
I would still like to know why the setJarByClass() method has to be called when
the Main class and the Map and Reduce classes are in the same package.

Thank You

Abhishek Agrawal

SUNY- Buffalo
(716-435-7122)

On Sun 11/29/09 10:39 AM , aa...@buffalo.edu sent:
 Hi,
 I don't set job.setJarByClass(Map.class). But my main class, the map class and
 the reduce class are all in the same package. Does this make any difference at
 all, or do I still have to call it?
 
 Thank You
 
 Abhishek Agrawal
 
 SUNY- Buffalo
 (716-435-7122)
 
 On Fri 11/27/09  1:42 PM , Aaron Kimball aa...@cloudera.com sent:
  When you set up the Job object, do you call job.setJarByClass(Map.class)?
  That will tell Hadoop which jar file to ship with the job and to use for
  classloading in your code.
   - Aaron
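 
 A minimal sketch of where that call sits in a JobConf-style driver (MyDriver is
 a placeholder for the poster's main class; Map and Reduce are his classes):
 
 // The class handed to setJarByClass (or to the JobConf constructor) is only
 // used to locate the jar that gets shipped to the TaskTrackers. Without a job
 // jar, the cluster-side tasks cannot load Map/Reduce regardless of which
 // package they share with the driver, which is why local mode still works.
 JobConf jobConf = new JobConf(MyDriver.class);   // records the enclosing jar
 jobConf.setJarByClass(MyDriver.class);           // explicit, equivalent form
 jobConf.setMapperClass(Map.class);
 jobConf.setReducerClass(Reduce.class);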
 
 
 
 
 



Object Serialization

2009-11-29 Thread aa225
Hello Everybody,
 I have a question about object serialization in Hadoop. I have an object A
which I want to pass to every map function. Currently the code I am using for
this is as under. The problem is that when I run my program, the code crashes
the first time with an error saying that Java cannot deserialize the object
list (but there is no error when Java serializes it), and then when I run the
program a second time, without changing anything, the code works perfectly.

I read on some blog post that the method I have used to serialize is not the
ideal way, but that also does not explain the weird results I am getting.

try
{
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    ObjectOutputStream oos = new ObjectOutputStream(baos);
    oos.writeObject(list);
    oos.close();   // without a flush/close the last block of object data may never reach baos
    // 'encode' and 'stock_list' are defined elsewhere in my code (not shown)
    stock_list = encode.encode(baos.toByteArray());
}
catch (IOException e)
{
    e.printStackTrace();
}
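
One likely culprit for the intermittent failure is that the original snippet
never flushes or closes the ObjectOutputStream before reading the byte array,
which can leave the serialized data truncated. For comparison, a hedged sketch
of the whole round trip through the job configuration, assuming Apache
commons-codec (which ships with Hadoop) for the Base64 step; the property name
"stock.list" and the List payload are illustrative:

import java.io.*;
import java.util.List;
import org.apache.commons.codec.binary.Base64;
import org.apache.hadoop.mapred.JobConf;

public class ConfObjectPassing {

    // Driver side: serialize, Base64-encode, and stash the object in the JobConf.
    public static void store(JobConf conf, List<String> list) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(baos);
        oos.writeObject(list);
        oos.close();                          // flush the final object block into baos
        String encoded = new String(Base64.encodeBase64(baos.toByteArray()), "US-ASCII");
        conf.set("stock.list", encoded);
    }

    // Map side (e.g. from configure(JobConf)): decode and deserialize the same bytes.
    @SuppressWarnings("unchecked")
    public static List<String> load(JobConf conf) throws IOException, ClassNotFoundException {
        byte[] raw = Base64.decodeBase64(conf.get("stock.list").getBytes("US-ASCII"));
        ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(raw));
        try {
            return (List<String>) ois.readObject();
        } finally {
            ois.close();
        }
    }
}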

Thank You

Abhishek Agrawal

SUNY- Buffalo
(716-435-7122)





Re: please help in setting hadoop

2009-11-26 Thread aa225
Hi,
Just a thought, but you do not need to set up the temp directory in
conf/hadoop-site.xml, especially if you are running the basic examples. Give
that a shot; maybe it will work out. Otherwise, see if you can find additional
info in the logs.
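
If it does turn out that the temp/local directories need to be pinned down, the
entry in conf/hadoop-site.xml usually takes a shape like the following (the path
is illustrative and must exist and be writable on every node; mapred.local.dir
defaults to ${hadoop.tmp.dir}/mapred/local):

  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/hadoop-tmp</value>
  </property>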

Thank You

Abhishek Agrawal

SUNY- Buffalo
(716-435-7122)

On Fri 11/27/09 12:20 AM , Krishna Kumar krishna.ku...@nechclst.in sent:
 Dear All,
 Can anybody please help me in getting out from these error messages:
 [ hadoop]# hadoop jar
 /usr/lib/hadoop/hadoop-0.18.3-14.cloudera.CH0_3-examples.jar
 wordcount
 test test-op
 
 09/11/26 17:15:45 INFO mapred.FileInputFormat: Total input paths to
 process : 4
 
 09/11/26 17:15:45 INFO mapred.FileInputFormat: Total input paths to
 process : 4
 
 org.apache.hadoop.ipc.RemoteException: java.io.IOException: No valid
 local directories in property: mapred.local.dir
 
 at org.apache.hadoop.conf.Configuration.getLocalPath(Configuration.java:730)
 at org.apache.hadoop.mapred.JobConf.getLocalPath(JobConf.java:222)
 at org.apache.hadoop.mapred.JobInProgress.<init>(JobInProgress.java:194)
 at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:1557)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:585)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:890)
 I am running the hadoop cluster as root user on two server nodes: master
 and slave.  My hadoop-site.xml file format is as follows:
 
   <property>
     <name>fs.default.name</name>
     <value>hdfs://master:54310</value>
   </property>
   <property>
     <name>dfs.permissions</name>
     <value>false</value>
   </property>
   <property>
     <name>dfs.name.dir</name>
     <value>/home/hadoop/dfs/name</value>
   </property>
 Further the o/p of ls command is as follows:
 
 [ hadoop]# ls -l /home/hadoop/hadoop-root/
 
 total 8
 
 drwxr-xr-x 4 root root 4096 Nov 26 16:48 dfs
 
 drwxr-xr-x 3 root root 4096 Nov 26 16:49 mapred
 
 [ hadoop]#
 
 [ hadoop]#
 
 [ hadoop]# ls -l /home/hadoop/hadoop-root/mapred/
 
 total 4
 
 drwxr-xr-x 2 root root 4096 Nov 26 16:49 local
 
 [ hadoop]#
 
 [ hadoop]# ls -l /home/hadoop/hadoop-root/mapred/local/
 
 total 0
 Thanks and Best Regards,
 
 Krishna Kumar
 
 Senior Storage Engineer 
 
 Why do we have to die? If we had to die, and everything is gone after
 that, then nothing else matters on this earth - everything is
 temporary,
 at least relative to me.
 
 
 



Re: RE: please help in setting hadoop

2009-11-26 Thread aa225
Hi,
   There should be a folder called logs in $HADOOP_HOME. Also try going through
http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_%28Multi-Node_Cluster%29.
This is a pretty good tutorial.

Abhishek Agrawal

SUNY- Buffalo
(716-435-7122)

On Fri 11/27/09  1:18 AM , Krishna Kumar krishna.ku...@nechclst.in sent:
 I have tried, but didn't get any success. By the way, can you please tell me
 the exact path of the log file which I have to refer to?
 
 
 Thanks and Best Regards,
 
 Krishna Kumar
 
 Senior Storage Engineer 
 
 Why do we have to die? If we had to die, and everything is gone after that,
 then nothing else matters on this earth - everything is temporary, at least
 relative to me.
 
 
 
 

Re: Re: Doubt in Hadoop

2009-11-26 Thread aa225
Hi,
   I am running the job from the command line. The job runs fine in local mode,
but something happens when I try to run the job in distributed mode.


Abhishek Agrawal

SUNY- Buffalo
(716-435-7122)

On Fri 11/27/09  2:31 AM , Jeff Zhang zjf...@gmail.com sent:
 Do you run the map reduce job in command line or IDE?  in map reduce
 mode, you should put the jar containing the map and reduce class in
 your classpath
 Jeff Zhang
 On Fri, Nov 27, 2009 at 2:19 PM,   wrote:
 Hello Everybody,
                I have a doubt in Haddop and was wondering if
 anybody has faced a
 similar problem. I have a package called test. Inside that I have
 class called
 A.java, Map.java, Reduce.java. In A.java I have the main method
 where I am trying
 to initialize the jobConf object. I have written
 jobConf.setMapperClass(Map.class) and similarly for the reduce class
 as well. The
 code works correctly when I run the code locally via
 jobConf.set("mapred.job.tracker", "local") but I get an exception
 when I try to
 run this code on my cluster. The stack trace of the exception is as
 under. I
 cannot understand the problem. Any help would be appreciated.
 java.lang.RuntimeException: java.lang.RuntimeException:
 java.lang.ClassNotFoundException: test.Map
        at
 org.apache.hadoop.conf.Configuration.getClass(Configuration.java:752)
        at
 org.apache.hadoop.mapred.JobConf.getMapperClass(JobConf.java:690)
        at
 org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
        at
 org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
        at
 
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:83)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:338)
        at org.apache.hadoop.mapred.Child.main(Child.java:158)
 Caused by: java.lang.RuntimeException:
 java.lang.ClassNotFoundException:
 Markowitz.covarMatrixMap
        at
 org.apache.hadoop.conf.Configuration.getClass(Configuration.java:720)
        at
 org.apache.hadoop.conf.Configuration.getClass(Configuration.java:744)
        ... 6 more
 Caused by: java.lang.ClassNotFoundException: test.Map
        at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
        at java.security.AccessController.doPrivileged(Native
 Method)
        at
 java.net.URLClassLoader.findClass(URLClassLoader.java:188)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
        at
 sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:276)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
        at
 java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:247)
        at
 
 org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:673)
        at
 org.apache.hadoop.conf.Configuration.getClass(Configuration.java:718)
        ... 7 more
 Thank You
 Abhishek Agrawal
 SUNY- Buffalo
 (716-435-7122)
 
 



Help in Hadoop

2009-11-22 Thread aa225
Hello Everybody,
I have a doubt in a map reduce program and I would appreciate any help. I run
the program using the command bin/hadoop jar HomeWork.jar prg1 input output.
Ideally, from within prg1, I want to sequentially launch 10 map-reduce tasks
and store the output of all these tasks in some file. Currently I have kept the
input format and output format of the jobs as TextInputFormat and
TextOutputFormat respectively. Now I have the following questions.

1. When I run more than 1 task from the same program, the output file of all
the tasks is the same. The framework does not allow map reduce task 2 to have
the same output file as task 1.

2. Before the 2nd task launches I also get this error:

Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already
initialized

3. When the 2nd map reduce task writes its output to the file output, won't the
previous content of this file get overwritten?

Thank You

Abhishek Agrawal

SUNY- Buffalo
(716-435-7122)





Re: Re: Help in Hadoop

2009-11-22 Thread aa225
Hello,
   If I write the output of the 10 tasks in 10 different files, then how do I
go about merging the output? Is there some built-in functionality or do I have
to write some code for that?

Thank You

Abhishek Agrawal

SUNY- Buffalo
(716-435-7122)

On Sun 11/22/09  5:40 PM , Gang Luo lgpub...@yahoo.com.cn sent:
 Hi. If the output path already exists, it seems you could not execute any
 task with the same output path. I think you can output the results of the
 10 tasks to 10 different paths, and then do sth more (by the 11th task, for
 example) to merge the 10 results into 1 file. 
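 
 A hedged sketch of that merge step done outside MapReduce, reading each job's
 part files through the FileSystem API and appending them to one HDFS file
 (argument handling and paths are illustrative):
 
 import java.io.IOException;
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.*;
 import org.apache.hadoop.io.IOUtils;
 
 public class MergeOutputs {
     public static void main(String[] args) throws IOException {
         Configuration conf = new Configuration();
         FileSystem fs = FileSystem.get(conf);
         FSDataOutputStream merged = fs.create(new Path(args[args.length - 1])); // last arg: destination file
         for (int i = 0; i < args.length - 1; i++) {                             // earlier args: job output dirs
             FileStatus[] parts = fs.globStatus(new Path(args[i], "part-*"));
             if (parts == null) continue;                                        // no part files in this dir
             for (FileStatus part : parts) {
                 FSDataInputStream in = fs.open(part.getPath());
                 IOUtils.copyBytes(in, merged, conf, false);                     // false: keep 'merged' open
                 in.close();
             }
         }
         merged.close();
     }
 }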
 
 Gang Luo
 -
 Department of Computer Science
 Duke University
 (919)316-0993
 gang@du
 ke.edu
 
 
 ----- Original Message -----
 From: aa...@buffalo.edu
 To: common-u...@hadoop.apache.org
 Sent: 2009/11/22 (Sun) 5:25:55
 Subject: Help in Hadoop
 
 Hello Everybody,
 I have a doubt in a map reduce program and I would appreciate any
 help. I run the program using the command bin/hadoop jar HomeWork.jar prg1
 input output. Ideally from within prg1, I want to sequentially launch 10 map-
 reduce tasks. I want to store the output of all these map reduce tasks in some
 file. Currently I have kept the input format and output format of the jobs as
 TextInputFormat and TextOutputFormat respectively. Now I have the
 following questions.
 
 1. When I run more than 1 task from the same program, the output file of
 all the tasks is same. The framework does not allows the 2   map reduce task to
 have the same output file as task 1.
 
 2. Before the 2 task launches I also get this error
 
 Cannot initialize JVM Metrics with processName=JobTracker, sessionId= -
 already initialized
 
 3. When the 2 map reduce tasks writes its output to file
 output, wont the previous content of this file get over written ?
 
 Thank You
 
 Abhishek Agrawal
 
 SUNY- Buffalo
 (716-435-7122)
 
 
 
 
 



Re: Re: Re: Re: Help in Hadoop

2009-11-22 Thread aa225
I am still getting the same exception. This is the stack trace of it.

java.io.IOException: Not a file: hdfs://zeus:18004/user/hadoop/output6/MatrixA-Row1
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:195)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:797)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1142)
    at MatrixMultiplication.main(MatrixMultiplication.java:229)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:165)
    at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
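
For what it is worth, FileInputFormat in this version lists the paths it is
given but does not recurse into sub-directories, so pointing the merge job at
output6 trips over the per-job directories inside it. A common workaround
(sketch; the paths are the ones from this thread) is to hand the job the part
files themselves, since input paths are expanded as globs:

FileInputFormat.setInputPaths(conf, new Path("output6/*/part-*"));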


Abhishek Agrawal

SUNY- Buffalo
(716-435-7122)

On Sun 11/22/09  9:28 PM , Jason Venner jason.had...@gmail.com sent:
 set the number of reduce tasks to 1.
 
 2009/11/22  
 Hi everybody,
             The 10 different map-reducers store their
 respective outputs in 10
 different files. This is the snap shot
 had...@zeus:~/hadoop-0.19.1$ bin/hadoop dfs -ls output5
 Found 2 items
 drwxr-xr-x   - hadoop supergroup          0 2003-05-16 02:16
 /user/hadoop/output5/MatrixA-Row1
 drwxr-xr-x   - hadoop supergroup          0 2003-05-16 02:16
 /user/hadoop/output5/MatrixA-Row2
 Now when I try to open any of these files I get an error message
 had...@zeus:~/hadoop-0.19.1$ bin/hadoop dfs -cat
 output5/MatrixA-Row1
 cat: Source must be a file.
 had...@zeus:~/hadoop-0.19.1$
 But if I run
 had...@zeus:~/hadoop-0.19.1$ bin/hadoop dfs -cat
 output5/MatrixA-Row1/part-0
 I get the correct output. I do not understand why I have to give
 this extra
 part-0. Now when I run a map reduce task to merge the outputs
 of all the
 files, I give the name of the directory output5 as the Input path.
 But I get a
 bug saying
 java.io.IOException: Not a file:
 hdfs://zeus:18004/user/hadoop/output5/MatrixA-Row1
 I cannot understand how to make the frame work read my files.
 Alternatively I tried to avoid the map reduce approach for combining
 files and do
 it via a simple program, but I am unable to start. Can some one give
 me some
 sample implementation or something.
 Any help is appreciated
 Thank You
 Abhishek Agrawal
 SUNY- Buffalo
 (716-435-7122)

Re: Re: Using hadoop for Matrix Multiplication in NFS?

2009-11-13 Thread aa225
Hi,
 I do not know if this will be helpful or not, but I also wanted to use Hadoop
to do matrix multiplication. I came across a package called Hama which uses map
reduce programs to multiply 2 matrices. To store the 2 matrices it uses HBase.
You could give that a shot.

Thank You

Abhishek Agrawal

SUNY- Buffalo
(716-435-7122)

On Fri 11/13/09 12:06 PM , Brian Bockelman bbock...@cse.unl.edu sent:
 Hi,
 
 Assuming you're doing math...
  What you want is PETSc for sparse matrices: http://www.mcs.anl.gov/petsc/petsc-as/
  If you're doing dense matrices, probably ScaLAPACK: http://www.netlib.org/scalapack/
  You would benefit from working with someone who has a background in
  numerical analysis.
 
 Brian
 
 On Nov 14, 2009, at 12:42 AM, zjffdu wrote:
 
  See my comments
 
 
  -Original Message-
  From: Gimick [gimmicki...@gmail.com]
  Sent: 12 November 2009 23:22
  To: core-u...@hadoop.apache.org
  Subject: Using hadoop for Matrix Multiplication in NFS?
 
  Hi, I am new to hadoop.  I am planning to do
 matrix multiplication  (of order
  millions) using hadoop.
 
  I have a few queries regarding the
 above.
  i) Will using hadoop be a fix for this or should
 I try some other approaches?
 
  --- Hama maybe such a tool that fit for your
 requirement, http://incubator.apache.org/hama/
  ii) I will be using it in NFS.  Will using
 hadoop still be a good   option?
  --- If you want to use NFS, I guess you have to
 provide your own InputFormat. So you'd better put your data into
 hdfs, it will make   your work
  easy and improve your program's
 performance
 
 
  If I can use hadoop for this problem, could you
 plz send links to   configure
  hadoop-site.xml file for a nfs
 system.
  P.S. I tried a few setup instructions via
 search, but everything   seems to
  give Unable to connect to 
 error.