Writing a simple sort application for Hadoop
Hello, I am trying to write a simple sorting application for Hadoop. This is what I have thought of so far: suppose I have 100 lines of data and 10 mappers; each of the 10 mappers will sort the data given to it. What I am unable to figure out is how to join these outputs into one big sorted array. In other words, what should the code in the reduce be? Best Regards from Buffalo Abhishek Agrawal SUNY-Buffalo (716-435-7122)
Hadoop: Divide and Conquer Algorithms
Hello everybody, I have a small question: how would one implement divide-and-conquer algorithms in Hadoop? For example, suppose I want to merge-sort 100 lines in Hadoop. There will be 10 mappers, each sorting 10 lines. Now comes the tough part. In the traditional version of merge sort, the pieces of 10 lines are combined into 5 pieces of 20 lines, those pieces are combined into pieces of 40 lines, and so on. I am unable to understand how to implement this functionality in the reducer. Any help would be welcome. PS: Although the example I have given here is merge sort, my actual problem is something else, so I cannot use the external merge sort algorithm. Best Regards from Buffalo Abhishek Agrawal SUNY-Buffalo (716-435-7122)
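For the conquer step itself, each round of merging pairs of sorted runs can be written down independently of Hadoop; chaining reducers would amount to repeating this round until a single run remains. Here is a plain-Java sketch of that idea (class and method names are made up for illustration):

```java
import java.util.*;

public class MergeRounds {
    // Merge two sorted lists into one sorted list (the "conquer" step).
    static List<Integer> merge(List<Integer> a, List<Integer> b) {
        List<Integer> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            out.add(a.get(i) <= b.get(j) ? a.get(i++) : b.get(j++));
        }
        while (i < a.size()) out.add(a.get(i++));
        while (j < b.size()) out.add(b.get(j++));
        return out;
    }

    // Repeatedly merge adjacent runs until one sorted run remains.
    // Each pass of this while-loop corresponds to one "round" of a
    // hypothetical chained reducer.
    static List<Integer> mergeAll(List<List<Integer>> runs) {
        while (runs.size() > 1) {
            List<List<Integer>> next = new ArrayList<>();
            for (int k = 0; k + 1 < runs.size(); k += 2) {
                next.add(merge(runs.get(k), runs.get(k + 1)));
            }
            if (runs.size() % 2 == 1) next.add(runs.get(runs.size() - 1));
            runs = next;
        }
        return runs.get(0);
    }
}
```

In Hadoop terms, each pass could be one MapReduce job whose output feeds the next job, launched sequentially from the driver.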
Re: Re: Writing a simple sort application for Hadoop
Hi, is there any way we can chain the reducers? As in: initially the reducers work on some data, the output of these reducers is sent to the same reducers again, and so on, similar to how the conquer step works in divide-and-conquer algorithms. I hope you got what I am trying to ask. The problem I am actually trying to solve is not sorting, but something that can be solved by a divide-and-conquer algorithm. Best Regards from Buffalo Abhishek Agrawal SUNY-Buffalo (716-435-7122) On Sun 02/28/10 3:24 PM, Ed Mazur ma...@cs.umass.edu sent: Hi Abhishek, If you use input lines as your output keys in map, Hadoop internals will do the work for you and the keys will appear in sorted order in your reduce (you can use IdentityReducer). This needs a slight adjustment if your input lines aren't unique. If you have R reducers, this will create R sorted files. If you want a single sorted file, you can merge the R files or use 1 reducer. Another way is to use TotalOrderPartitioner, which will ensure all keys in reduce N come after all keys in reduce N-1. Owen O'Malley and Arun C. Murthy's paper [1] about using Hadoop to win a sorting competition might be of interest to you. Ed [1] http://sortbenchmark.org/Yahoo2009.pdf On Sun, Feb 28, 2010 at 1:53 PM, aa...@buffalo.edu wrote: Hello, I am trying to write a simple sorting application for Hadoop. This is what I have thought of so far: suppose I have 100 lines of data and 10 mappers; each of the 10 mappers will sort the data given to it. What I am unable to figure out is how to join these outputs into one big sorted array. In other words, what should the code in the reduce be? Best Regards from Buffalo Abhishek Agrawal SUNY-Buffalo (716-435-7122)
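Ed's suggestion can be simulated in plain Java: if every mapper emits its input line as the key, the framework's sorted shuffle hands the reducer a globally sorted stream, so an identity reduce suffices. The TreeMap below models the sorted shuffle, with counts to handle the "input lines aren't unique" adjustment Ed mentions; the class name is made up:

```java
import java.util.*;

public class ShuffleSort {
    // Simulate map + shuffle: each "mapper" emits (line, 1); the framework
    // delivers keys to the reducer in sorted order, which a TreeMap models.
    // The reduce step is then just identity (re-emit each key count times).
    static List<String> sortViaShuffle(List<List<String>> mapperInputs) {
        TreeMap<String, Integer> shuffled = new TreeMap<>();
        for (List<String> split : mapperInputs) {
            for (String line : split) {              // map(line) -> (line, 1)
                shuffled.merge(line, 1, Integer::sum);
            }
        }
        List<String> out = new ArrayList<>();        // identity reduce
        for (Map.Entry<String, Integer> e : shuffled.entrySet()) {
            for (int i = 0; i < e.getValue(); i++) out.add(e.getKey());
        }
        return out;
    }
}
```

With R real reducers you would get R such sorted files, one per key range, which is where TotalOrderPartitioner comes in.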
Some information on Hadoop Sort
Hello, I was wondering if someone could give me some information on how Hadoop does its sorting. From what I have read, there does not seem to be a map class and a reduce class for it? Where and how is the sorting parallelized? Best Regards from Buffalo Abhishek Agrawal SUNY-Buffalo (716-435-7122)
Re: Re: Inverse of a matrix using Map - Reduce
Hi, any idea how this method will scale for dense matrices? The kind of matrices I am going to be working with are 500,000 x 500,000. Will this be a problem? Also, have you used this patch? Best Regards from Buffalo Abhishek Agrawal SUNY-Buffalo (716-435-7122) On Wed 02/03/10 1:41 AM, Ganesh Swami gan...@iamganesh.com sent: What about the Moore-Penrose inverse? http://en.wikipedia.org/wiki/Moore-Penrose_pseudoinverse The pseudo-inverse coincides with the regular inverse when the matrix is non-singular. Moreover, it can be computed using the SVD. Here's a patch for a MapReduce version of the SVD: https://issues.apache.org/jira/browse/MAHOUT-180 Ganesh On Tue, Feb 2, 2010 at 10:11 PM, aa...@buffalo.edu wrote: Hello people, my name is Abhishek Agrawal. For the last few days I have been trying to figure out how to calculate the inverse of a matrix using MapReduce. Matrix inversion has two common approaches: Gauss-Jordan elimination and the cofactor-of-transpose method. But neither seems well suited to MapReduce: Gauss-Jordan involves blocking, and cofactoring a matrix requires repeated calculation of determinants. Can someone give me any pointers on how to solve this problem? Best Regards from Buffalo Abhishek Agrawal SUNY-Buffalo (716-435-7122)
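Ganesh's SVD route can be made concrete. This is standard linear algebra, independent of the Mahout patch: once the SVD is available, the pseudoinverse only requires inverting the nonzero singular values.

```latex
A = U \Sigma V^{T}, \qquad
A^{+} = V \Sigma^{+} U^{T}, \qquad
\Sigma^{+}_{ii} =
\begin{cases}
  1/\sigma_i & \text{if } \sigma_i \neq 0,\\
  0          & \text{if } \sigma_i = 0,
\end{cases}
```

and when $A$ is square and nonsingular, $A^{+} = A^{-1}$, so the hard distributed work is entirely in computing the SVD.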
Inverse of a matrix using Map - Reduce
Hello people, my name is Abhishek Agrawal. For the last few days I have been trying to figure out how to calculate the inverse of a matrix using MapReduce. Matrix inversion has two common approaches: Gauss-Jordan elimination and the cofactor-of-transpose method. But neither seems well suited to MapReduce: Gauss-Jordan involves blocking, and cofactoring a matrix requires repeated calculation of determinants. Can someone give me any pointers on how to solve this problem? Best Regards from Buffalo Abhishek Agrawal SUNY-Buffalo (716-435-7122)
Eclipse Plugin for Hadoop
Hi all, I was just looking around and stumbled across the Eclipse plugin for Hadoop. Have any of you used this plugin? Any thoughts on it? Best Regards from Buffalo Abhishek Agrawal SUNY-Buffalo (716-435-7122)
Re: Re: Re: Re: Doubt in Hadoop
Hi, actually, I just made the change suggested by Aaron and my code worked. But I would still like to know why the setJarByClass() method has to be called when the main class and the Map and Reduce classes are in the same package? Thank You Abhishek Agrawal SUNY-Buffalo (716-435-7122) On Sun 11/29/09 10:39 AM, aa...@buffalo.edu sent: Hi, I don't set job.setJarByClass(Map.class), but my main class, the Map class and the Reduce class are all in the same package. Does this make any difference, or do I still have to call it? Thank You Abhishek Agrawal SUNY-Buffalo (716-435-7122) On Fri 11/27/09 1:42 PM, Aaron Kimball aa...@cloudera.com sent: When you set up the Job object, do you call job.setJarByClass(Map.class)? That will tell Hadoop which jar file to ship with the job and to use for classloading in your code. - Aaron On Thu, Nov 26, 2009 at 11:56 PM, wrote: Hi, I am running the job from the command line. The job runs fine in local mode, but something happens when I try to run the job in distributed mode. Abhishek Agrawal SUNY-Buffalo (716-435-7122) On Fri 11/27/09 2:31 AM, Jeff Zhang sent: Do you run the map reduce job in the command line or an IDE? In map reduce mode, you should put the jar containing the map and reduce classes in your classpath. Jeff Zhang On Fri, Nov 27, 2009 at 2:19 PM, wrote: Hello everybody, I have a doubt in Hadoop and was wondering if anybody has faced a similar problem. I have a package called test. Inside that I have classes called A.java, Map.java and Reduce.java. In A.java I have the main method, where I initialize the JobConf object. I have written jobConf.setMapperClass(Map.class) and similarly for the reduce class as well. The code works correctly when I run it locally via jobConf.set("mapred.job.tracker", "local"), but I get an exception when I try to run this code on my cluster. The stack trace of the exception is as under. I cannot understand the problem. Any help would be appreciated.
java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: test.Map
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:752)
    at org.apache.hadoop.mapred.JobConf.getMapperClass(JobConf.java:690)
    at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:83)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:338)
    at org.apache.hadoop.mapred.Child.main(Child.java:158)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Markowitz.covarMatrixMap
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:720)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:744)
    ... 6 more
Caused by: java.lang.ClassNotFoundException: test.Map
    at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:276)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
    at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:247)
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:673)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:718)
    ... 7 more
Thank You Abhishek Agrawal SUNY-Buffalo (716-435-7122)
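For reference, a non-runnable sketch of what the fixed driver from this thread might look like, assuming the old org.apache.hadoop.mapred API and the class names mentioned above (A, Map, Reduce):

```
// Sketch only, not a complete program.
JobConf jobConf = new JobConf(A.class);   // passing a class here sets the jar, or:
jobConf.setJarByClass(Map.class);         // explicitly tell Hadoop which jar to ship
jobConf.setMapperClass(Map.class);
jobConf.setReducerClass(Reduce.class);
// Without the jar hint, the remote TaskTracker's classloader has no jar
// to search, which is exactly the ClassNotFoundException: test.Map above.
// It works in local mode only because the classes are already on the
// local classpath.
JobClient.runJob(jobConf);
```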
Object Serialization
Hello everybody, I have a question about object serialization in Hadoop. I have an object A which I want to pass to every map function. Currently the code I am using for this is as under. The problem is that when I run my program, the code crashes the first time with an error saying that Java cannot deserialize the object list (but there is no error when Java serializes it), and then when I run the program a second time, without changing anything, the code works perfectly. I read in a blog post that the method I have used to serialize is not ideal, but that also does not explain the weird results I am getting.

try {
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    ObjectOutputStream oos = new ObjectOutputStream(baos);
    oos.writeObject(list);
    stock_list = encode.encode(baos.toByteArray());
} catch (IOException e) {
    e.printStackTrace();
}

Thank You Abhishek Agrawal SUNY-Buffalo (716-435-7122)
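A self-contained round trip of the same idea is below. java.util.Base64 stands in for the unspecified `encode` object in the mail, and the class name is made up. One hedged guess about the intermittent crash: the snippet above never closes or flushes the ObjectOutputStream before reading baos.toByteArray(), which can hand over a truncated byte stream that fails only on deserialization.

```java
import java.io.*;
import java.util.Base64;

public class ConfRoundTrip {
    // Serialize an object to a Base64 string, e.g. to stash in the JobConf
    // so every map task can read it back.
    static String toBase64(Serializable obj) {
        try {
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            ObjectOutputStream oos = new ObjectOutputStream(baos);
            oos.writeObject(obj);
            oos.close();   // flush before grabbing the bytes, or they may be truncated
            return Base64.getEncoder().encodeToString(baos.toByteArray());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Decode the string back into an object, e.g. in Mapper.configure().
    static Object fromBase64(String s) {
        try {
            byte[] bytes = Base64.getDecoder().decode(s);
            return new ObjectInputStream(new ByteArrayInputStream(bytes)).readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new RuntimeException(e);
        }
    }
}
```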
Re: please help in setting hadoop
Hi, just a thought, but you do not need to set up the temp directory in conf/hadoop-site.xml, especially if you are running basic examples. Give that a shot, maybe it will work out. Otherwise, see if you can find additional info in the logs. Thank You Abhishek Agrawal SUNY-Buffalo (716-435-7122) On Fri 11/27/09 12:20 AM, Krishna Kumar krishna.ku...@nechclst.in sent: Dear all, can anybody please help me in getting out from these error messages:

[hadoop]# hadoop jar /usr/lib/hadoop/hadoop-0.18.3-14.cloudera.CH0_3-examples.jar wordcount test test-op
09/11/26 17:15:45 INFO mapred.FileInputFormat: Total input paths to process : 4
09/11/26 17:15:45 INFO mapred.FileInputFormat: Total input paths to process : 4
org.apache.hadoop.ipc.RemoteException: java.io.IOException: No valid local directories in property: mapred.local.dir
    at org.apache.hadoop.conf.Configuration.getLocalPath(Configuration.java:730)
    at org.apache.hadoop.mapred.JobConf.getLocalPath(JobConf.java:222)
    at org.apache.hadoop.mapred.JobInProgress.<init>(JobInProgress.java:194)
    at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:1557)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:890)

I am running the hadoop cluster as root user on two server nodes: master and slave.
My hadoop-site.xml file sets the following properties:

fs.default.name = hdfs://master:54310
dfs.permissions = false
dfs.name.dir = /home/hadoop/dfs/name

Further, the output of the ls command is as follows:

[hadoop]# ls -l /home/hadoop/hadoop-root/
total 8
drwxr-xr-x 4 root root 4096 Nov 26 16:48 dfs
drwxr-xr-x 3 root root 4096 Nov 26 16:49 mapred
[hadoop]# ls -l /home/hadoop/hadoop-root/mapred/
total 4
drwxr-xr-x 2 root root 4096 Nov 26 16:49 local
[hadoop]# ls -l /home/hadoop/hadoop-root/mapred/local/
total 0

Thanks and Best Regards, Krishna Kumar Senior Storage Engineer "Why do we have to die? If we had to die, and everything is gone after that, then nothing else matters on this earth - everything is temporary, at least relative to me." DISCLAIMER: The contents of this e-mail and any attachment(s) are confidential and intended for the named recipient(s) only. It shall not attach any liability on the originator or NECHCL or its affiliates. Any views or opinions presented in this email are solely those of the author and may not necessarily reflect the opinions of NECHCL or its affiliates. Any form of reproduction, dissemination, copying, disclosure, modification, distribution and / or publication of this message without the prior written consent of the author of this e-mail is strictly prohibited. If you have received this email in error please delete it and notify the sender immediately.
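Since the exception complains specifically about mapred.local.dir, one hedged guess is that hadoop-site.xml needs an explicit property for it, pointing at a directory that exists and is writable on every node. The value below is only an assumption based on the ls output in the mail:

```xml
<property>
  <name>mapred.local.dir</name>
  <!-- Must exist and be writable by the user running the TaskTracker
       on each node; comma-separate multiple disks if available. -->
  <value>/home/hadoop/hadoop-root/mapred/local</value>
</property>
```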
Re: RE: please help in setting hadoop
Hi, there should be a folder called logs in $HADOOP_HOME. Also try going through http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_%28Multi-Node_Cluster%29. This is a pretty good tutorial. Abhishek Agrawal SUNY-Buffalo (716-435-7122) On Fri 11/27/09 1:18 AM, Krishna Kumar krishna.ku...@nechclst.in sent: I have tried, but didn't get any success. Btw, can you please tell me the exact path of the log file I have to refer to? Thanks and Best Regards, Krishna Kumar Senior Storage Engineer -----Original Message----- From: aa...@buffalo.edu Sent: Friday, November 27, 2009 10:56 AM To: common-user@hadoop.apache.org Subject: Re: please help in setting hadoop Hi, just a thought, but you do not need to set up the temp directory in conf/hadoop-site.xml, especially if you are running basic examples. Give that a shot, maybe it will work out.
Otherwise, see if you can find additional info in the logs. Thank You Abhishek Agrawal SUNY-Buffalo (716-435-7122) On Fri 11/27/09 12:20 AM, Krishna Kumar krishna.ku...@nechclst.in sent: Dear All, Can anybody please help me in getting out from these error messages: [...]
Re: Re: Doubt in Hadoop
Hi, I am running the job from the command line. The job runs fine in local mode, but something happens when I try to run the job in distributed mode. Abhishek Agrawal SUNY-Buffalo (716-435-7122) On Fri 11/27/09 2:31 AM, Jeff Zhang zjf...@gmail.com sent: Do you run the map reduce job in the command line or an IDE? In map reduce mode, you should put the jar containing the map and reduce classes in your classpath. Jeff Zhang On Fri, Nov 27, 2009 at 2:19 PM, wrote: Hello everybody, I have a doubt in Hadoop and was wondering if anybody has faced a similar problem. I have a package called test. Inside that I have classes called A.java, Map.java and Reduce.java. In A.java I have the main method, where I initialize the JobConf object. I have written jobConf.setMapperClass(Map.class) and similarly for the reduce class as well. The code works correctly when I run it locally via jobConf.set("mapred.job.tracker", "local"), but I get an exception when I try to run this code on my cluster. The stack trace of the exception is as under. I cannot understand the problem. Any help would be appreciated.

java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: test.Map
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:752)
    at org.apache.hadoop.mapred.JobConf.getMapperClass(JobConf.java:690)
    at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:83)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:338)
    at org.apache.hadoop.mapred.Child.main(Child.java:158)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Markowitz.covarMatrixMap
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:720)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:744)
    ... 6 more
Caused by: java.lang.ClassNotFoundException: test.Map
    at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:276)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
    at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:247)
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:673)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:718)
    ... 7 more

Thank You Abhishek Agrawal SUNY-Buffalo (716-435-7122)
Help in Hadoop
Hello everybody, I have a doubt in a map reduce program and I would appreciate any help. I run the program using the command bin/hadoop jar HomeWork.jar prg1 input output. Ideally, from within prg1 I want to sequentially launch 10 map-reduce tasks, and I want to store the output of all these map reduce tasks in some file. Currently I have kept the input format and output format of the jobs as TextInputFormat and TextOutputFormat respectively. Now I have the following questions. 1. When I run more than one task from the same program, the output file of all the tasks is the same, but the framework does not allow task 2 to have the same output file as task 1. 2. Before the second task launches, I also get this error: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized. 3. When the second map reduce task writes its output to file output, won't the previous content of this file get overwritten? Thank You Abhishek Agrawal SUNY-Buffalo (716-435-7122)
Re: Re: Help in Hadoop
Hello, if I write the output of the 10 tasks in 10 different files, then how do I go about merging the output? Is there some built-in functionality, or do I have to write some code for that? Thank You Abhishek Agrawal SUNY-Buffalo (716-435-7122) On Sun 11/22/09 5:40 PM, Gang Luo lgpub...@yahoo.com.cn sent: Hi. If the output path already exists, it seems you could not execute any task with the same output path. I think you can output the results of the 10 tasks to 10 different paths, and then do something more (by an 11th task, for example) to merge the 10 results into one file. Gang Luo - Department of Computer Science Duke University (919)316-0993 gang@duke.edu ----- Original Message ----- From: aa...@buffalo.edu To: common-user@hadoop.apache.org Sent: 2009/11/22 (Sun) 5:25:55 PM Subject: Help in Hadoop Hello everybody, I have a doubt in a map reduce program and I would appreciate any help. I run the program using the command bin/hadoop jar HomeWork.jar prg1 input output. Ideally, from within prg1 I want to sequentially launch 10 map-reduce tasks, and I want to store the output of all these map reduce tasks in some file. Currently I have kept the input format and output format of the jobs as TextInputFormat and TextOutputFormat respectively. Now I have the following questions. 1. When I run more than one task from the same program, the output file of all the tasks is the same, but the framework does not allow task 2 to have the same output file as task 1. 2. Before the second task launches, I also get this error: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized. 3. When the second map reduce task writes its output to file output, won't the previous content of this file get overwritten? Thank You Abhishek Agrawal SUNY-Buffalo (716-435-7122)
Re: Re: Re: Re: Help in Hadoop
I am still getting the same exception. This is its stack trace:

java.io.IOException: Not a file: hdfs://zeus:18004/user/hadoop/output6/MatrixA-Row1
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:195)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:797)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1142)
    at MatrixMultiplication.main(MatrixMultiplication.java:229)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:165)
    at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)

Abhishek Agrawal SUNY-Buffalo (716-435-7122) On Sun 11/22/09 9:28 PM, Jason Venner jason.had...@gmail.com sent: set the number of reduce tasks to 1. 2009/11/22 Hi everybody, The 10 different map-reduce jobs store their respective outputs in 10 different files. This is the snapshot:

had...@zeus:~/hadoop-0.19.1$ bin/hadoop dfs -ls output5
Found 2 items
drwxr-xr-x - hadoop supergroup 0 2003-05-16 02:16 /user/hadoop/output5/MatrixA-Row1
drwxr-xr-x - hadoop supergroup 0 2003-05-16 02:16 /user/hadoop/output5/MatrixA-Row2

Now when I try to open any of these files I get an error message:

had...@zeus:~/hadoop-0.19.1$ bin/hadoop dfs -cat output5/MatrixA-Row1
cat: Source must be a file.

But if I run

had...@zeus:~/hadoop-0.19.1$ bin/hadoop dfs -cat output5/MatrixA-Row1/part-0

I get the correct output. I do not understand why I have to give this extra part-0.
Now, when I run a map reduce task to merge the outputs of all the files, I give the name of the directory output5 as the input path, but I get a bug saying java.io.IOException: Not a file: hdfs://zeus:18004/user/hadoop/output5/MatrixA-Row1. I cannot understand how to make the framework read my files. Alternatively, I tried to avoid the map reduce approach for combining files and do it via a simple program, but I am unable to start. Can someone give me some sample implementation or something? Any help is appreciated. Thank You Abhishek Agrawal SUNY-Buffalo (716-435-7122) On Sun 11/22/09 5:48 PM, sent: Hello, if I write the output of the 10 tasks in 10 different files, then how do I go about merging the output? Is there some built-in functionality, or do I have to write some code for that? Thank You Abhishek Agrawal SUNY-Buffalo (716-435-7122) On Sun 11/22/09 5:40 PM, Gang Luo sent: Hi. If the output path already exists, it seems you could not execute any task with the same output path. I think you can output the results of the 10 tasks to 10 different paths, and then do something more (by an 11th task, for example) to merge the 10 results into one file. Gang Luo - Department of Computer Science Duke University (919)316-0993 gang@duke.edu [...]
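As a starting point for the "simple program" approach, here is a hedged sketch that merges the part files once they have been copied to the local filesystem (e.g. with bin/hadoop dfs -get output5 ...). The class name PartFileMerger is made up; the same walk-and-append logic could be redone against the HDFS FileSystem API instead of java.nio.

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;

public class PartFileMerger {

    // Append the lines of every "part-*" file found under outputDir
    // (recursively) to one merged file, visiting files in sorted path
    // order so MatrixA-Row1 comes before MatrixA-Row2, and so on.
    static void mergeParts(Path outputDir, Path merged) {
        try (PrintWriter out = new PrintWriter(Files.newBufferedWriter(merged))) {
            List<Path> parts = new ArrayList<>();
            try (java.util.stream.Stream<Path> walk = Files.walk(outputDir)) {
                walk.filter(p -> p.getFileName().toString().startsWith("part-"))
                    .sorted()
                    .forEach(parts::add);
            }
            for (Path p : parts) {
                for (String line : Files.readAllLines(p)) out.println(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Small self-test: builds a fake job-output layout in a temp directory.
    static List<String> demo() {
        try {
            Path root = Files.createTempDirectory("output5");
            Path row1 = Files.createDirectory(root.resolve("MatrixA-Row1"));
            Files.write(row1.resolve("part-00000"), Arrays.asList("1 2", "3 4"));
            Path row2 = Files.createDirectory(root.resolve("MatrixA-Row2"));
            Files.write(row2.resolve("part-00000"), Arrays.asList("5 6"));
            Path merged = Files.createTempFile("merged", ".txt");
            mergeParts(root, merged);
            return Files.readAllLines(merged);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```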
Re: Re: Using hadoop for Matrix Multiplication in NFS?
Hi, I do not know if this will be helpful or not, but I also wanted to use Hadoop to do matrix multiplication. I came across a package called Hama which uses map reduce programs to multiply two matrices; to store the two matrices it uses HBase. You could give that a shot. Thank You Abhishek Agrawal SUNY-Buffalo (716-435-7122) On Fri 11/13/09 12:06 PM, Brian Bockelman bbock...@cse.unl.edu sent: Hi, Assuming you're doing math... what you want is PETSc for sparse matrices: http://www.mcs.anl.gov/petsc/petsc-as/ If you're doing dense matrices, probably ScaLAPACK: http://www.netlib.org/scalapack/ You'd benefit from working with someone who has a background in numerical analysis. Brian On Nov 14, 2009, at 12:42 AM, zjffdu wrote: See my comments. -----Original Message----- From: Gimick [gimmicki...@gmail.com] Sent: November 12, 2009 23:22 To: core-u...@hadoop.apache.org Subject: Using hadoop for Matrix Multiplication in NFS? Hi, I am new to Hadoop. I am planning to do matrix multiplication (of order millions) using Hadoop. I have a few queries regarding the above. i) Will using Hadoop be a fix for this, or should I try some other approaches? --- Hama may be such a tool that fits your requirement: http://incubator.apache.org/hama/ ii) I will be using it in NFS. Will using Hadoop still be a good option? --- If you want to use NFS, I guess you have to provide your own InputFormat. So you'd better put your data into HDFS; it will make your work easy and improve your program's performance. If I can use Hadoop for this problem, could you please send links to configure the hadoop-site.xml file for an NFS system? P.S. I tried a few setup instructions via search, but everything seems to give an "Unable to connect" error. -- View this message in context: http://old.nabble.com/Using-hadoop-for-Matrix-Multiplication-in-NFS--tp26332382p26332382.html Sent from the Hadoop core-user mailing list archive at Nabble.com.
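For what it's worth, the usual one-step MapReduce formulation of matrix multiply (which Hama applies at block rather than element granularity) groups the partial products A[i][k]*B[k][j] under the output cell (i,j) and sums them in the reducer. A small in-memory simulation of that grouping, with a made-up class name:

```java
import java.util.*;

public class MatMulMR {
    // Simulate the one-step MapReduce matrix multiply: the "reducer" for
    // key (i,j) computes sum over k of A[i][k] * B[k][j].
    static int[][] multiply(int[][] a, int[][] b) {
        int n = a.length, m = b[0].length, common = b.length;
        // "Map + shuffle": group partial products by output cell (i,j).
        Map<String, Integer> cells = new HashMap<>();
        for (int i = 0; i < n; i++)
            for (int j = 0; j < m; j++)
                for (int k = 0; k < common; k++)
                    cells.merge(i + "," + j, a[i][k] * b[k][j], Integer::sum);
        // "Reduce": write each summed cell into the result matrix.
        int[][] c = new int[n][m];
        for (Map.Entry<String, Integer> e : cells.entrySet()) {
            String[] ij = e.getKey().split(",");
            c[Integer.parseInt(ij[0])][Integer.parseInt(ij[1])] = e.getValue();
        }
        return c;
    }
}
```

In a real job the mapper would emit each A and B entry tagged with its destination cells, which is why naive element-wise versions shuffle a lot of data; blocking (Hama's approach) cuts that traffic.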