Re: Matrix Multiplication using Hadoop
Hi Shailesh,

Please check the implementation. Suppose the input contains proper non-zero numbers instead of zeros: does it give you the right output? A long time ago I took the code from the link you mentioned, but for some reason it did not work. I have since written my own implementation of matrix multiplication on Hadoop.

Regards,
Naveen

On Thu, Mar 15, 2012 at 12:40 AM, Shailesh shailesh.shai...@gmail.com wrote:
> Hello,
>
> My question is posted in the link below:
> http://stackoverflow.com/q/9708427/1269809?sem=2
>
> Any help or feedback would be very helpful.
>
> Regards,
> Shailesh
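One way to run the check suggested above is to compare the job's output against a tiny reference on non-zero input. The sketch below is not the code from the linked question and not Naveen's implementation; it only simulates a common one-step map/reduce formulation of matrix multiplication in plain Python, and the names `mapper`, `reducer` and `multiply` are hypothetical.

```python
from collections import defaultdict

def mapper(A, B):
    """Emit ((i, k), (tag, j, value)) for every cell that contributes to C[i][k]."""
    n_rows, n_cols = len(A), len(B[0])
    for i, row in enumerate(A):
        for j, a in enumerate(row):
            for k in range(n_cols):      # A[i][j] is needed by every column k of C
                yield (i, k), ("A", j, a)
    for j, row in enumerate(B):
        for k, b in enumerate(row):
            for i in range(n_rows):      # B[j][k] is needed by every row i of C
                yield (i, k), ("B", j, b)

def reducer(values):
    """Pair A and B entries on the shared index j and sum their products."""
    a_vals, b_vals = {}, {}
    for tag, j, v in values:
        (a_vals if tag == "A" else b_vals)[j] = v
    return sum(a_vals[j] * b_vals[j] for j in a_vals if j in b_vals)

def multiply(A, B):
    """Simulate the shuffle by grouping mapper output by key, then reduce."""
    groups = defaultdict(list)
    for key, value in mapper(A, B):
        groups[key].append(value)
    C = [[0] * len(B[0]) for _ in A]
    for (i, k), values in groups.items():
        C[i][k] = reducer(values)
    return C

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
result = multiply(A, B)
print(result)
```

Running the Hadoop job on the same two small matrices and comparing the output with `result` should show quickly whether the problem is in the job logic or in the input.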
Handling of small files in hadoop
Hi all,

I am using the hadoop-0.21.0 distribution. I have a large number of small (KB-sized) files. Is there an efficient way of handling them in Hadoop? I have heard that the solutions to this problem are:

1. HAR (Hadoop archives)
2. cat on the files

I would like to know if there are any other solutions for processing a large number of small files.

Regards,
Naveen Mahale
Re: Handling of small files in hadoop
Hey,

Thanks, Joey, for that information. I will work on what you suggested.

Regards,
Naveen Mahale

On Wed, Sep 14, 2011 at 5:32 PM, Joey Echeverria j...@cloudera.com wrote:
> Hi Naveen,
>
>> I use hadoop-0.21.0 distribution. I have a large number of small files (KB).
>
> Word of warning, 0.21 is not a stable release. The recommended version is in the 0.20.x range.
>
>> Is there any efficient way of handling it in hadoop? I have heard that solution for that problem is using:
>> 1. HAR (hadoop archives)
>> 2. cat on files
>> I would like to know if there are any other solutions for processing large number of small files.
>
> You could also stick each file as a record in a sequence file. The name of the file becomes the key, the bytes of the file the value. That gives you compression and splitability, but not random access. You already noted HAR, which does give you random access.
>
> -Joey
>
> --
> Joseph Echeverria
> Cloudera, Inc.
> 443.305.9434
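The idea of storing each small file as one record, with the file name as key and the file bytes as value, can be sketched as follows. This is only an illustration in plain Python using a made-up length-prefixed layout; it is not the real SequenceFile on-disk format. In Hadoop itself one would presumably write the records through `SequenceFile.Writer`, for example with `Text` keys and `BytesWritable` values. The helper names `pack_files` and `unpack_files` are hypothetical.

```python
import io
import struct

def pack_files(files, out):
    """Write each (name, data) pair as one length-prefixed record."""
    for name, data in files:
        key = name.encode("utf-8")
        # Header: key length and value length as big-endian unsigned 32-bit ints.
        out.write(struct.pack(">II", len(key), len(data)))
        out.write(key)
        out.write(data)

def unpack_files(inp):
    """Yield (name, data) pairs by scanning the records in order."""
    while True:
        header = inp.read(8)
        if not header:
            return                      # clean end of container
        if len(header) < 8:
            raise ValueError("truncated record header")
        key_len, val_len = struct.unpack(">II", header)
        name = inp.read(key_len).decode("utf-8")
        data = inp.read(val_len)
        yield name, data

# Two in-memory stand-ins for small files.
small_files = [("a.txt", b"hello"), ("b.txt", b"hadoop")]

buf = io.BytesIO()
pack_files(small_files, buf)
buf.seek(0)
restored = list(unpack_files(buf))
print(restored)
```

Reading a particular file back means scanning records from the start, which mirrors the "no random access" trade-off mentioned in the reply.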