Re: Matrix Multiplication using Hadoop

2012-03-15 Thread Naveen Mahale
Hi Shailesh,

Please check the implementation. Suppose the input contains proper nonzero
numbers rather than zeros: does it still give you the right output? A long
time ago I took the code from the link you mentioned, but for some reason
it did not work. I have since written my own implementation of matrix
multiplication on Hadoop.
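
For readers following the thread, here is a minimal sketch of the usual
one-step MapReduce formulation of matrix multiplication, written in plain
Python with no Hadoop dependency. It is not the implementation mentioned
above, nor the code from the linked question. The names mapper, reducer and
multiply are hypothetical, and the matrices are assumed to be stored sparsely
as {(row, col): value} dictionaries with zero entries omitted.

```python
from collections import defaultdict


def mapper(name, row, col, value, n_rows_a, n_cols_b):
    """Emit one tagged record per output cell that this entry contributes to."""
    if name == "A":
        # A[row][col] is needed by every cell in output row `row`.
        for k in range(n_cols_b):
            yield (row, k), ("A", col, value)
    else:
        # B[row][col] is needed by every cell in output column `col`.
        for i in range(n_rows_a):
            yield (i, col), ("B", row, value)


def reducer(key, values):
    """Join the A and B records on the shared index j and sum the products."""
    a, b = {}, {}
    for name, j, v in values:
        (a if name == "A" else b)[j] = v
    return key, sum(a[j] * b[j] for j in a if j in b)


def multiply(entries_a, entries_b, n_rows_a, n_cols_b):
    """Simulate the shuffle in memory: group mapper output by key, then reduce."""
    groups = defaultdict(list)
    for name, entries in (("A", entries_a), ("B", entries_b)):
        for (row, col), value in entries.items():
            for key, tagged in mapper(name, row, col, value, n_rows_a, n_cols_b):
                groups[key].append(tagged)
    # A cell that receives no records at all is simply absent (treated as zero).
    return dict(reducer(key, values) for key, values in groups.items())
```

Testing such a job with nonzero inputs, as suggested above, makes an
indexing mistake visible, whereas an all-zero input produces zeros either way.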

Regards,
Naveen

On Thu, Mar 15, 2012 at 12:40 AM, Shailesh shailesh.shai...@gmail.com wrote:

 Hello,

 My question is posted in the link below:

 http://stackoverflow.com/q/9708427/1269809?sem=2

 Any help or feedback would be very helpful.

 Regards,
 Shailesh



Handling of small files in hadoop

2011-09-14 Thread Naveen Mahale
Hi all,

I use the hadoop-0.21.0 distribution. I have a large number of small files
(each only kilobytes in size). Is there an efficient way of handling them in
Hadoop?

I have heard that the solutions for this problem are:
1. HAR (hadoop archives)
2. cat on files

I would like to know if there are any other solutions for processing a large
number of small files.

Regards,
Naveen Mahale


Re: Handling of small files in hadoop

2011-09-14 Thread Naveen Mahale
Hey, thanks Joey for that information. I will work on what you suggested.

Regards
Naveen Mahale

On Wed, Sep 14, 2011 at 5:32 PM, Joey Echeverria j...@cloudera.com wrote:

 Hi Naveen,

  I use hadoop-0.21.0 distribution. I have a large number of small files (KB).

 A word of warning: 0.21 is not a stable release. The recommended version
 is in the 0.20.x range.

  Is there any efficient way of handling it in hadoop?
 
  I have heard that solution for that problem is using:
 1. HAR (hadoop archives)
 2. cat on files
 
  I would like to know if there are any other solutions for processing large
  number of small files.

 You could also stick each file as a record in a sequence file. The
 name of the file becomes the key, the bytes of the file the value.
 That gives you compression and splittability, but not random access.
 You already noted HAR, which does give you random access.
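
 The record layout described above (file name as key, file bytes as value)
 can be sketched in plain Python as follows. This is only an illustration of
 the idea, assuming a simple length-prefixed container: it is not the real
 Hadoop SequenceFile format or API, and pack_records and read_records are
 hypothetical names.

```python
import io
import struct

# Each record: 4-byte key length, 4-byte value length, key bytes, value bytes.
HEADER = ">II"
HEADER_SIZE = struct.calcsize(HEADER)


def pack_records(files, out):
    """Append each (name, data) pair to `out` as one length-prefixed record."""
    for name, data in files:
        key = name.encode("utf-8")
        out.write(struct.pack(HEADER, len(key), len(data)))
        out.write(key)
        out.write(data)


def read_records(inp):
    """Yield (name, data) pairs in the order they were written."""
    while True:
        header = inp.read(HEADER_SIZE)
        if not header:
            return  # clean end of the container
        if len(header) < HEADER_SIZE:
            raise ValueError("truncated record header")
        key_len, value_len = struct.unpack(HEADER, header)
        name = inp.read(key_len).decode("utf-8")
        data = inp.read(value_len)
        yield name, data


# Usage: many small files become a single large stream of records.
buf = io.BytesIO()
pack_records([("a.txt", b"hello"), ("b.txt", b"")], buf)
buf.seek(0)
records = list(read_records(buf))
```

 Reading a given file back means scanning records from the start until its
 name turns up, which is the lack of random access mentioned above.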

 -Joey



 --
 Joseph Echeverria
 Cloudera, Inc.
 443.305.9434