On Thu, Dec 11, 2008 at 2:36 PM, Edward J. Yoon <[EMAIL PROTECTED]> wrote:

> If we remove 'reduce phase', I guess we can reduce the disk I/O operations.


Yes.


>
>
> In the map, read { Constants.BLOCK_STARTROW, Constants.BLOCK_ENDROW,
> Constants.BLOCK_STARTCOLUMN, Constants.BLOCK_ENDCOLUMN } instead of {
> Constants.COLUMN }, and write directly blocks.


Two methods to consider:
1) We need an InputFormat that partitions the matrix table along the
row boundaries of the blocks.
    This must be done carefully so that a single block is never divided
across two or more mappers.

2) Like RandomMatrixMap does, we just tell the mappers the row/column
boundaries of the blocks of a matrix table.
    Each mapper then scans its own portion of the table.
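
A rough sketch of the boundary rule in 1), assuming blocks have a fixed
height in rows and rows are numbered from 0. The names BlockAlignedSplits,
RowRange and split are made up for illustration and are not Hama or Hadoop
APIs; a real InputFormat would wrap ranges like these in its own split type.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hama or Hadoop: it only illustrates how
// split boundaries can be aligned to block row boundaries.
class BlockAlignedSplits {

    /** Half-open row range [startRow, endRow) handed to one mapper. */
    static final class RowRange {
        final int startRow;
        final int endRow;

        RowRange(int startRow, int endRow) {
            this.startRow = startRow;
            this.endRow = endRow;
        }

        @Override
        public String toString() {
            return "[" + startRow + ", " + endRow + ")";
        }
    }

    /**
     * Cuts rows 0..totalRows into ranges that each hold a whole number of
     * blocks, so no block is shared between two mappers. Only the last range
     * may be shorter, when totalRows is not a multiple of the split size.
     */
    static List<RowRange> split(int totalRows, int blockRows, int blocksPerSplit) {
        if (totalRows < 0 || blockRows <= 0 || blocksPerSplit <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        List<RowRange> splits = new ArrayList<>();
        int rowsPerSplit = blockRows * blocksPerSplit;
        // Every start is a multiple of blockRows, so it falls on a block edge.
        for (int start = 0; start < totalRows; start += rowsPerSplit) {
            int end = Math.min(start + rowsPerSplit, totalRows);
            splits.add(new RowRange(start, end));
        }
        return splits;
    }

    public static void main(String[] args) {
        for (RowRange range : split(10, 2, 2)) {
            System.out.println(range);
        }
    }
}
```

For 10 rows, blocks 2 rows tall and 2 blocks per split, this gives the
ranges [0, 4), [4, 8) and [8, 10).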

I think 1) may be better than 2).
An InputFormat can find out where each range of the table is located, so MR
knows how to move the computation close to the data.
In 2), if we do it like RandomMatrixMap, we may lose some of the table's
locality information, so the network transfer overhead may increase.
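
To make the locality point concrete, here is a toy model, not real Hadoop
or HBase code. It assumes each row range is served by one host and that the
scheduler runs a task on the host a split reports, or on some arbitrary
fallback node when none is reported. LocalitySketch, Split, place and
remoteReads are hypothetical names, and a real scheduler is much less
simplistic than a single fallback node.

```java
import java.util.Arrays;
import java.util.List;

// Toy model only: hypothetical classes, no Hadoop or HBase APIs involved.
class LocalitySketch {

    /** One unit of work: where the rows live and what the scheduler is told. */
    static final class Split {
        final String servedBy;   // host that actually holds the rows
        final String advertised; // host reported to the scheduler, or null

        Split(String servedBy, String advertised) {
            this.servedBy = servedBy;
            this.advertised = advertised;
        }
    }

    /** Runs the task on the advertised host if known, else on the fallback. */
    static String place(Split split, String fallbackHost) {
        return split.advertised != null ? split.advertised : fallbackHost;
    }

    /** Counts tasks that end up on a different host than their rows. */
    static int remoteReads(List<Split> splits, String fallbackHost) {
        int remote = 0;
        for (Split split : splits) {
            if (!place(split, fallbackHost).equals(split.servedBy)) {
                remote++;
            }
        }
        return remote;
    }

    public static void main(String[] args) {
        // Method 1: the InputFormat reports where each range is served.
        List<Split> withLocality = Arrays.asList(
                new Split("nodeA", "nodeA"),
                new Split("nodeB", "nodeB"),
                new Split("nodeC", "nodeC"));
        // Method 2: mappers only get boundaries, so no host is reported.
        List<Split> withoutLocality = Arrays.asList(
                new Split("nodeA", null),
                new Split("nodeB", null),
                new Split("nodeC", null));

        System.out.println(remoteReads(withLocality, "nodeA"));
        System.out.println(remoteReads(withoutLocality, "nodeA"));
    }
}
```

In this model the first case has no remote reads, while the second has two,
because only the range that happens to live on the fallback node is read
locally.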

These are just my guesses and thoughts.


>
>
> What do you think?
>
> --
> Best Regards, Edward J. Yoon @ NHN, corp.
> [EMAIL PROTECTED]
> http://blog.udanax.org
>
