To reduce disk I/O operations, Remove 'reduce phase' from blocking_mapred
-------------------------------------------------------------------------
Key: HAMA-133
URL: https://issues.apache.org/jira/browse/HAMA-133
Project: Hama
Issue Type: Sub-task
Components: implementation
Affects Versions: 0.1.0
Reporter: Edward J. Yoon
Fix For: 0.1.0
> If we remove 'reduce phase', I guess we can reduce the disk I/O operations.
Yes.
>
>
> In the map, read { Constants.BLOCK_STARTROW, Constants.BLOCK_ENDROW,
> Constants.BLOCK_STARTCOLUMN, Constants.BLOCK_ENDCOLUMN } instead of {
> Constants.COLUMN }, and write directly blocks.
Two methods to be considered:
1) We need an InputFormat that partitions the matrix table according to the
row boundaries of the blocks.
This must be done carefully to make sure a single block is not divided
between two or more mappers.
2) Like what RandomMatrixMap does, we just tell the mappers the row/column
boundaries of the blocks of a matrix-table.
Each mapper then scans its own portion of the table.
I think 1) may be better than 2).
An InputFormat can report the locality of a table range, which lets MR
schedule the computation close to the data.
In 2), if we do it like RandomMatrixMap, we may lose some of the table's
locality information, so the network transfer overhead may increase.
These are just my guesses and thoughts.
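
To make the constraint in 1) concrete, here is a rough sketch of how split
boundaries could be snapped to block row boundaries so that no block is
shared between mappers. It is plain Java with no Hadoop or HBase
dependency. The names BlockAlignedSplits, RowRange and computeSplits are
hypothetical and are not part of Hama; a real InputFormat would also have
to map each row range onto table regions to report locality, which this
sketch does not attempt.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: row ranges for mappers, aligned to block rows. */
public class BlockAlignedSplits {

    /** Half-open row range [startRow, endRow) handed to one mapper. */
    public static final class RowRange {
        public final int startRow;
        public final int endRow;

        public RowRange(int startRow, int endRow) {
            this.startRow = startRow;
            this.endRow = endRow;
        }

        @Override
        public String toString() {
            return "[" + startRow + ", " + endRow + ")";
        }
    }

    /**
     * Splits totalRows rows into at most maxSplits ranges. Every boundary
     * between two ranges is a multiple of blockRows, so a block is never
     * divided between two mappers.
     */
    public static List<RowRange> computeSplits(int totalRows, int blockRows,
                                               int maxSplits) {
        if (totalRows <= 0 || blockRows <= 0 || maxSplits <= 0) {
            throw new IllegalArgumentException("all arguments must be positive");
        }
        // Number of block rows, rounding up for a partial last block.
        int blockCount = (totalRows + blockRows - 1) / blockRows;
        // Never create more splits than there are block rows.
        int splitCount = Math.min(maxSplits, blockCount);

        List<RowRange> splits = new ArrayList<>();
        int firstBlock = 0;
        for (int i = 0; i < splitCount; i++) {
            // Spread the blocks as evenly as possible; the first
            // (blockCount % splitCount) splits get one extra block.
            int blocksHere = blockCount / splitCount
                    + (i < blockCount % splitCount ? 1 : 0);
            int start = firstBlock * blockRows;
            int end = Math.min((firstBlock + blocksHere) * blockRows, totalRows);
            splits.add(new RowRange(start, end));
            firstBlock += blocksHere;
        }
        return splits;
    }

    public static void main(String[] args) {
        // 10 rows, blocks of 3 rows, at most 3 mappers.
        for (RowRange r : computeSplits(10, 3, 3)) {
            System.out.println(r);
        }
    }
}
```

With 10 rows, 3-row blocks and 3 mappers, this yields the ranges [0, 6),
[6, 9) and [9, 10). Each internal boundary (6 and 9) is a multiple of the
block height, which is the property 1) needs.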
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.