[ https://issues.apache.org/jira/browse/ACCUMULO-375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adam Fuchs reassigned ACCUMULO-375:
-----------------------------------

    Assignee: Adam Fuchs
    
> Wikipedia Ingest needs more parallelism
> ---------------------------------------
>
>                 Key: ACCUMULO-375
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-375
>             Project: Accumulo
>          Issue Type: Improvement
>            Reporter: Adam Fuchs
>            Assignee: Adam Fuchs
>
> The Wikipedia ingest map job uses a derivative of FileInputFormat, which
> launches one mapper per file. Given the partitioning strategy and workload
> distribution, it makes sense to launch multiple mappers per file. Each mapper
> can then take a chunk of the articles in the file, using the same partitioning
> strategy as the assignment of row IDs.
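
A minimal sketch of the proposed idea, assuming (hypothetically) that row IDs are assigned as articleId mod numPartitions and that each file is read by a fixed number of mappers, mappersPerFile. The class and method names are illustrative only and are not the Accumulo Wikipedia example's actual API. Each mapper keeps only the articles whose ID falls into its slot under the same modulo rule; when mappersPerFile divides numPartitions, each mapper's output is confined to a fixed subset of partitions. In this sketch every mapper still scans the whole file and skips the articles it does not own, so only the parsing and ingest work is divided.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: spread one file's articles across several mappers. */
public class ArticlePartitionSketch {

    /** Assumed row-ID rule: partition = articleId mod numPartitions. */
    static int partitionFor(long articleId, int numPartitions) {
        return (int) Math.floorMod(articleId, (long) numPartitions);
    }

    /** Mapper slot for an article, using the same modulo rule as partitions. */
    static int mapperFor(long articleId, int mappersPerFile) {
        return (int) Math.floorMod(articleId, (long) mappersPerFile);
    }

    /** Articles from one file that mapper `mapperIndex` keeps; it skips the rest. */
    static List<Long> articlesForMapper(List<Long> articleIdsInFile, int mapperIndex, int mappersPerFile) {
        List<Long> kept = new ArrayList<>();
        for (long id : articleIdsInFile) {
            if (mapperFor(id, mappersPerFile) == mapperIndex) {
                kept.add(id);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Long> ids = List.of(10L, 11L, 12L, 13L, 14L, 15L);
        int mappersPerFile = 3;
        for (int m = 0; m < mappersPerFile; m++) {
            // mapper 0 -> [12, 15], mapper 1 -> [10, 13], mapper 2 -> [11, 14]
            System.out.println("mapper " + m + " -> " + articlesForMapper(ids, m, mappersPerFile));
        }
    }
}
```

With 8 partitions and 4 mappers per file, for example, mapper 1 would only ever write to partitions 1 and 5.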

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
