Wikipedia Ingest needs more parallelism
---------------------------------------

                 Key: ACCUMULO-375
                 URL: https://issues.apache.org/jira/browse/ACCUMULO-375
             Project: Accumulo
          Issue Type: Improvement
            Reporter: Adam Fuchs


The Wikipedia ingest map job uses a derivative of FileInputFormat, which 
launches one mapper per file. Given the partitioning strategy and workload 
distribution, it makes sense to launch multiple mappers per file. Each mapper 
can then take a chunk of the articles in the file, selected with the same 
partitioning strategy that is used to assign row IDs.
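
As a rough illustration of the idea, the sketch below shows how several mappers
reading the same file could each keep only their own chunk of articles. It is
not the actual ingest code: the class name ArticleChunker, the methods
partition and articlesForMapper, and the hash-modulo partition function are
all assumptions made for this example. The real job would reuse whatever
function already assigns row IDs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ArticleChunker {

    // Hypothetical partition function (an assumption, not the ingest code):
    // maps an article ID to a partition in the range [0, numPartitions).
    static int partition(String articleId, int numPartitions) {
        return Math.floorMod(articleId.hashCode(), numPartitions);
    }

    // Returns the articles that the mapper with the given index should process.
    // Every mapper scans the same ID list but keeps only IDs in its partition,
    // so the chunks are disjoint and together cover the whole file.
    static List<String> articlesForMapper(List<String> articleIds,
                                          int mapperIndex,
                                          int mappersPerFile) {
        List<String> chunk = new ArrayList<>();
        for (String id : articleIds) {
            if (partition(id, mappersPerFile) == mapperIndex) {
                chunk.add(id);
            }
        }
        return chunk;
    }

    public static void main(String[] args) {
        // Made-up article IDs standing in for the articles of one input file.
        List<String> ids = Arrays.asList("10", "11", "12", "13", "14", "15");
        int mappersPerFile = 3;
        for (int m = 0; m < mappersPerFile; m++) {
            System.out.println("mapper " + m + " -> "
                    + articlesForMapper(ids, m, mappersPerFile));
        }
    }
}
```

If the chunking function matches the row ID partitioning, each mapper's
output should land in a single partition, though whether that holds depends
on how the real partitioner is defined.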

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
