Wikipedia Ingest needs more parallelism
---------------------------------------
Key: ACCUMULO-375
URL: https://issues.apache.org/jira/browse/ACCUMULO-375
Project: Accumulo
Issue Type: Improvement
Reporter: Adam Fuchs
The Wikipedia ingest map job uses a derivative of FileInputFormat, which
launches only one mapper per file. Given the partitioning strategy and workload
distribution, it makes sense to launch multiple mappers per file instead. Each
mapper could then take a chunk of the articles in the file, using the same
partitioning strategy that is used to assign row IDs.
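A minimal sketch of that idea in Java. It assumes, hypothetically, that an
article's partition (and therefore its row ID) is computed as
articleId mod numPartitions, and that each of mappersPerFile mappers scans the
same file but keeps only the articles whose partition it owns. The class
PartitionedArticleFilter and its methods are illustrative names, not part of
the Accumulo wikipedia example.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: divide the articles of one dump file among several
 * mappers by reusing the (assumed) row-ID partitioning function.
 */
public class PartitionedArticleFilter {
    private final int numPartitions;  // number of row-ID partitions (assumed)
    private final int mappersPerFile; // mappers launched for the same file
    private final int mapperIndex;    // which of those mappers this is, 0-based

    public PartitionedArticleFilter(int numPartitions, int mappersPerFile, int mapperIndex) {
        if (numPartitions <= 0 || mappersPerFile <= 0) {
            throw new IllegalArgumentException("counts must be positive");
        }
        if (mapperIndex < 0 || mapperIndex >= mappersPerFile) {
            throw new IllegalArgumentException("mapperIndex out of range");
        }
        this.numPartitions = numPartitions;
        this.mappersPerFile = mappersPerFile;
        this.mapperIndex = mapperIndex;
    }

    /** Assumed row-ID partitioning: non-negative articleId mod numPartitions. */
    public static int partitionOf(long articleId, int numPartitions) {
        return (int) Math.floorMod(articleId, (long) numPartitions);
    }

    /** True if this mapper is responsible for the given article. */
    public boolean owns(long articleId) {
        // Whole partitions are assigned to mappers, so one partition is never
        // split across two mappers working on the same file.
        return partitionOf(articleId, numPartitions) % mappersPerFile == mapperIndex;
    }

    public static void main(String[] args) {
        int numPartitions = 10;
        int mappersPerFile = 3;
        long[] articleIds = {1, 2, 3, 4, 5, 6, 7, 8, 9};
        for (int m = 0; m < mappersPerFile; m++) {
            PartitionedArticleFilter filter =
                new PartitionedArticleFilter(numPartitions, mappersPerFile, m);
            List<Long> owned = new ArrayList<>();
            for (long id : articleIds) {
                if (filter.owns(id)) {
                    owned.add(id);
                }
            }
            // Prints: mapper 0 -> [3, 6, 9], mapper 1 -> [1, 4, 7], mapper 2 -> [2, 5, 8]
            System.out.println("mapper " + m + " -> " + owned);
        }
    }
}
```

In this sketch every mapper still reads the whole file, so only the parsing
and ingest work is divided, not the I/O. Mapping whole partitions to mappers
keeps each row ID's writes within a single mapper; if numPartitions is smaller
than mappersPerFile, some mappers would receive no articles.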