full system application test scenario: webtable
-----------------------------------------------
Key: HBASE-2602
URL: https://issues.apache.org/jira/browse/HBASE-2602
Project: Hadoop HBase
Issue Type: Sub-task
Reporter: Andrew Purtell

Put together a full web content archive toolchain:

- Pluggable content acquisition
  - Heritrix HBase writer simulator, Zipf distribution over [32B, 10MB]
  - Real Heritrix + Heritrix HBase writer option for running crawls over live sites
- Periodic MapReduce using Tika for extraction
- Katta for search, or hbasene + Solr when ready (better)
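
One way to read the writer simulator item is as a generator that draws content sizes from a bounded power law. The following is a minimal sketch in Python, not the proposed implementation. It assumes a continuous approximation of Zipf (density proportional to x^-s, truncated to the range), that 10MB means 10 * 1024 * 1024 bytes, and a default exponent of 1.0; the issue specifies none of these. The names sample_size and simulated_sizes are hypothetical.

```python
import random

MIN_SIZE = 32                 # 32 B, lower bound from the issue
MAX_SIZE = 10 * 1024 * 1024   # 10 MB, assuming binary megabytes


def sample_size(rng, exponent=1.0):
    """Draw a size in bytes from a power law truncated to [MIN_SIZE, MAX_SIZE].

    Uses inverse transform sampling on a density proportional to x**-exponent.
    This is a continuous stand-in for a discrete Zipf distribution.
    """
    u = rng.random()  # uniform in [0, 1)
    if exponent == 1.0:
        # Density proportional to 1/x: inverse CDF is a * (b / a) ** u.
        size = MIN_SIZE * (MAX_SIZE / MIN_SIZE) ** u
    else:
        k = 1.0 - exponent
        lo, hi = MIN_SIZE ** k, MAX_SIZE ** k
        size = (lo + u * (hi - lo)) ** (1.0 / k)
    # Clamp to guard against floating point rounding at the edges.
    return min(MAX_SIZE, max(MIN_SIZE, int(size)))


def simulated_sizes(count, seed=0, exponent=1.0):
    """Return a reproducible list of simulated content sizes in bytes."""
    rng = random.Random(seed)
    return [sample_size(rng, exponent) for _ in range(count)]
```

A simulator built this way would produce mostly small documents with an occasional multi-megabyte one; raising the exponent skews the sizes further toward the 32 B end.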