[ https://issues.apache.org/jira/browse/HBASE-2602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrew Purtell resolved HBASE-2602.
-----------------------------------
    Resolution: Later

> full system application test scenario: webtable
> -----------------------------------------------
>
>                 Key: HBASE-2602
>                 URL: https://issues.apache.org/jira/browse/HBASE-2602
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Andrew Purtell
>
> Put together a full web content archive toolchain:
> * Pluggable content acquisition
> ** Heritrix HBase writer simulator, zipf distribution over [32B, 10MB]
> ** Real Heritrix + Heritrix HBase writer option for running crawls over live sites
> * Periodic MapReduce using Tika for extraction
> * Katta for search, or hbasene + Solr when ready (better)

--
This message was sent by Atlassian JIRA
(v6.2#6252)
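
For illustration only: a rough sketch of how the writer simulator's payload sizes might be drawn from a Zipf-like distribution over [32B, 10MB]. The issue does not specify an exponent or a sampling method, so everything here is an assumption: the names `sample_size` and `simulate_payloads` are hypothetical, the 10MB bound is read as binary megabytes, and a continuous truncated power law (density proportional to x**-exponent) stands in for a discrete Zipf. It does not touch Heritrix or HBase; it only produces byte strings a writer could store.

```python
import random

MIN_SIZE = 32                 # 32 B lower bound from the issue
MAX_SIZE = 10 * 1024 * 1024   # 10 MB upper bound (assumed to mean binary megabytes)


def sample_size(rng, exponent=1.0, lo=MIN_SIZE, hi=MAX_SIZE):
    """Draw one size in [lo, hi] with density proportional to x**-exponent.

    Uses inverse-transform sampling on a truncated power law, a continuous
    stand-in for a Zipf distribution (an assumption, not from the issue).
    """
    u = rng.random()
    if exponent == 1.0:
        # For exponent 1 the CDF is logarithmic, so its inverse is a power.
        size = lo * (hi / lo) ** u
    else:
        a = 1.0 - exponent
        size = (lo ** a + u * (hi ** a - lo ** a)) ** (1.0 / a)
    # Clamp to guard against floating-point rounding at the edges.
    return min(hi, max(lo, int(size)))


def simulate_payloads(count, seed=0, exponent=1.0):
    """Yield `count` pseudo-random byte strings standing in for crawled content."""
    rng = random.Random(seed)
    for _ in range(count):
        n = sample_size(rng, exponent)
        # n is at least MIN_SIZE, so the bit count is always positive.
        yield rng.getrandbits(n * 8).to_bytes(n, "little")
```

With the default exponent of 1.0 most payloads are small and a few approach the 10MB cap, which is the kind of skew a web crawl tends to show. A larger exponent would push the mass further toward the 32B end.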