Hi Aaron, question regarding the Blur data ingestion focus:
Do I read it correctly that Blur is not a near-real-time system (unlike both
ES and Solr)? For example, would Blur be a valid candidate for an
aggregated-logging use case? How long does it usually take for indexed data
to become searchable (ms, sec, min)?

As for data retention, what are the strategies for dropping old data from
the index? For example, is there anything like dropping old data based on
index name patterns (given that the index name contains a timestamp)? I put
a small sketch of what I mean at the end of this mail.

thanks,
Lukáš

> - Massive data ingestion
>
> Basically the focus on ingestion was not on latency but rather on having
> the ability to incrementally add large amounts of data to an index that is
> likely also very large on its own. The project uses YARN MR for this, and
> it is not a quick way to bring data in, but if your needs are to index
> large chunks of data incrementally it works very well. A full reindex, if
> needed, could also be done easily. Something to point out here is that the
> MR indexing puts very little strain on the running system while performing
> the updates/reindexes; I believe this differs from how ES and Solr are
> implemented.
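
For my own understanding, here is roughly how I picture such an MR-driven
bulk load, going by the CSV example in the Blur docs. This is only a minimal
sketch: the class names (CsvBlurMapper, BlurOutputFormat), the controller
address, the table name, and the column layout are my reading of the docs
and my own assumptions, not something I have actually run.

import org.apache.blur.mapreduce.lib.BlurOutputFormat;
import org.apache.blur.mapreduce.lib.CsvBlurMapper;
import org.apache.blur.thrift.BlurClient;
import org.apache.blur.thrift.generated.Blur.Iface;
import org.apache.blur.thrift.generated.TableDescriptor;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class BulkIndexJob {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Ask a controller for the table layout so the job can write shards
    // matching the live table ("controller1:40010" and "logs" are made up).
    Iface client = BlurClient.getClient("controller1:40010");
    TableDescriptor tableDescriptor = client.describe("logs");

    Job job = Job.getInstance(conf, "blur-bulk-index");
    job.setJarByClass(BulkIndexJob.class);
    job.setMapperClass(CsvBlurMapper.class);
    job.setInputFormatClass(TextInputFormat.class);
    FileInputFormat.addInputPath(job, new Path("/incoming/logs"));
    // Map the CSV columns into an assumed "log" column family.
    CsvBlurMapper.addColumns(job, "log", "timestamp", "level", "message");
    // Wires in BlurOutputFormat so the reducers build the Lucene segments
    // directly in HDFS; as I understand it, the running cluster only has to
    // swap the finished segments in afterwards, hence the low strain.
    BlurOutputFormat.setupJob(job, tableDescriptor);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

If that picture is wrong, I'd be glad to be corrected.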

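And to make the retention question concrete, this is the kind of cleanup I
have in mind, assuming one table per day named like "logs-20140131". The
admin calls below (tableList, disableTable, removeTable) are my assumption
about what Blur's Thrift client exposes; please correct me if the names or
signatures differ.

import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.List;
import org.apache.blur.thrift.BlurClient;
import org.apache.blur.thrift.generated.Blur.Iface;

public class DropOldTables {
  // "logs-" prefix followed by yyyyMMdd, e.g. logs-20140131 (my convention).
  private static final DateTimeFormatter DAY = DateTimeFormatter.BASIC_ISO_DATE;

  public static void main(String[] args) throws Exception {
    Iface client = BlurClient.getClient("controller1:40010");
    LocalDate cutoff = LocalDate.now().minusDays(30); // 30-day retention

    List<String> tables = client.tableList();
    for (String table : tables) {
      if (!table.startsWith("logs-")) continue; // only our time buckets
      LocalDate day;
      try {
        day = LocalDate.parse(table.substring(5), DAY);
      } catch (DateTimeParseException e) {
        continue; // skip tables that don't follow the naming scheme
      }
      if (day.isBefore(cutoff)) {
        client.disableTable(table);      // assumed: take the table offline
        client.removeTable(table, true); // assumed: true also deletes files
      }
    }
  }
}

The appeal of this pattern is that expiring a whole day of data would be a
single table removal rather than millions of individual record deletes.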