Thanks, Aaron, for this info. This sounds very similar to both Solr and Elasticsearch... from this description I can't really see any significant difference. Perhaps the main difference is that with Solr/ES, Hadoop/HDFS/MapReduce is optional and most people do not (need to) use it, while Hadoop/HDFS/MapReduce are an integral part of Blur's offering and you can't have Blur without them.

What is distributed tracing? I can't map that to anything in Solr/ES.

Thanks,
Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Sun, Dec 8, 2013 at 9:26 AM, Aaron McCurry <[email protected]> wrote:

> Hi James,
>
> Thanks for your interest and questions. I will attempt to answer your
> questions below.
>
>
> On Sat, Dec 7, 2013 at 8:47 AM, James Kebinger <[email protected]> wrote:
>
> > Hi Aaron, I'm wondering if you can talk a little about how Blur
> > differentiates itself from Elasticsearch and Solr. It seems like both of
> > them, in particular Solr after picking up some Blur code, are gaining
> > more abilities to interact with Hadoop and HDFS.
>
> Unfortunately I'm not an expert in Solr or Elasticsearch, but I can tell
> you about Blur's high-level features where it interacts with Hadoop:
>
> - Index storage (the obvious one)
> - Bulk offline indexing, with incremental updates. This gives you the
>   ability to perform indexing on a dedicated MapReduce cluster and simply
>   move the index updates to the running Blur cluster for importing.
> - The WAL (write-ahead log) is written to HDFS.
> - We are also currently moving most of the metadata from ZooKeeper storage
>   to HDFS storage. This makes interacting with the metadata of a table
>   easy to do from within MapReduce jobs.
>
> > How does a Blur install differ from a Solr setup reading off HDFS?
>
> Again, I'm not an expert in Solr. Blur runs a cluster of shard servers
> that serve the shards (indexes) of the tables within that shard cluster.
> The indexes are stored once in HDFS (not counting HDFS replication here)
> and evenly distributed across whatever shard servers are online. Blur
> utilizes a BlockCache (think file system cache) that is an off-heap
> system. The first version of this was originally picked up by Cloudera,
> modified (I'm assuming), and committed back into the Lucene/Solr code
> base. The second version of this block cache (Blur 0.2.2 stable) is now
> the default in Blur. It has several advantages over the first version:
>
> http://mail-archives.apache.org/mod_mbox/incubator-blur-dev/201310.mbox/%3CCAB6tTr0Nr2aDLc4kkHoeqiO-utwzBAhb=Ru==gmhqry4axp...@mail.gmail.com%3E
>
> One interesting feature of Blur is the ability to run a cluster of
> controllers (controllers make the shard cluster look like a single
> service) in front of multiple shard clusters. This can help with
> reindexes of data, meaning that you can reindex all of your data to a new
> cluster without affecting the performance of the cluster that your users
> may be interacting with.
>
> Some of the overall features of Blur are:
> - NRT updates of data
> - Offline bulk indexing
> - Block cache for fast query performance
> - Index warmup (pulls parts of the index up into the block cache when a
>   segment is brought online)
> - Performance metrics gathering
> - Distributed tracing
> - Custom index types
> - Custom server-side logic can be implemented (basic)
>
> I'm sure there are many more.
>
> Hope this helps.
>
> Aaron
>
> > thanks
> >
> > James
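
For anyone following along who hasn't tried Blur yet: the controller layer Aaron describes above is what a client actually talks to. Here is a minimal sketch of a query from Java, going by my memory of the 0.2.x Thrift client docs. The connection string, table name, and field names are made up, and the exact package/class names may differ between releases, so treat this as an illustration of the "shard cluster looks like a single service" idea rather than authoritative API usage.

// Minimal query sketch. Package/class names follow my memory of the Blur
// 0.2.x Thrift client docs; host/port, table, and field names are made up.
import org.apache.blur.thrift.BlurClient;
import org.apache.blur.thrift.generated.Blur.Iface;
import org.apache.blur.thrift.generated.BlurQuery;
import org.apache.blur.thrift.generated.BlurResult;
import org.apache.blur.thrift.generated.BlurResults;
import org.apache.blur.thrift.generated.Query;

public class BlurQueryExample {
  public static void main(String[] args) throws Exception {
    // Connect to the controllers; they fan the query out to the shard
    // servers and merge the results, so the client never addresses shards.
    Iface client = BlurClient.getClient("controller1:40010,controller2:40010");

    Query query = new Query();
    query.setQuery("fam0.col0:value1"); // Lucene-style query syntax
    BlurQuery blurQuery = new BlurQuery();
    blurQuery.setQuery(query);

    BlurResults results = client.query("table1", blurQuery);
    System.out.println("total hits: " + results.getTotalResults());
    for (BlurResult result : results.getResults()) {
      System.out.println(result);
    }
  }
}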

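The bulk offline indexing Aaron mentions is just a regular MapReduce job that writes index shards straight into HDFS for the live cluster to import. A rough sketch of such a job is below, again from memory of the 0.2.x MapReduce docs; CsvBlurMapper, BlurOutputFormat, and all of the paths, names, and shard counts here are assumptions for illustration, not verified API.

// Rough sketch of a bulk indexing job, going by my memory of the Blur 0.2.x
// MapReduce docs. Class names (CsvBlurMapper, BlurOutputFormat) and all
// paths/table names are illustrative assumptions.
import org.apache.blur.mapreduce.lib.BlurOutputFormat;
import org.apache.blur.mapreduce.lib.CsvBlurMapper;
import org.apache.blur.thrift.generated.TableDescriptor;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class BulkIndexJob {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "blur-bulk-index");
    job.setJarByClass(BulkIndexJob.class);

    // Map CSV rows from HDFS into Blur records.
    job.setMapperClass(CsvBlurMapper.class);
    job.setInputFormatClass(TextInputFormat.class);
    FileInputFormat.addInputPath(job, new Path("/data/input"));
    CsvBlurMapper.addColumns(job, "fam0", "col0", "col1");

    // Describe the target table; the job writes Lucene index shards to HDFS,
    // and the running Blur cluster imports them afterwards.
    TableDescriptor tableDescriptor = new TableDescriptor();
    tableDescriptor.setName("table1");
    tableDescriptor.setShardCount(16);
    tableDescriptor.setTableUri("hdfs:///blur/tables/table1");
    BlurOutputFormat.setupJob(job, tableDescriptor);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}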