On Sun, Dec 8, 2013 at 10:32 PM, Otis Gospodnetic <[email protected]> wrote:
> Thanks for the info about other distributed FSs being an option. I'd guess
> relying on the distributed FS is nice for any very large deployment, but I
> wonder if that requirement is a hindrance for any small to medium sized
> deployment that needs more than 1 shard server, but doesn't quite want the
> whole dist FS machinery.
>
> What's your experience?

I don't see running the HDFS part of Hadoop as very hard to do; MapReduce
might be overkill for some people, though.


> Distributed trace sounds nice and useful! Is it exposed via JMX or some
> other API? I'd want us to capture that with SPM once we add support for
> Blur monitoring to SPM.

All the trace information is available through the standard Thrift API in
Blur. And there's a pluggable API for how the traces are stored; current
implementations are in ZooKeeper and HDFS, as well as just logging the info.

Aaron


> Otis
> --
> Performance Monitoring * Log Analytics * Search Analytics
> Solr & Elasticsearch Support * http://sematext.com/
>
>
> On Sun, Dec 8, 2013 at 10:10 AM, Aaron McCurry <[email protected]> wrote:
>
> > On Sun, Dec 8, 2013 at 9:57 AM, Otis Gospodnetic <[email protected]> wrote:
> >
> > > Thanks Aaron for this info. This sounds very similar to both Solr/ES...
> > > from this description I can't really see any significant difference.
> > > Perhaps the main difference is that with Solr/ES, Hadoop/HDFS/MapReduce
> > > is something that's optional and that most people do not (need to) use,
> > > while Hadoop/HDFS/MapReduce are an integral part of Blur's offering and
> > > you can't have Blur without them.
> >
> > While I haven't ever run Blur without HDFS, technically you could run any
> > distributed file system with Blur, but a distributed FS is required if
> > you want to go beyond 1 shard server.
> >
> > MapReduce is not required, only a distributed FS and ZooKeeper.
> >
> > > What is distributed tracing? I can't map that to anything in Solr/ES.
> >
> > It allows the client to start a trace of the request(s) they make. It
> > propagates through the entire stack, gathering timing around all the
> > traceable sections of code. It also traverses threads and network calls.
> > It helps to explain where the time goes for a given request. There is
> > also a display for the trace built into the status pages of Blur.
> >
> > Aaron
> >
> > > Thanks,
> > > Otis
> > > --
> > > Performance Monitoring * Log Analytics * Search Analytics
> > > Solr & Elasticsearch Support * http://sematext.com/
> > >
> > >
> > > On Sun, Dec 8, 2013 at 9:26 AM, Aaron McCurry <[email protected]> wrote:
> > >
> > > > Hi James,
> > > >
> > > > Thanks for your interest and questions; I will attempt to answer your
> > > > questions below.
> > > >
> > > >
> > > > On Sat, Dec 7, 2013 at 8:47 AM, James Kebinger <[email protected]> wrote:
> > > >
> > > > > Hi Aaron, I'm wondering if you can talk a little about how you see
> > > > > Blur differentiating itself from ElasticSearch and Solr. It seems
> > > > > like both of them, in particular Solr after picking up some Blur
> > > > > code, are gaining more abilities to interact with Hadoop and HDFS.
> > > >
> > > > Unfortunately I'm not an expert in Solr or ElasticSearch. I can tell
> > > > you about Blur's high-level features when talking about how it
> > > > interacts with Hadoop:
> > > >
> > > > - Index storage (the obvious one)
> > > > - Bulk offline indexing, with incremental updates.
> > > >   This one gives you the ability to perform indexing on a dedicated
> > > >   MapReduce cluster and simply move the index updates to the running
> > > >   Blur cluster for importing.
> > > > - WAL (write ahead log) is written to HDFS
> > > > - Also, we are currently moving most of the metadata from ZooKeeper
> > > >   storage to HDFS storage. This makes interacting with the metadata
> > > >   of a table easy to do from within MapReduce jobs.
> > > >
> > > >
> > > > > How does a Blur install differ from a Solr setup reading off HDFS?
> > > >
> > > > Again, I'm not an expert in Solr. Blur's setup runs a cluster of shard
> > > > servers that serve shards (indexes) of the table within that shard
> > > > cluster. The indexes are stored once in HDFS (not counting the HDFS
> > > > replication here) and evenly distributed across whatever shard servers
> > > > are online. Blur utilizes a BlockCache (think file system cache) that
> > > > is an off-heap based system. The first version of this was originally
> > > > picked up by Cloudera and modified (I'm assuming) and committed back
> > > > into the Lucene/Solr code base. The second version of this block cache
> > > > (Blur 0.2.2 stable) is now the default in Blur. It has several
> > > > advantages over the first version:
> > > >
> > > > http://mail-archives.apache.org/mod_mbox/incubator-blur-dev/201310.mbox/%3CCAB6tTr0Nr2aDLc4kkHoeqiO-utwzBAhb=Ru==gmhqry4axp...@mail.gmail.com%3E
> > > >
> > > > One interesting feature of Blur is the ability to run a cluster of
> > > > controllers (controllers are used to make the shard cluster look like
> > > > a single service) in front of multiple shard clusters. This can help
> > > > to deal with reindexes of data, meaning that you can reindex all of
> > > > your indexes to a new cluster and not affect the performance of the
> > > > cluster that your users may be interacting with.
> > > >
> > > > Some of the overall features of Blur are:
> > > > - NRT updates of data
> > > > - Offline bulk indexing
> > > > - Block cache for fast query performance
> > > > - Index warmup (pulls parts of the index up into the block cache when
> > > >   a segment is brought online)
> > > > - Performance metrics gathering
> > > > - Distributed tracing
> > > > - Custom index types
> > > > - Custom server side logic can be implemented (basic)
> > > >
> > > > I'm sure there are many more.
> > > >
> > > > Hope this helps.
> > > >
> > > > Aaron
> > > >
> > > >
> > > > > thanks
> > > > >
> > > > > James
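
To make the tracing discussion above more concrete, here is a minimal sketch
of what a pluggable trace-storage design looks like in general: a tracer
gathers timing spans around interesting sections of a request, and a storage
strategy decides where the completed trace goes (plain logging here; a
ZooKeeper- or HDFS-backed store would implement the same interface). The
names TraceStorage, Span, Tracer, and LoggingTraceStorage are illustrative
assumptions, not Blur's actual classes or its Thrift API.

// Illustrative sketch only -- these are NOT Blur's real classes.
// A tracer collects timing spans as a request moves through the stack,
// and a pluggable storage strategy decides where the finished trace goes.
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

interface TraceStorage {
    // Called once per completed trace; an implementation could write to
    // ZooKeeper, HDFS, or a log.
    void store(String traceId, List<Span> spans);
}

// One timed section of code (a "traceable section").
final class Span {
    final String name;
    final long startNanos;
    final long endNanos;

    Span(String name, long startNanos, long endNanos) {
        this.name = name;
        this.startNanos = startNanos;
        this.endNanos = endNanos;
    }
}

// Simplest possible storage: print the trace (stand-in for "just logging the info").
final class LoggingTraceStorage implements TraceStorage {
    @Override
    public void store(String traceId, List<Span> spans) {
        for (Span s : spans) {
            System.out.printf("trace=%s span=%s tookMs=%.2f%n",
                    traceId, s.name, (s.endNanos - s.startNanos) / 1_000_000.0);
        }
    }
}

// Request-handling code wraps interesting sections with time(...).
final class Tracer {
    private final String traceId;
    private final TraceStorage storage;
    private final List<Span> spans = new ArrayList<>();

    Tracer(String traceId, TraceStorage storage) {
        this.traceId = traceId;
        this.storage = storage;
    }

    <T> T time(String sectionName, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            spans.add(new Span(sectionName, start, System.nanoTime()));
        }
    }

    void finish() {
        storage.store(traceId, spans);
    }
}

public class TraceSketch {
    public static void main(String[] args) {
        Tracer tracer = new Tracer("req-42", new LoggingTraceStorage());
        int hits = tracer.time("search-shard-0", () -> 17); // stand-in for a shard query
        tracer.time("merge-results", () -> hits);
        tracer.finish();
    }
}

Swapping the storage implementation without touching the request path is the
essence of the "pluggable API for how the traces are stored" point above.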
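
The bulk-indexing hand-off described above (build indexes on a dedicated
MapReduce cluster, then move the finished updates to the running cluster for
importing) can be sketched with the standard Hadoop FileSystem API. The
directory layout below and the idea that the live cluster reads the table
from a single HDFS location are assumptions for illustration, not Blur's
documented import mechanism.

// Sketch of the "hand the finished index updates to the running cluster" step.
// The paths (/staging/table1, /blur/tables/table1) and the notion that the
// shard servers pick up whatever lands in the table directory are
// illustrative assumptions only.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ImportFinishedIndexes {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration(); // picks up core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);

        Path stagingDir = new Path("/staging/table1");   // output of the offline indexing job
        Path tableDir = new Path("/blur/tables/table1"); // where the live cluster reads the table

        // Move each completed shard update into the live table directory.
        // rename() is a metadata operation in HDFS, so the hand-off stays
        // cheap even when the indexes themselves are large.
        for (FileStatus shardUpdate : fs.listStatus(stagingDir)) {
            Path dest = new Path(tableDir, shardUpdate.getPath().getName());
            if (!fs.rename(shardUpdate.getPath(), dest)) {
                throw new IllegalStateException(
                        "Could not move " + shardUpdate.getPath() + " to " + dest);
            }
        }
        fs.close();
    }
}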
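
The off-heap BlockCache mentioned above ("think file system cache") can
likewise be pictured with a small sketch: fixed-size blocks of index files are
cached in direct ByteBuffers keyed by file and block index, so the cached
bytes live outside the JVM heap and repeated reads avoid going back to HDFS.
This is a conceptual illustration only, not Blur's block cache implementation,
and the block size and eviction policy are arbitrary choices.

// Conceptual sketch of an off-heap block cache -- not Blur's implementation.
import java.nio.ByteBuffer;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

public class OffHeapBlockCache {
    private static final int BLOCK_SIZE = 8 * 1024; // assumed block size

    private static final class BlockKey {
        final String file;
        final long blockIndex;

        BlockKey(String file, long blockIndex) {
            this.file = file;
            this.blockIndex = blockIndex;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof BlockKey)) {
                return false;
            }
            BlockKey k = (BlockKey) o;
            return blockIndex == k.blockIndex && file.equals(k.file);
        }

        @Override
        public int hashCode() {
            return Objects.hash(file, blockIndex);
        }
    }

    // Access-ordered map gives simple LRU eviction to bound the off-heap footprint.
    private final Map<BlockKey, ByteBuffer> cache;

    public OffHeapBlockCache(final int maxBlocks) {
        this.cache = new LinkedHashMap<BlockKey, ByteBuffer>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<BlockKey, ByteBuffer> eldest) {
                return size() > maxBlocks;
            }
        };
    }

    // Returns a read-only view of the cached block, or null on a miss.
    public synchronized ByteBuffer get(String file, long blockIndex) {
        ByteBuffer b = cache.get(new BlockKey(file, blockIndex));
        return b == null ? null : b.asReadOnlyBuffer();
    }

    // Copies block data into off-heap memory, e.g. after an HDFS read on a cache miss.
    public synchronized void put(String file, long blockIndex, byte[] data) {
        ByteBuffer direct = ByteBuffer.allocateDirect(BLOCK_SIZE); // lives outside the JVM heap
        direct.put(data, 0, Math.min(data.length, BLOCK_SIZE));
        direct.flip();
        cache.put(new BlockKey(file, blockIndex), direct);
    }
}

Keeping the cached blocks in direct memory lets the cache grow large without
adding garbage-collection pressure, which is the usual motivation for going
off-heap.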
