What is meant by lexical search? Lucene style? http://www.lucenetutorial.com/lucene-query-syntax.html
If so, these searches could be prioritized (not all are particularly useful), and it shouldn't be too hard to come up with recommended Accumulo approaches for the most important lexical searches.

On Jul 24, 2014, at 10:44 AM, Donald Miner <dmi...@clearedgeit.com> wrote:

> One problem I ran into when thinking about this problem is throughput. In
> Accumulo, we talk about tens or hundreds of thousands or millions of
> records per second. A lot of these search solutions talk about hundreds or
> thousands of documents per second.
>
> This problem that Accumulo is able to outpace just about anything led me
> to think that some sort of microbatch solution might be the best choice. If
> you wait for your data to be indexed before moving on to the next Accumulo
> insert you can start lagging behind. Basically, you are crippling your
> ingest throughput by making it the slower of the two systems.
>
> It seems like a more microbatch (or batch) approach might be worthwhile--
> what you are trading is your text index lagging behind, but you keep your
> ingest throughput in Accumulo. I think Apache Blur does batch parallel
> indexing, which is why I was looking at it for this.
>
>
> On Thu, Jul 24, 2014 at 10:27 AM, Roshan Punnoose <rosh...@gmail.com> wrote:
>
>> Yeah I think David's solution is the best. Though I like the idea of having
>> a server side Constraint or hook that puts the updates into the queue.
>>
>> The Cassandra work I had seen actually tightly couples a Cassandra node to
>> a Solr shard. So all the data that exists on that specific node also exists
>> on that specific Solr shard. Would be pretty cool to do the same thing with
>> a tablet server => local Solr shard.
>>
>>
>> On Wed, Jul 23, 2014 at 6:09 PM, David Medinets <david.medin...@gmail.com>
>> wrote:
>>
>>> Ingest to a queue. Have two processes subscribe to the queue. One
>>> pushing into Accumulo and the other pushing into SolrCloud. Why
>>> tightly couple the capabilities?
>>>
>>> On Wed, Jul 23, 2014 at 4:39 PM, Roshan Punnoose <rosh...@gmail.com>
>>> wrote:
>>>> Is there a way to tie into the write process in Accumulo? Maybe just
>>>> use an Iterator that worked on compaction to send data to Blur/Solr?
>>>> I have seen something similar in Cassandra, a data hook to save data
>>>> in Solr.
>>>>
>>>>
>>>> On Fri, Jul 18, 2014 at 6:46 PM, Nehal Mehta <nehal...@gmail.com>
>>>> wrote:
>>>>
>>>>> We were trying to do so, but adding visibility while adding/searching
>>>>> documents needs a lot more thinking. Adding visibility to the core
>>>>> search engine needs changes to the algorithm, and that does not make
>>>>> it very scalable. Integration besides granular visibility is very
>>>>> doable, and we had taken inspiration from Solandra.
>>>>>
>>>>> Obviously if we can get it done it adds a lot of value. I believe
>>>>> Sqrrl people have already done it; are they thinking of open sourcing
>>>>> it anytime in the future?
>>>>>
>>>>>
>>>>> On Thu, Jul 17, 2014 at 3:09 PM, Donald Miner <dmi...@clearedgeit.com>
>>>>> wrote:
>>>>>
>>>>>> We briefly toyed with Blur on Accumulo but didn't get too far just
>>>>>> because it was obe. I think that would be cool.
>>>>>>
>>>>>>> On Jul 17, 2014, at 3:06 PM, Josh Elser <josh.el...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>> It's definitely possible. I remember hearing about someone doing
>>>>>>> Lucene on top of Accumulo once, but I don't recall seeing a nice
>>>>>>> package with a bow on top.
>>>>>>>
>>>>>>>> On 7/17/14, 2:53 PM, THORMAN, ROBERT D wrote:
>>>>>>>> What lexical search package (like Lucene/Solr) has anyone put on
>>>>>>>> top of Accumulo? Is this possible or does everyone just index log
>>>>>>>> files and documents?
>>>>>>>>
>>>>>>>> v/r
>>>>>>>> Bob Thorman
>>>>>>>> Principal Big Data Engineer
>>>>>>>> AT&T Big Data CoE
>>>>>>>> 2900 W. Plano Parkway
>>>>>>>> Plano, TX 75075
>>>>>>>> 972-658-1714
>
> --
>
> Donald Miner
> Chief Technology Officer
> ClearEdge IT Solutions, LLC
> Cell: 443 799 7807
> www.clearedgeit.com
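
To make the queue and microbatch ideas from this thread a bit more concrete (ingest to a queue, let the store write at full speed, let the text index lag behind in batches), here is a minimal Python sketch using only the standard library. Everything in it is an assumption for illustration: `FakeStore`, `FakeIndex`, `run_pipeline`, and `batch_size` are hypothetical stand-ins, not Accumulo, Solr, or Blur APIs, and an in-process `queue.Queue` per subscriber stands in for whatever message queue would really be used.

```python
import queue
import threading

_DONE = object()  # sentinel that tells a subscriber to stop


class FakeStore:
    """Hypothetical stand-in for the primary store (e.g. an Accumulo writer)."""

    def __init__(self):
        self.rows = []

    def write(self, record):
        self.rows.append(record)


class FakeIndex:
    """Hypothetical stand-in for a text index that prefers batched updates."""

    def __init__(self):
        self.batches = []

    def index_batch(self, records):
        self.batches.append(list(records))


def store_subscriber(q, store):
    # Writes every record as soon as it arrives; never waits on the index.
    while True:
        record = q.get()
        if record is _DONE:
            break
        store.write(record)


def index_subscriber(q, index, batch_size):
    # Buffers records and flushes them in microbatches, so the index may lag.
    batch = []
    while True:
        record = q.get()
        if record is _DONE:
            break
        batch.append(record)
        if len(batch) >= batch_size:
            index.index_batch(batch)
            batch = []
    if batch:  # flush the final partial batch
        index.index_batch(batch)


def run_pipeline(records, batch_size=3):
    store, index = FakeStore(), FakeIndex()
    store_q, index_q = queue.Queue(), queue.Queue()
    workers = [
        threading.Thread(target=store_subscriber, args=(store_q, store)),
        threading.Thread(target=index_subscriber, args=(index_q, index, batch_size)),
    ]
    for w in workers:
        w.start()
    # Fan out: each subscriber gets its own copy of every record.
    for record in records:
        store_q.put(record)
        index_q.put(record)
    store_q.put(_DONE)
    index_q.put(_DONE)
    for w in workers:
        w.join()
    return store, index
```

Calling `run_pipeline(["a", "b", "c", "d"], batch_size=3)` leaves four rows in the store and two batches in the index (one full, one partial). The point of the design is that the store subscriber never blocks on indexing; the trade-off, as noted in the thread, is that the text index trails the store by up to one batch.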