Thanks so much Ryan - Yokozuna sounds the most promising. If I were building a
small system (relatively simple, small user base) that needs to be
production-ready in a few months, do you think Yokozuna could cut the
mustard? I see that it's officially an experimental prototype, but do you
think it's stable 'enough' in its current state? Sorry if that's an
impossible question to answer, with too many variables...

I must confess that using a forked Riak makes me a touch queasy for
anything other than playpen stuff. Do you think a combination of Riak +
Elasticsearch could be a suitable compromise for the time being?


On 15 December 2012 06:13, Ryan Zezeski <rzeze...@basho.com> wrote:

> Matt, comments inline
>
>
> On Tue, Dec 11, 2012 at 3:35 AM, Matt Painter <m...@deity.co.nz> wrote:
>>
>>
>> Apart from a single default value, is it possible for Riak Search to
>> search for a keyword across all fields in a document without having to
>> specify the field up front as a prefix in one's search term?
>>
>
> A field must be specified to search, but you can set a default field in
> the schema [1]; that field is used when a query doesn't name one.  There
> is no way to search against all fields at once -- a search always runs
> over a single field.
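>
> For reference, a Riak Search schema is just an Erlang term.  A minimal
> sketch (untested, and the field names here are only placeholders) looks
> roughly like this:
>
>     {schema,
>      [{version, "1.1"},
>       {default_field, "value"},
>       {default_op, "or"},
>       {n_val, 3}],
>      [{dynamic_field, [{name, "*"}]}]
>     }.
>
> With a schema like that, a bare query term is run against the "value"
> field.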
>
>
>> I'm guessing that one solution could be a post-commit hook which
>> recursively iterates over all fields and squashes them into a secondary
>> default value field - but since I know even less about Erlang and am just
>> starting out with Riak, I thought it prudent to see if there was a more
>> straightforward solution...
>>
>
> Your use case immediately makes me think of Solr copy fields.  You index
> each value under its own field, but every value is also copied into a
> catch-all field so that all content can be searched easily.  The trade-off
> is that you lose the ability to know which field a match came from.  Riak
> Search doesn't have copy-field functionality; you'd have to concatenate
> all the data into a single field on your application side.  The new search
> solution I've been working on, Yokozuna, uses Solr underneath and
> therefore does support copy fields [2].
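>
> For illustration, a copy field in a Solr schema.xml is declared with a
> destination field plus a <copyField/> directive (the "all_text" name here
> is just a placeholder, not something Yokozuna mandates):
>
>     <field name="all_text" type="text_general" indexed="true"
>            stored="false" multiValued="true"/>
>     <copyField source="*" dest="all_text"/>
>
> Every indexed value also lands in "all_text", so a schema-agnostic keyword
> search only has to target that one field.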
>
> You could create a pre-commit hook to do this field-squashing, but I think
> you would be better off doing it in your application.  To do it via a hook
> you'd have to make sure it runs before the search hook (I can't remember
> whether you can force a specific order of pre-commit hooks).  It would
> also affect your write latencies, since more pre-processing would have to
> be done.  Finally, you would have to write Erlang.
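>
> If you did go the hook route anyway, an untested sketch might look like
> the following.  It assumes the object values are flat JSON objects, uses
> mochijson2 (bundled with Riak) for decoding, and the "all_text" field name
> is again just a placeholder:
>
>     -module(squash_fields).
>     -export([precommit/1]).
>
>     %% Decode the JSON value, join every string value into a single
>     %% catch-all field, and return the object with that field added.
>     precommit(Obj) ->
>         try
>             {struct, Fields} = mochijson2:decode(riak_object:get_value(Obj)),
>             Words = [binary_to_list(V) || {_K, V} <- Fields, is_binary(V)],
>             All = list_to_binary(string:join(Words, " ")),
>             Json = mochijson2:encode({struct, Fields ++ [{<<"all_text">>, All}]}),
>             riak_object:update_value(Obj, iolist_to_binary(Json))
>         catch
>             %% If the value isn't the JSON we expect, store it untouched.
>             _:_ -> Obj
>         end.
>
> Again, doing the same concatenation in your application before the write
> is simpler and keeps the extra work out of Riak's write path.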
>
>
>> The use case is this:
>>
>> We are providing an object + metadata store for users to deposit files
>> and any number of related fragments of structured JSON metadata. We are not
>> enforcing any metadata schema - and therefore can't know up-front any field
>> names - but would like the ability for a dumb keyword search from a website
>> to return references to the records they have deposited in
>> Riak. Essentially, providing a Google-like interface.
>>
>> (As a side question, is Riak Search mature enough for these types of very
>> generic searches? I know that it's "inspired by" Lucene and "Lucene-like",
>> but I don't know how many of Lucene's goodies are present - or is it just a
>> case of invoking analysers provided by Lucene for things like stemming, and
>> all will be pretty much equivalent for most situations?)
>>
>
> There are no "goodies" present _at all_.  Riak Search is an in-house
> implementation, written entirely in Erlang.  Its only connection to
> Lucene/Solr is a superficial interface that looks very much like
> Lucene/Solr.  For example, you mention stemming: there is no stemming
> support in Riak Search, and adding it would be non-trivial.  This is one
> of the big reasons Yokozuna is being written [2].  The world of search is
> vast and complicated; it's best to start with a proven solution and build
> from that.
>
> Riak Search generally starts causing pain when you have searches that
> match tens of thousands of documents.  The runtime is proportional to the
> size of the result set.  In fact, Riak Search has a hard-coded upper limit
> to fail queries that match 100K or more documents (although it does the
> work to get the 100K results and then drops them all on the floor, so you
> still use the resources/time).  For example, if a lot of your files were
> pictures
> and were tagged with something like {"type":"picture"} then a search for
> "picture" is probably going to cause issues.  Things really start to hurt
> when you do conjunction queries with multiple large result sets, e.g.
> "funny AND picture".  Once again, this is not the case with Yokozuna, which
> in my benchmarking thus far has shown flat latencies regardless of result
> set size.
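>
> To make that concrete: with Riak Search's Solr-like HTTP interface you
> would issue that kind of conjunction query roughly like this (the "files"
> index name is only a placeholder):
>
>     curl 'http://localhost:8098/solr/files/select?q=funny+AND+picture'
>
> Both terms hit the default field here, and the latency grows with the
> number of documents matching each term.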
>
> -Z
>
> [1]:
> http://docs.basho.com/riak/latest/cookbooks/Riak-Search---Schema/#Defining-a-Schema
>
> [2]: https://github.com/rzezeski/yokozuna
>



-- 
Matt Painter
m...@deity.co.nz
+64 21 115 9378