Hi Peter,
Thanks for the quick reply!
On 23 Feb 2011, at 4:49 PM, Peter Karman wrote:
> Hi Andrew,
>
> Andrew S. Townley wrote on 02/23/2011 09:51 AM:
>
>>
>> 1) Can lucy store multi-field "documents" a la Ferret?
>
> Yes.
Great.
>>
>> 2) Can lucy give me the match result information I'm looking for within each
>> document as part of the search hit information?
>>
>
> For highlighting and snippet extraction? yes.
Well, actually, I want it for more than that. For my particular needs, I need
to get the field name where the match occurred in the document, and then I'd
ideally like to have the start offset into that field and the length of the
match.
This is the core information I can't get right now from Ferret. For example
(never mind the accuracy of the information here ;):
valkyrie$ irb
>> require 'ferret'
=> true
>> include Ferret
=> Object
>> index = Index::Index.new
=> #<Ferret::Index::Index:0x101342108 @default_input_field=:id,
@mon_waiting_queue=[], @qp=nil, @default_field=:*, @key=nil, @auto_flush=false,
@mon_entering_queue=[], @open=true,
@dir=#<Ferret::Store::RAMDirectory:0x101342068>, @mon_count=0, @id_field=:id,
@reader=nil, @searcher=nil, @close_dir=true, @mon_owner=nil, @writer=nil,
@options={:dir=>#<Ferret::Store::RAMDirectory:0x101342068>,
:analyzer=>#<Ferret::Analysis::StandardAnalyzer:0x101341e60>,
:lock_retry_time=>2, :default_field=>:*}>
>> index << {:title => "Fred flinstone", :description => "The cartoon series" }
=> nil
>> index << {:title => "The Flinstones", :description => "Fred flinstone's family" }
=> nil
>> index.search("flinstones")
=> #<struct Ferret::Search::TopDocs total_hits=1, hits=[#<struct
Ferret::Search::Hit doc=1, score=0.254271149635315>],
max_score=0.254271149635315, searcher=#<Ferret::Search::Searcher:0x101314e60>>
The Ferret::Search::Hit gives me the document number and the score, but that's
it. In whatever list format the results are actually in, I'd also like to have
the information I mentioned. If you weren't storing the offset information,
then it would make sense for it not to be available, but if you were, then I'd
expect to have the whole thing right there. I can't see how there'd be a
performance issue in providing this information.
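To make the request concrete, here is a rough sketch in plain Ruby of the shape
of result I have in mind. Everything in it (MatchInfo, RichHit, naive_search) is
a made-up name for illustration; none of it is Ferret or Lucy API, and it just
scans the stored text naively instead of using an index or an analyzer:

```ruby
# Hypothetical sketch only: these names do not exist in Ferret or Lucy.
# It scans the raw field text to show the per-hit data I'm after.
MatchInfo = Struct.new(:field, :offset, :length)
RichHit   = Struct.new(:doc, :matches)

# Return one RichHit per document that contains the term, listing the
# field name, character offset, and length of every match.
def naive_search(docs, term)
  pattern = /\b#{Regexp.escape(term)}\b/i
  hits = []
  docs.each_with_index do |fields, doc_id|
    matches = []
    fields.each do |field, text|
      text.scan(pattern) do
        m = Regexp.last_match
        matches << MatchInfo.new(field, m.begin(0), m[0].length)
      end
    end
    hits << RichHit.new(doc_id, matches) unless matches.empty?
  end
  hits
end

docs = [
  { :title => "Fred flinstone", :description => "The cartoon series" },
  { :title => "The Flinstones", :description => "Fred flinstone's family" }
]

naive_search(docs, "flinstones").each do |hit|
  hit.matches.each do |m|
    puts "doc=#{hit.doc} field=#{m.field} offset=#{m.offset} length=#{m.length}"
  end
end
# prints: doc=1 field=title offset=4 length=10
```

A real engine would take these values from the stored term positions and offsets
rather than rescanning the text, but the per-hit data would look the same.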
I just want to make sure we're on the same page, as this is a critical feature
for what I'm trying to do.
>
>> 3) How would you relate the completeness/stability of the core C library and
>> Ruby bindings?
>>
>
> Alas, here's the rub. There are no Ruby bindings at present. The core C
> code is stable and "complete" (for some value of "complete" -- i.e. it
> works). But to date there are only Perl bindings.
>
> I posted about this on the Ferret list awhile back, inviting Ruby
> developers to come have a look and help jump-start the Ruby
> implementation. I realize your project has some immediate needs; please
> also consider hanging around and helping us define the Ruby
> implementation. Subscribe to lucy-dev to get started.
Thanks for the information. That's unfortunate, but thanks for the invitation
to help out. It might be a while before I have any bandwidth, but depending on
how things go, Lucy might be the best long-term solution.
In digging around Google in the interim between now and my original note, I
re-read the charter for Lucy. One of the things that struck me was the
"implementing as much functionality in high-level languages as possible"
comment. What does this mean, exactly?
Part of the reason I ask has to do with the future of my own project. Much of
what I have now will eventually be rewritten piecemeal in C++ and then wrapped
via SWIG so I can have Ruby and Java bindings as well as use it in other
environments natively supporting C/C++. Whatever route I end up taking for
full-text search, it would need to support the same arrangement, as I'd
actually be using it more from the C++ code than from the Ruby code.
With the way the statement above is phrased, it seems like this wouldn't really
be possible. It also seems like there might be an awful lot of duplication of
effort involved in actually creating each language binding. Why was this
approach chosen rather than putting all the muscle in the C code and providing
thin wrappers--even via SWIG or something more hand-tailored where
necessary/appropriate?
I tried to dig through the Lucy SVN repository via the web UI, but I couldn't
really figure out what's there. The code generator framework you're using is
something I haven't seen before, but at least it explains why I couldn't find
the Ruby bindings! :)
Anyway, thanks for the answers. I'm presently tinkering with the Ferret internals
since it seems like there ought to be a way to expose what I want (it's in the
explain output), but there's a lot of code, and I'm certainly no search engine
expert!
Cheers,
ast
--
Andrew S. Townley <[email protected]>
http://atownley.org