On Aug 28, 2008, at 1:02 PM, Jens Kraemer wrote:
What advantage does Ferret have in terms of ActiveRecord
integration that Solr wouldn't have?
If you're talking about custom analyzers being in Ruby, more on
that below.
It's not only custom analyzers, but the fact that acts_as_ferret's
DRb server runs with the full Rails application loaded, so, for
example, to bulk index a number of records aaf just hands the server
the ids and class name of the records to index, and the server does
the rest.
Gotcha. Meaning the search server is pulling from the DB directly.
That's what the DataImportHandler in Solr does as well. It'd be a
simple single HTTP request to Solr (once the DB stuff is configured,
of course) to have it do full or incremental DB indexing.
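To make that concrete, kicking off a DataImportHandler run really is just one HTTP GET. A minimal Ruby sketch of building that request URL (assuming Solr on localhost:8983 with the handler mounted at /solr/dataimport; host, port, and path are assumptions, not anything acts_as_solr ships):

```ruby
require 'uri'

# Build the URL that triggers a DataImportHandler run. "full-import"
# rebuilds the whole index from the DB; "delta-import" picks up only
# rows changed since the last run.
def dih_command_url(command, host: 'localhost', port: 8983)
  URI::HTTP.build(
    host:  host,
    port:  port,
    path:  '/solr/dataimport',
    query: URI.encode_www_form(command: command, commit: true)
  ).to_s
end

full_url  = dih_command_url('full-import')
delta_url = dih_command_url('delta-import')
```

An incremental DB sync from cron then boils down to fetching the delta URL.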
How do you use a custom analyzer with Solr? You have to code it in
Java (or do your analysis before feeding the data into Java land,
which I wouldn't consider good app design).
Most users would not need to write a custom analyzer. Many of the
built-in ones are quite configurable. Yes, Solr does require
schema configuration via an XML file, but there have been
acts_as_solr variants (good and bad thing about this git craze)
that generate that for you automatically from an AR model.
Glad you mentioned this ;) I don't want to configure an analyzer via
XML when I can throw my own together in 4 or 5 lines of easy-to-read
Ruby code. Same for the index structure. A philosophical mismatch
between the Java and Ruby worlds, I think :)
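For readers who haven't written one: a Ferret analyzer really is just a tokenizer wrapped in a chain of filters. The sketch below uses plain-Ruby stand-ins for Ferret's classes purely to show the shape (the real thing subclasses Ferret::Analysis::Analyzer and chains its built-in tokenizers and filters; the class names here are toys, not Ferret's API):

```ruby
# Toy stand-ins for a tokenizer and two token filters, illustrating
# the analyzer shape: a token stream flowing through a filter chain.
class WhitespaceTokenizer
  def initialize(text)
    @tokens = text.split
  end
  def each(&blk)
    @tokens.each(&blk)
  end
end

class LowercaseFilter
  def initialize(stream)
    @stream = stream
  end
  def each
    @stream.each { |t| yield t.downcase }
  end
end

class StopFilter
  STOP = %w[the a an].freeze
  def initialize(stream)
    @stream = stream
  end
  def each
    @stream.each { |t| yield t unless STOP.include?(t) }
  end
end

# The "analyzer" itself is nothing more than the wiring:
def analyze(text)
  tokens = []
  StopFilter.new(LowercaseFilter.new(WhitespaceTokenizer.new(text)))
    .each { |t| tokens << t }
  tokens
end
```

Swapping in thesaurus expansion or custom stemming is just another filter class in the chain, which is the pluggability Jens is after.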
Don't get me wrong... I'm a Ruby fanatic myself! XML makes me ill,
generally speaking (it has its uses, but for configuration it is just
plain wrong).
For the built-in tokenizers/filters, a smarter acts_as_solr could
generate the right config based on analysis parameters specified in
the model.
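As a hypothetical sketch of what that generation could look like (the field names, options hash, and helper are all invented for illustration, not any acts_as_solr API):

```ruby
# Derive Solr schema.xml <field> entries from a declarative Ruby
# description of a model's searchable fields, so nobody hand-edits XML.
FIELDS = {
  title:      { type: 'text', stored: true  },
  body:       { type: 'text', stored: false },
  created_at: { type: 'date', stored: true  },
}.freeze

def schema_fields(fields)
  fields.map do |name, opts|
    %(<field name="#{name}" type="#{opts[:type]}" indexed="true" stored="#{opts[:stored]}"/>)
  end.join("\n")
end
```

The plugin would write this output into the schema file at setup time; the developer only ever touches the Ruby hash.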
But even if you do that then you have
a) half a java project (I don't want that)
That's totally fair, and really the primary compelling reason for a
Ferret over Solr for pure Ruby/Rails projects. I dig that.
But isn't Ferret like 60k lines of C code too?!
true, but I don't have to compile that every time I deploy my app...
My point was that Ferret isn't just Ruby, just a counterpoint to your
"half a java project". No one has to recompile Solr either.
and b) no way to use your existing Rails classes in that custom
analyzer (I *have* analyzers that use Rails models to retrieve
synonyms and narrower terms for thesaurus-based query expansion)
You could leverage client-side query expansion with Solr... just
take the user's query, massage it, and send whatever query you like
to Solr. Solr has synonym and stop word capability too.
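A minimal sketch of that client-side massaging (a static synonym hash stands in here for the Rails-model lookup Jens describes; the query syntax assumed is Lucene-style OR groups):

```ruby
# Expand each query term with its synonyms before the query is sent
# to the search server. In a real app the hash lookup would be a
# Rails model query against a thesaurus table.
SYNONYMS = {
  'car'  => %w[automobile auto],
  'fast' => %w[quick speedy],
}.freeze

def expand_query(query)
  query.split.map do |term|
    syns = SYNONYMS.fetch(term, [])
    syns.empty? ? term : "(#{([term] + syns).join(' OR ')})"
  end.join(' ')
end
```

So a user typing `fast car` would actually hit the server as `(fast OR quick OR speedy) (car OR automobile OR auto)`.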
yeah, I could do that. But that's moving analysis stuff into my
application, which is quite contrary to the purpose of analyzers:
encapsulating this logic and making it pluggable into the search
engine library. So fewer style points for this solution...
I was just saying :) It's debatable exactly where in the client-
server spectrum synonym expansion belongs... and it really depends on
the needs of the project. Nothing wrong with a client doing some user
input massaging before a query hits the search server.
However, there is also no reason (and I have this on my copious-
free-time TODO list) that JRuby couldn't be used behind the scenes
of a Solr analyzer/tokenizer/filter or even request handler... and
do all the cool Ruby stuff you like right there. Heck, you could
even send the Ruby code over to Solr to execute there if you like ;)
that sounds sexy ;)
Should be fairly trivial to wire JRuby in. The DataImportHandler
already has scripting language support for data transformation:
<http://wiki.apache.org/solr/DataImportHandler#head-27fcc2794bd71f7d727104ffc6b99e194bdb6ff9>
(shield your eyes from the XML wrapping it!), so I believe JRuby
should already work in that context. This is sort of like the Mapper
stuff I built into solr-ruby, transforming data from domain to search
engine "documents".
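The mapper idea is roughly the following (a simplified sketch of the shape, not solr-ruby's actual Mapper API; all names are invented):

```ruby
# A declarative mapping from a domain record to a flat search-engine
# document: each document field is a lambda over the source record.
MAPPING = {
  id:     ->(rec) { "book-#{rec[:id]}" },
  title:  ->(rec) { rec[:title].strip },
  author: ->(rec) { rec[:author_name] },
}.freeze

def to_document(record, mapping = MAPPING)
  mapping.transform_values { |fn| fn.call(record) }
end

doc = to_document(id: 1, title: '  Ferret in Action ', author_name: 'Jens')
```

Keeping the transformation in one declarative table means the same mapping can drive both bulk indexing and single-record updates.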
Here's what I would do *if* I experienced severe problems with
Ferret in any of my projects:
Take aaf, replace Ferret with Lucene or even make it modular to
decide at run time which one to use, run the DRb server (or the
whole app, that depends) under JRuby and call it acts_as_lucene :-)
Et voila - great Rails integration plus Lucene's maturity. But as
long as Ferret's working fine for me that's really unlikely to
happen... Unless somebody wants to sponsor that project, of
course ;)
Just using Solr and fixing up acts_as_solr to meet your needs (if
it doesn't) would be even easier than all that :) Solr really is a
better starting point than Lucene directly, for caching,
scalability, replication, faceting, etc.
Depends on whether you need these features or not. From my
experience, lots of projects don't need these things anyway, because
they're running on a single host and nearly every other part of the
application is slower than search... Maybe it's because I'm quite
involved with the topic and am familiar with Lucene's API, but to me
Solr looks like an additional layer of abstraction and complexity
which I only want when it really gives me a feature I need.
Plus, the last time I checked, Lucene didn't need XML configuration
files ;)
I hear ya about the XML config files. And always to be fair to Solr
here, you really only need to set things up from a basic example
configuration that covers most scenarios already - so it really isn't
necessary to even touch XML config except for tweaking little things.
But Solr's advantages over raw Lucene are built out of experience
that most Lucene projects eventually accumulate anyway. Caching is
really important for faceting, which every project I touch these
days needs. Replication is really important for scaling massive
query load. It's really not such a big chunk over Lucene to bite
off... and in almost all respects Solr is even simpler to use than
Lucene anyway.
In development environments, and especially when it comes to
automated tests / CI, it's also quite convenient not to have to run
a separate server but to take the shortcut of going directly to the
index, which isn't possible with Solr.
Not true. Solr can work embedded. There is a base SolrServer
abstraction, with an implementation that runs embedded (inside the
same JVM) versus over HTTP. Exactly the same interface for both
operations, using a very simple API (SolrJ, much like Lucene's basic
API actually).
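The value of that abstraction, translated into Ruby duck-typing terms (class and method names below are invented for illustration; SolrJ's actual interface is Java):

```ruby
# One interface, two backends: tests talk to an in-process index,
# production talks to a server, and application code can't tell
# the difference.
class EmbeddedBackend
  def initialize
    @docs = {}
  end
  def add(doc)
    @docs[doc[:id]] = doc
  end
  def query(field, value)
    @docs.values.select { |d| d[field] == value }
  end
end

class HttpBackend
  def initialize(url)
    @url = url
  end
  def add(doc)
    # would POST the doc to @url in a real implementation
  end
  def query(field, value)
    # would GET "#{@url}/select?q=..." in a real implementation
    []
  end
end

# Application code takes either backend:
def index_and_find(backend)
  backend.add(id: 1, title: 'ferret')
  backend.query(:title, 'ferret')
end
```

CI wires in the embedded backend; deployment swaps in the HTTP one with no other code change, which is exactly the no-separate-server-in-tests convenience Jens asked about.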
I'd be curious to see scalability comparisons between Ferret and
Solr - or perhaps more properly between Stellr and Solr - as it
boils down to number of documents, queries per second, and faceting
and highlighting speed. I'm betting on Solr myself (by being so
into it and basing my professional life on it).
This would be interesting, but I wouldn't be that disappointed with
Stellr ending up second given the little amount of time I've spent
building it so far. Just out of curiosity, do you have some kind of
performance testing suite for Solr which I could throw at Stellr?
No, I don't have those kinds of tests myself. While I can speak to
Solr's performance based on what I hear from our clients and reports
on the mailing lists, I'm not a performance-savvy person.
I'm curious - what are the numbers of documents being put into Ferret
indexes out there? millions? hundreds of millions? billions? And
are folks doing faceting? Does Ferret have faceting support?
Erik
_______________________________________________
Ferret-talk mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/ferret-talk