On Aug 28, 2008, at 1:02 PM, Jens Kraemer wrote:
What advantage does Ferret have in terms of ActiveRecord
integration that Solr wouldn't have?
If you're talking about custom analyzers being in Ruby, more on
that below.
It's not only custom analyzers, but the fact that acts_as_ferret's
DRb server runs with the full Rails application loaded, so, for
example, to bulk index a number of records aaf just hands the server
the ids and class name of the records to index, and the server does
the rest.
Gotcha. Meaning the search server is pulling from the DB directly.
That's what the DataImportHandler in Solr does as well. It'd be a
simple single HTTP request to Solr (once the DB stuff is configured,
of course) to have it do full or incremental DB indexing.
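To make that concrete, kicking off a DataImportHandler run really is just one HTTP GET. A minimal Ruby sketch of building that request URL (assuming Solr on localhost:8983 with the handler mounted at /solr/dataimport; host, port, and path are assumptions, not anything acts_as_solr ships):

```ruby
require 'uri'

# Build the URL that triggers a DataImportHandler run. "full-import"
# rebuilds the whole index from the DB; "delta-import" picks up only
# rows changed since the last run.
def dih_command_url(command, host: 'localhost', port: 8983)
  URI::HTTP.build(
    host:  host,
    port:  port,
    path:  '/solr/dataimport',
    query: URI.encode_www_form(command: command, commit: true)
  ).to_s
end

full_url  = dih_command_url('full-import')
delta_url = dih_command_url('delta-import')
```

An incremental DB sync from cron then boils down to fetching the delta URL.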
How do you use a custom analyzer with Solr? You have to code it in
Java (or do your analysis before feeding the data into Java land,
which I wouldn't consider good app design).
Most users would not need to write a custom analyzer. Many of the
built-in ones are quite configurable. Yes, Solr does require
schema configuration via an XML file, but there have been
acts_as_solr variants (good and bad thing about this git craze)
that generate that for you automatically from an AR model.
Glad you mentioned this ;) I don't want to configure an analyzer via
XML when I can throw my own together in 4 or 5 lines of easy-to-read
Ruby code. Same for the index structure. A philosophical mismatch
between the Java and Ruby worlds, I think :)
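For readers who haven't written one: a Ferret analyzer really is just a tokenizer wrapped in a chain of filters. The sketch below uses plain-Ruby stand-ins for Ferret's classes purely to show the shape (the real thing subclasses Ferret::Analysis::Analyzer and chains its built-in tokenizers and filters; the class names here are toys, not Ferret's API):

```ruby
# Toy stand-ins for a tokenizer and two token filters, illustrating
# the analyzer shape: a token stream flowing through a filter chain.
class WhitespaceTokenizer
  def initialize(text)
    @tokens = text.split
  end
  def each(&blk)
    @tokens.each(&blk)
  end
end

class LowercaseFilter
  def initialize(stream)
    @stream = stream
  end
  def each
    @stream.each { |t| yield t.downcase }
  end
end

class StopFilter
  STOP = %w[the a an].freeze
  def initialize(stream)
    @stream = stream
  end
  def each
    @stream.each { |t| yield t unless STOP.include?(t) }
  end
end

# The "analyzer" itself is nothing more than the wiring:
def analyze(text)
  tokens = []
  StopFilter.new(LowercaseFilter.new(WhitespaceTokenizer.new(text)))
    .each { |t| tokens << t }
  tokens
end
```

Swapping in thesaurus expansion or custom stemming is just another filter class in the chain, which is the pluggability Jens is after.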
Don't get me wrong... I'm a Ruby fanatic myself! XML makes me ill,
generally speaking (it has its uses, but for configuration it is just
plain wrong).
For the built-in tokenizers/filters, a smarter acts_as_solr could
generate the right config based on analysis parameters specified in
the model.
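As a hypothetical sketch of what that generation could look like (the field names, options hash, and helper are all invented for illustration, not any acts_as_solr API):

```ruby
# Derive Solr schema.xml <field> entries from a declarative Ruby
# description of a model's searchable fields, so nobody hand-edits XML.
FIELDS = {
  title:      { type: 'text', stored: true  },
  body:       { type: 'text', stored: false },
  created_at: { type: 'date', stored: true  },
}.freeze

def schema_fields(fields)
  fields.map do |name, opts|
    %(<field name="#{name}" type="#{opts[:type]}" indexed="true" stored="#{opts[:stored]}"/>)
  end.join("\n")
end
```

The plugin would write this output into the schema file at setup time; the developer only ever touches the Ruby hash.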
But even if you do that then you have
a) half a java project (I don't want that)
That's totally fair, and really the primary compelling reason for a
Ferret over Solr for pure Ruby/Rails projects. I dig that.
But isn't Ferret like 60k lines of C code too?!
true, but I don't have to compile that every time I deploy my app...
My point was that Ferret isn't just Ruby, just a counterpoint to your
"half a java project". No one has to recompile Solr either.
and b) no way to use your existing Rails classes in that custom
analyzer (I *have* analyzers that use Rails models to retrieve
synonyms and narrower terms for thesaurus-based query expansion)
You could leverage client-side query expansion with Solr... just
take the user's query, massage it, and send whatever query you like
to Solr. Solr has synonym and stop word capability too.
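A minimal sketch of that client-side massaging (a static synonym hash stands in here for the Rails-model lookup Jens describes; the query syntax assumed is Lucene-style OR groups):

```ruby
# Expand each query term with its synonyms before the query is sent
# to the search server. In a real app the hash lookup would be a
# Rails model query against a thesaurus table.
SYNONYMS = {
  'car'  => %w[automobile auto],
  'fast' => %w[quick speedy],
}.freeze

def expand_query(query)
  query.split.map do |term|
    syns = SYNONYMS.fetch(term, [])
    syns.empty? ? term : "(#{([term] + syns).join(' OR ')})"
  end.join(' ')
end
```

So a user typing `fast car` would actually hit the server as `(fast OR quick OR speedy) (car OR automobile OR auto)`.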
yeah, I could do that. But that's moving analysis stuff into my
application, which is quite contrary to the purpose of analyzers:
encapsulating this logic and making it pluggable into the search
engine library. So fewer style points for this solution...
I was just saying :) It's debatable exactly where in the client-
server spectrum synonym expansion belongs... and it really depends on
the needs of the project. Nothing wrong with a client doing some user
input massaging before a query hits the search server.
However, there is also no reason (and I have this on my copious-
free-time TODO list) that JRuby couldn't be used behind the scenes
of a Solr analyzer/tokenizer/filter or even request handler... and
do all the cool Ruby stuff you like right there. Heck, you could
even send the Ruby code over to Solr to execute there if you like ;)
that sounds sexy ;)
Should be fairly trivial to wire JRuby in. The DataImportHandler
already has scripting language support for data transformation:
<http://wiki.apache.org/solr/DataImportHandler#head-27fcc2794bd71f7d727104ffc6b99e194bdb6ff9>
(shield your eyes from the XML wrapping it!), so I believe JRuby
should already work in that context. This is sort of like the Mapper
stuff I built into solr-ruby, transforming data from domain to search
engine "documents".
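The mapper idea is roughly the following (a simplified sketch of the shape, not solr-ruby's actual Mapper API; all names are invented):

```ruby
# A declarative mapping from a domain record to a flat search-engine
# document: each document field is a lambda over the source record.
MAPPING = {
  id:     ->(rec) { "book-#{rec[:id]}" },
  title:  ->(rec) { rec[:title].strip },
  author: ->(rec) { rec[:author_name] },
}.freeze

def to_document(record, mapping = MAPPING)
  mapping.transform_values { |fn| fn.call(record) }
end

doc = to_document(id: 1, title: '  Ferret in Action ', author_name: 'Jens')
```

Keeping the transformation in one declarative table means the same mapping can drive both bulk indexing and single-record updates.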
Here's what I would do *if* I experienced severe problems with
Ferret in any of my projects:
Take aaf, replace Ferret with Lucene or even make it modular to
decide at run time which one to use, run the DRb server (or the
whole app, that depends) under JRuby and call it acts_as_lucene :-)
Et voila - great Rails integration plus Lucene's maturity. But as
long as Ferret's working fine for me that's really unlikely to
happen... Unless somebody wants to sponsor that project, of
course ;)
Just using Solr and fixing up acts_as_solr to meet your needs (if
it doesn't) would be even easier than all that :) Solr really is a
better starting point than Lucene directly, for caching,
scalability, replication, faceting, etc.
Depends on whether you need these features or not. From my
experience, lots of projects don't need these things anyway, because
they're running on a single host and nearly every other part of the
application is slower than search... Maybe it's because I'm quite
involved with the topic and am familiar with Lucene's API, but to me
Solr looks like an additional layer of abstraction and complexity
which I only want when it really gives me a feature I need.
Plus, the last time I checked, Lucene didn't need XML configuration
files ;)
I hear ya about the XML config files. And always to be fair to Solr
here, you really only need to set things up from a basic example
configuration that covers most scenarios already - so it really isn't
necessary to even touch XML config except for tweaking little things.
But Solr's advantages over raw Lucene are built out of experience
that most Lucene projects eventually accumulate anyway. Caching is
really important for faceting, which every project I touch these
days needs. Replication is really important for scaling massive
query load. It's really not such a big chunk over Lucene to bite
off... and in almost all respects Solr is even simpler to use than
Lucene anyway.
In development environments, and especially when it comes to
automated tests / CI, it's also quite convenient not to have to run
a separate server but to take the shortcut of going directly to the
index, which isn't possible with Solr.
Not true. Solr can work embedded. There is a base SolrServer
abstraction, with an implementation that runs embedded (inside the
same JVM) versus over HTTP. Exactly the same interface for both
operations, using a very simple API (SolrJ, much like Lucene's basic
API actually).
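The value of that abstraction, translated into Ruby duck-typing terms (class and method names below are invented for illustration; SolrJ's actual interface is Java):

```ruby
# One interface, two backends: tests talk to an in-process index,
# production talks to a server, and application code can't tell
# the difference.
class EmbeddedBackend
  def initialize
    @docs = {}
  end
  def add(doc)
    @docs[doc[:id]] = doc
  end
  def query(field, value)
    @docs.values.select { |d| d[field] == value }
  end
end

class HttpBackend
  def initialize(url)
    @url = url
  end
  def add(doc)
    # would POST the doc to @url in a real implementation
  end
  def query(field, value)
    # would GET "#{@url}/select?q=..." in a real implementation
    []
  end
end

# Application code takes either backend:
def index_and_find(backend)
  backend.add(id: 1, title: 'ferret')
  backend.query(:title, 'ferret')
end
```

CI wires in the embedded backend; deployment swaps in the HTTP one with no other code change, which is exactly the no-separate-server-in-tests convenience Jens asked about.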
I'd be curious to see scalability comparisons between Ferret and
Solr - or perhaps more properly between Stellr and Solr - as it
boils down to number of documents, queries per second, and faceting
and highlighting speed. I'm betting on Solr myself (by being so
into it and basing my professional life on it).
This would be interesting, but I wouldn't be that disappointed with
Stellr ending up second given the little amount of time I've spent
building it so far. Just out of curiosity, do you have some kind of
performance testing suite for Solr which I could throw at Stellr?
No, I don't have those kinds of tests myself. While I can speak to
Solr's performance based on what I hear from our clients and reports
on the mailing lists, I'm not a performance-savvy person.
I'm curious - what are the numbers of documents being put into Ferret
indexes out there? millions? hundreds of millions? billions? And
are folks doing faceting? Does Ferret have faceting support?
Erik
_______________________________________________
Ferret-talk mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/ferret-talk