Re: [Ferret-talk] Road map of ferret

Erik Hatcher Thu, 28 Aug 2008 13:15:10 -0700


On Aug 28, 2008, at 3:02 PM, Jens Kraemer wrote:

Gotcha. Meaning the search server is pulling from the DB directly. That's what the DataImportHandler in Solr does as well. It'd be a simple single HTTP request to Solr (once the DB stuff is configured, of course) to have it do full or incremental DB indexing.
With the slight difference that custom model logic defined in the rails model class is still involved to preprocess data, index values calculated at indexing time or even have certain records refuse being indexed based on their current state. Having per document boosts depending on some value from the database (i.e. record popularity) is also a classic... Aaf never just pulls data from the db, it always uses rails model objects. Doesn't make indexing faster of course...
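
As a rough illustration of the model-level preprocessing described above: this is not aaf's actual API, just a minimal Python sketch. The record fields, the `to_index_doc` helper, and the boost formula are all hypothetical.

```python
import math

def to_index_doc(record):
    """Turn a model record into an index document, or None to skip it."""
    # A record can refuse to be indexed based on its current state.
    if record.get("state") != "published":
        return None
    return {
        "id": record["id"],
        "title": record["title"],
        # A value calculated at indexing time rather than read from the DB.
        "title_length": len(record["title"]),
        # Per-document boost derived from a popularity value (assumed formula).
        "boost": 1.0 + math.log1p(record.get("popularity", 0)),
    }

records = [
    {"id": 1, "title": "Ferret road map", "state": "published", "popularity": 0},
    {"id": 2, "title": "Draft post", "state": "draft", "popularity": 50},
]

# Only records that agree to be indexed end up in the document list.
docs = [doc for doc in map(to_index_doc, records) if doc is not None]
```

The point is only that every document passes through model code before it reaches the index, which a direct pull from the DB would bypass.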

All great points. ActiveRecord is much more pleasant than any other database access that I've ever worked with. I don't generally work with databases personally, though. The bulk of my full-text searching experiences don't involve databases at all.

I suppose the Java counterpart would be Hibernate Search - surely involving a lot more hideous XML and @annotations - ewww.

In development environments and especially when it comes to automated tests / CI it's also quite comfortable not having to run a separate server but using the short cut directly to the index, which isn't possible with Solr.
Not true. Solr can work embedded. There is a base SolrServer abstraction, with an implementation that runs embedded (inside the same JVM) versus over HTTP. Exactly the same interface for both operations, using a very simple API (SolrJ, much like Lucene's basic API actually).
cool, but that won't work for Rails projects running on MRI and accessing solr via solr-ruby.


Fair point.

Again, the answer comes back to JRuby ;) Forget MRI. Good point about solr-ruby - it is specifically designed for Solr over HTTP. It wouldn't take much to refactor it to work with embedded Solr via JRuby though. But if JRuby is a given, it'd be just as easy to work with SolrJ's API directly.
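
To make the "same interface, embedded or over HTTP" idea concrete, here is a small Python sketch of the pattern. It is not SolrJ or solr-ruby code. The class names, the `transport` callable, and the URL paths are assumptions made up for illustration.

```python
from abc import ABC, abstractmethod

class SearchServer(ABC):
    """Common interface that calling code programs against."""

    @abstractmethod
    def add(self, doc): ...

    @abstractmethod
    def query(self, term): ...

class EmbeddedServer(SearchServer):
    """Runs in the same process; a plain list stands in for the index."""

    def __init__(self):
        self._docs = []

    def add(self, doc):
        self._docs.append(doc)

    def query(self, term):
        return [d for d in self._docs if term in d["text"]]

class HttpServer(SearchServer):
    """Forwards each call to a remote server through an injected transport."""

    def __init__(self, base_url, transport):
        self.base_url = base_url
        self._transport = transport  # hypothetical callable(url, payload)

    def add(self, doc):
        self._transport(self.base_url + "/update", doc)

    def query(self, term):
        return self._transport(self.base_url + "/select", {"q": term})

def index_and_search(server, docs, term):
    # Caller code is identical whichever implementation is passed in.
    for doc in docs:
        server.add(doc)
    return server.query(term)
```

With this shape, a test suite can pass in the embedded variant while production passes in the HTTP one, and nothing else in the application changes.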

Though for testing purposes, solr-ruby is easily mocked. solr-ruby touts great (98% or something like that) code coverage with unit tests, many of those tests are against solr-ruby's API with Solr itself mocked. And there are tests that fire up Solr in the background and test that way too for full functional tests. So for unit testing purposes, having Solr running isn't needed, but it launches plenty fast enough for testing end-to-end if desired.
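
For what it's worth, mocking the client in a unit test looks roughly like this. It is a Python sketch using the standard `unittest.mock` module, not solr-ruby's own test code; `titles_for` and the response shape are hypothetical.

```python
from unittest.mock import Mock

def titles_for(client, term):
    """Code under test: depends only on the client's query() method."""
    response = client.query(term)
    return [hit["title"] for hit in response["hits"]]

# Stand-in for the search client, so no server has to be running.
client = Mock()
client.query.return_value = {"hits": [{"title": "Road map of ferret"}]}

titles = titles_for(client, "ferret")

# Verify the code talked to the client the way we expect.
client.query.assert_called_once_with("ferret")
```

End-to-end tests against a real server would then only need to cover what the mock cannot: that the real responses actually have the assumed shape.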

I'm curious - what are the numbers of documents being put into Ferret indexes out there? millions? hundreds of millions? billions? And are folks doing faceting? Does Ferret have faceting support?
not sure about the billions, but afair an earlier message in this thread stated an index size of 90 million documents with aaf. Altlaw.org has reported an index size of > 4GB with around 700k documents last fall. The selfhtml.org index has approximately 1 million forum entries indexed, index size around 2GB. Stellr doesn't ever use more than around 50MB of RAM during indexing and searching this index. I know RAM is cheap and all, but RAM size still has a quite large influence on the price of the server you rent for your app, at least here in germany.


90 million is impressive for sure.

RAM - well, when Ferret/Stellr does faceting we'll revisit that discussion :) Solr loves RAM! It still can run in modest environments, but the more RAM you can give it to use for caches (depending on your needs) the better it is.

Without doubt Solr has much more references in the area of such large installations than ferret/aaf. I for myself never saw aaf as a drop-in solution for indexes of this size, but more as an easy to use out of the box solution for the average rails app with maybe several thousands or tens of thousands records, but I'm happy to see it still works in larger scale setups.


Indeed!  ferret: +1 - no question!

Heck, it all began with a simple full text search for my blog ;)

Same for me (though I abandoned it when I realized that regular blogging and server maintenance weren't for me).

Regarding the faceting - it's not built into ferret, and aaf doesn't support it either since I didn't need it yet, and nobody else requested this feature so far. All in all I think the average usage scenarios of solr and aaf are quite different atm...

I'm really surprised by that. Faceting is the major feature that attracts folks to Solr. It's critical for all of our customers.

But yeah, no question that Lucene/Solr and Ferret/Stellr can happily coexist and aren't necessarily competition for every project. But there definitely are those areas of overlap where a project could go with either solution. And I would definitely not try to shoehorn Solr into a project where it didn't fit and Ferret worked fine. I'm pragmatic like that.

I'll try to find the time to benchmark the selfhtml.org data set with solr and stellr. I'll report my findings here.

Awesome. If you have the data in some easily digestible format, I'd be happy to toss it into Solr and report back numbers from my development machine. Drop me a line offline if you'd like.


        Erik

_______________________________________________
Ferret-talk mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/ferret-talk
