On Monday March 19 2007 8:39 pm, Jack L wrote:
> This is very interesting. Because I'm planning on deploying
> a solr-based search functionality soon, and I'd rather use Python,

If you're looking for something to deploy next week, Grassyknoll's not it. ;) 
As mentioned, it's early alpha. That said, I have the full support of my 
employer for this and we're going to be re-deploying our production site on 
it, so it's going to get done, soon.

> I wonder if you have any numbers comparing the performance/CPU load
> /memory footprint, etc. between Grassyknoll and solr?

Sorry, I don't have anything like that ATM.  One of the other devs was going 
to be doing some benchmarking over the weekend, but he's not back from 
vacation.

Currently, I'm using wsgiref as the server, which is single threaded.  This 
makes developing much easier, but isn't going to give very good performance.  
The nice thing about wsgi is that it's relatively easy to swap servers.  
However, interfacing a multi-threaded webserver with PyLucene is non-trivial, 
as this ML will attest. ;) I've got a really good idea of how to go about 
this, taking full advantage of PyLucene's GIL-releasing benefits, but that's 
going to have to wait until the internals get nailed down a bit more.
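To make the server-swapping point concrete, here's a minimal sketch (the app body is made up, not Grassyknoll's actual handler): a WSGI app is just a callable, so only the last couple of lines change between servers.

```python
from wsgiref.simple_server import make_server

def app(environ, start_response):
    # A real app would dispatch into PyLucene here; this just shows
    # the WSGI contract: a callable taking (environ, start_response).
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'hello from grassyknoll']

# wsgiref -- single-threaded, fine for development:
httpd = make_server('localhost', 0, app)  # port 0 picks a free port
# httpd.serve_forever()  # blocks; uncomment to actually serve
httpd.server_close()

# Swapping servers means replacing only the lines above, e.g. with
# Paste it'd be something like: paste.httpserver.serve(app, port=8000)
```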

> - Grassyknoll search vs lucene search

Grassyknoll's built on PyLucene.  Supposedly, PyLucene is about 2x as fast as 
Java Lucene.  Andii's got numbers on the website, IIRC.

> - Grassyknoll web server vs jetty

I've never used jetty.  We'll probably end up using pasteserver, though flup's 
a possibility as well.  I don't have performance numbers on either, but I 
suspect PyLucene will be the bottleneck.

> solr also has a Python output format. Any chance Grassyknoll can
> provide the same format to make it easy to port the front-end
> application? And/or a similar REST URL scheme?

To be honest, I've never used Solr & my eyes tend to glaze over reading the 
docs.  If you dig up the relevant links, I'll take a look. ;) I'm eager to 
make this easy to use for folks, so supporting formerly-Solr clients 
certainly seems reasonable.   I'm planning on supporting quite a range of 
output formats, including (but not limited to) JSON, XML, pickle and some 
form of HTML for debugging/browsing.

As for the REST URL scheme, it's pretty standard:

GET http://foo.com/?q=find+me+things
GET http://foo.com/my_doc_id/
PUT http://foo.com/my_doc_id/
DELETE http://foo.com/my_doc_id/
POST http://foo.com/ which'll create a unique id for you using uuid.
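Spelled out as client-side requests (a sketch -- nothing is actually sent here, and foo.com / my_doc_id are just the placeholders from above):

```python
from urllib.parse import urlencode
from urllib.request import Request

base = 'http://foo.com/'

# GET search: query string carries the query
search = Request(base + '?' + urlencode({'q': 'find me things'}))
# GET / PUT / DELETE on a document by id
fetch  = Request(base + 'my_doc_id/')
store  = Request(base + 'my_doc_id/', data=b'...', method='PUT')
delete = Request(base + 'my_doc_id/', method='DELETE')
# POST to the root; the server assigns a uuid
create = Request(base, data=b'...', method='POST')
```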

We're also going to support *Many versions of the above, which would allow you 
to batch a bunch of operations into a single request.  This is for 
performance reasons on the Lucene side.

IIRC, the Solr python output format is intended to be eval()'d.  From my 
perspective, that's a little dubious security-wise (though the same applies to 
pickle, I suppose).
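To illustrate the worry (made-up payloads; ast.literal_eval is the usual safe substitute when you're handed Python-literal output):

```python
import ast

# A made-up Python-literal response of the kind you'd otherwise eval():
safe = "{'status': 0, 'docs': ['a', 'b']}"
assert ast.literal_eval(safe) == {'status': 0, 'docs': ['a', 'b']}

# eval() would happily execute this; literal_eval refuses anything
# that isn't a plain literal (hypothetical hostile payload):
evil = "__import__('os').getcwd()"
try:
    ast.literal_eval(evil)
except ValueError:
    pass  # rejected as data -- exactly what you want from a parser
```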

-- 
Peter Fein   ||   773-575-0694   ||   [EMAIL PROTECTED]
http://www.pobox.com/~pfein/   ||   PGP: 0xCCF6AE6B
irc: [EMAIL PROTECTED]   ||   jabber: [EMAIL PROTECTED]
_______________________________________________
pylucene-dev mailing list
[email protected]
http://lists.osafoundation.org/mailman/listinfo/pylucene-dev
