On Monday, March 19, 2007, 8:39 pm, Jack L wrote:
> This is very interesting. Because I'm planning on deploying
> a Solr-based search functionality soon, and I'd rather use Python,
If you're looking for something to deploy next week, Grassyknoll's not it. ;) As mentioned, it's early alpha. That said, I have the full support of my employer for this and we're going to be re-deploying our production site on it, so it's going to get done, soon.

> I wonder if you have any numbers comparing the performance/CPU load
> /memory footprint, etc. between Grassyknoll and Solr?

Sorry, I don't have anything like that at the moment. One of the other devs was going to do some benchmarking over the weekend, but he's not back from vacation yet.

Currently, I'm using wsgiref as the server, which is single-threaded. This makes development much easier, but isn't going to give very good performance. The nice thing about WSGI is that it's relatively easy to swap servers. However, interfacing a multi-threaded web server with PyLucene is non-trivial, as this ML will attest. ;) I've got a really good idea of how to go about this, taking full advantage of PyLucene's GIL-releasing benefits, but that's going to have to wait until the internals get nailed down a bit more.

> - Grassyknoll search vs. Lucene search

Grassyknoll is built on PyLucene. Supposedly, PyLucene is about 2x as fast as Java Lucene. Andi's got numbers on the website, IIRC.

> - Grassyknoll web server vs. Jetty

I've never used Jetty. We'll probably end up using PasteServer, though flup's a possibility as well. I don't have performance numbers on either, but I suspect PyLucene will be the bottleneck.

> Solr also has a Python output format. Any chance Grassyknoll can
> provide the same format to make it easy to port the front-end
> application? And/or a similar REST URL scheme?

To be honest, I've never used Solr and my eyes tend to glaze over reading the docs. If you dig up the relevant links, I'll take a look. ;) I'm eager to make this easy to use for folks, so supporting formerly-Solr clients certainly seems reasonable.
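To illustrate the "easy to swap servers" point: anything written against the WSGI interface is just a callable, so the same application runs unchanged under wsgiref, Paste, flup, etc. Here's a minimal sketch in modern Python -- the app name and response body are made up for illustration, not Grassyknoll's actual code:

```python
from wsgiref.simple_server import make_server
from wsgiref.util import setup_testing_defaults

# A toy WSGI app standing in for Grassyknoll's HTTP front end.
# Because it's plain WSGI, the same callable can be handed to wsgiref,
# Paste's httpserver, flup, or any other WSGI server without changes.
def search_app(environ, start_response):
    query = environ.get("QUERY_STRING", "")
    body = ("results for: %s" % query).encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]

# Exercise the app directly, without a network round-trip:
environ = {}
setup_testing_defaults(environ)
environ["QUERY_STRING"] = "q=find+me+things"

captured = {}
def start_response(status, headers):
    captured["status"] = status

result = b"".join(search_app(environ, start_response))
print(captured["status"])  # 200 OK
print(result)

# To actually serve it single-threaded during development:
#   make_server("", 8000, search_app).serve_forever()
```

The swap is then a one-liner at the edge (e.g. replace `make_server(...)` with the equivalent call from the server you prefer); the application code never changes.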
I'm planning on supporting quite a range of output formats, including (but not limited to) JSON, XML, pickle, and some form of HTML for debugging/browsing.

As for the REST URL scheme, it's pretty standard:

GET    http://foo.com/?q=find+me+things
GET    http://foo.com/my_doc_id/
PUT    http://foo.com/my_doc_id/
DELETE http://foo.com/my_doc_id/
POST   http://foo.com/

The POST will create a unique id for you using uuid. We're also going to support *Many versions of the above, which would allow you to batch a bunch of operations into a single request. This is for performance reasons on the Lucene side.

IIRC, the Solr Python output format is intended to be eval()'d. From my perspective, that's a little dubious security-wise (though the same applies to pickle, I suppose).

-- 
Peter Fein || 773-575-0694 || [EMAIL PROTECTED]
http://www.pobox.com/~pfein/ || PGP: 0xCCF6AE6B
irc: [EMAIL PROTECTED] || jabber: [EMAIL PROTECTED]

_______________________________________________
pylucene-dev mailing list
[email protected]
http://lists.osafoundation.org/mailman/listinfo/pylucene-dev
