Thanks,
Yeah, our crawling is homegrown -- it's just a thread-safe queue
hooked up to a threadpool + httplib2. Doing it @home style would be
pretty hot -- a related idea would be to use Google Gears to do a sort
of help-out-as-you-use processing. Gears will let you run threads and
do offline storage on most clients, so you might be able to get them
to do some indexing in the browser as they use the application and then
contribute it back. Forcing Gears on people is kind of a hurdle at
this point, but perhaps if you offer enhanced functionality or more
up-to-date results or something you could count on some percentage
chipping in. Longer term, I think threads and offline storage are
features you can bank on being supported by all major clients out of
the box.
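For anyone who hasn't looked at the crawler, the shape is roughly the sketch below. This is an illustration, not our actual code: the names crawl and fetch are made up here, and the default fetcher uses the standard library's urllib.request as a stand-in so the example has no dependencies (httplib2 would slot in where fetch is).

```python
import queue
import threading
import urllib.request


def fetch(url, timeout=10):
    """Stand-in fetcher (assumption): stdlib urllib instead of httplib2."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def crawl(urls, fetcher=fetch, num_workers=4):
    """Fetch every URL with a pool of threads fed from one shared queue.

    Returns a dict mapping each URL to its body, or to the exception
    raised while fetching it.
    """
    work = queue.Queue()        # thread-safe queue of pending URLs
    results = {}
    lock = threading.Lock()     # guards writes to the results dict

    def worker():
        while True:
            url = work.get()
            if url is None:     # sentinel: no more work for this thread
                work.task_done()
                return
            try:
                outcome = fetcher(url)
            except Exception as exc:
                outcome = exc   # record the failure instead of dying
            with lock:
                results[url] = outcome
            work.task_done()

    threads = [threading.Thread(target=worker, daemon=True)
               for _ in range(num_workers)]
    for t in threads:
        t.start()
    for url in urls:
        work.put(url)
    for _ in threads:
        work.put(None)          # one sentinel per worker
    work.join()                 # wait until every item has been handled
    for t in threads:
        t.join()
    return results
```

Passing the fetcher in as a parameter is just a convenience for the sketch: it lets you swap in a different HTTP library or a fake for testing without touching the queue and thread logic.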
I think we should definitely look at grub as a near term replacement
too. At a glance, the structure is pretty appealing. It looks like
trivial client apps GET little work lists full of URLs and then POST
the results somewhere as specified in the list. I had something
working along these lines at some point, and I think it could work
well for us. One place where we could very easily chip in to the
project would be to write a Python client. It could probably be
cobbled together out of what we're using right now with little
effort.
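To make the GET-a-work-list, POST-the-results idea concrete, here is a minimal sketch of what such a Python client could look like. Everything specific in it is an assumption for illustration: the JSON work-list shape with "urls" and "post_to" keys, the result format, and the function names are invented here and are not taken from grub's actual protocol.

```python
import json
import urllib.request


def http_get(url):
    """Plain stdlib GET; returns the response body as bytes."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read()


def http_post(url, data):
    """Plain stdlib POST (urllib sends POST whenever data is given)."""
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read()


def run_once(worklist_url, get=http_get, post=http_post):
    """GET one work list, fetch each URL in it, POST the results back.

    Hypothetical work-list shape (not grub's real format):
        {"post_to": "<results URL>", "urls": ["<url>", ...]}
    Returns the number of URLs processed.
    """
    worklist = json.loads(get(worklist_url))
    results = []
    for url in worklist["urls"]:
        try:
            body = get(url).decode("utf-8", errors="replace")
            results.append({"url": url, "ok": True, "body": body})
        except Exception as exc:
            # Report the failure rather than dropping the URL silently.
            results.append({"url": url, "ok": False, "error": str(exc)})
    payload = json.dumps({"results": results}).encode("utf-8")
    post(worklist["post_to"], payload)
    return len(results)
```

A real client would loop on run_once and would need to match whatever list and result formats grub actually specifies; the sketch only shows how little code the basic cycle needs.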
- Luke
On Jul 29, 2008, at 9:30 AM, Chris Holmes wrote:
At the Creative Commons Salon last week the Wikia Search guys
mentioned that they were using and I think developing this thing
called grub - http://grub.org/ http://search.wikia.com/wiki/Grub
I have no idea how good the code is, but the idea is interesting -
distributed crawlers, @home style, to index the web. Am
wondering if it could handle RSS well, do melkjug crawling. Did you
just write the crawler yourself?
Maybe not use their code, but the idea could be interesting - make a
downloadable client so users can put their cycles to melkjug,
crawling and processing new stories, etc. I could see people going
for it. I forget what it is at the moment, but I know there's at
least one open source toolkit that can help with @home style
processing. Though if we could use and contribute to the grub code
that could be nice.
C
--
Archive:
http://www.openplans.org/projects/melkjug/lists/melkjug-development-list/archive/2008/07/1217341075238