If you have a lot of regular expressions in your crawl-urlfilter.txt file, they are probably what's making updatedb so slow, since each URL may be tested against every pattern in turn. If you're only filtering against a list of domains, I believe a new domain URL filter was recently posted to JIRA; it caches domain names and speeds things up considerably.
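To illustrate why a domain list can be much cheaper than a regex list, here is a minimal sketch. It is not the JIRA patch itself; the class and method names are hypothetical. It keeps the allowed domains in a hash set and checks the URL's host and each parent domain with one lookup apiece, so the cost depends on the number of dots in the host rather than on the number of rules. `filter` returns the URL to accept it and `null` to reject it, mirroring the convention of Nutch's URLFilter interface. `regexFilter` is included only for contrast.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of a domain-based URL filter (not the actual Nutch plugin).
 * Allowed domains live in a HashSet, so each check is a handful of hash lookups
 * instead of one regex match per configured rule.
 */
public class DomainUrlFilterSketch {

    private final Set<String> allowedDomains = new HashSet<>();

    public DomainUrlFilterSketch(String... domains) {
        for (String d : domains) {
            allowedDomains.add(d.toLowerCase(Locale.ROOT));
        }
    }

    /** Returns the URL if its host or any parent domain is allowed, else null. */
    public String filter(String url) {
        String host;
        try {
            host = new URI(url).getHost();
        } catch (URISyntaxException e) {
            return null; // unparseable URLs are rejected
        }
        if (host == null) {
            return null; // e.g. mailto: URLs have no host
        }
        host = host.toLowerCase(Locale.ROOT);
        // Walk "a.b.example.com" -> "b.example.com" -> "example.com" -> "com".
        while (true) {
            if (allowedDomains.contains(host)) {
                return url;
            }
            int dot = host.indexOf('.');
            if (dot < 0) {
                return null;
            }
            host = host.substring(dot + 1);
        }
    }

    /** Regex-list approach for comparison: worst case tries every pattern per URL. */
    static String regexFilter(List<Pattern> patterns, String url) {
        for (Pattern p : patterns) {
            if (p.matcher(url).find()) {
                return url;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        DomainUrlFilterSketch f = new DomainUrlFilterSketch("apache.org", "example.com");
        System.out.println(f.filter("http://lucene.apache.org/nutch/")); // accepted
        System.out.println(f.filter("http://www.example.net/"));        // rejected (null)
    }
}
```

Note that matching whole domain labels means `notapache.org` is not accepted just because it ends in `apache.org`, which a careless regex could get wrong.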
Andy

On 9/30/05, Stefan Groschupf wrote:
>
> You can try experimenting with the settings in nutch-config.xml:
> more open file streams, more cache for sorting, things like that may
> help, but they may also crash the system because of too many open
> files (under Unix this limit can be configured).
> HTH
> Stefan
>
> On 30.09.2005 at 18:31, Jon Shoberg wrote:
>
> > Calling UpdateDB for my segments (500K) is pretty slow, as a
> > relative observation.
> >
> > Aside from bigger hardware, is there anything that can be done to
> > speed up the update process? Can multiple segments update the DB
> > at the same time?
> >
> > Any optimizations or suggested usages?
> >
> > thanks
> > -j

--
Andy Liu
