Ciao Maurizio, I once went through the same steps you just described. As far as I remember, I looked in the database for the most frequent words and also in a custom database of the words most used by our users. I more or less matched the two and came up with this bad words (or stop words) file for the Italian language.
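In case it is useful, the matching step could look roughly like the sketch below. It is only an illustration, not the script actually used: it assumes both word lists are already available as plain Python lists, it reads "matched" as "frequent in both lists", and the names `top_words` and `candidate_stop_words` and the cutoff `n` are made up for the example.

```python
from collections import Counter


def top_words(words, n):
    """Return the set of the n most frequent words, ignoring case."""
    counts = Counter(w.strip().lower() for w in words if w.strip())
    return {word for word, _ in counts.most_common(n)}


def candidate_stop_words(indexed_words, query_words, n=100):
    """Return, sorted, the words that are frequent in both lists.

    indexed_words: words gathered from the index (hypothetical input).
    query_words:   words typed by users in searches (hypothetical input).
    n:             how many top words to consider in each list (assumed cutoff).
    """
    return sorted(top_words(indexed_words, n) & top_words(query_words, n))


if __name__ == "__main__":
    # Tiny made-up samples, just to show the call.
    indexed = ["di"] * 5 + ["il"] * 4 + ["e"] * 3 + ["prato"] * 2 + ["museo"]
    queries = ["di"] * 4 + ["il"] * 3 + ["museo"] * 2 + ["e"]
    for word in candidate_stop_words(indexed, queries, n=3):
        print(word)
```

The resulting list is only a set of candidates; it still needs a look by hand before it goes into the bad words file, since a frequent word can also be a meaningful search term.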
At the Comune di Prato, we decided to publish the file online in the help page of our search engine: http://search.po-net.prato.it/txt/noise.txt . You can follow the link from the main help page: http://search.comune.prato.it/htm/help.htm

Hope this helps. Let me know if you have any comments.

Ciao ciao,
-Gabriele

P.S.: feel free to contact me personally (in Italian)! :-)

On Tue, 2004-01-27 at 12:05, Maurizio wrote:
> Hi all,
> since I couldn't find an Italian bad_words file on the Net I decided to create
> one on my own. In order to do that I wrote down a simple list of steps to be
> taken:
> 1) run htdig to index my site
> 2) look at db.words.db to see which words have been gathered
> 3) from step 2) create a minimal bad_words file
> 4) repeat step 2) and step 3) until satisfied
>
> I ran step 1) with success but I cannot go any further because the db file is
> not an ASCII file. So the question is: how can I get a list of the words
> indexed by ht://Dig?
>
> TIA
>
> Maurizio
>
> -------------------------------------------------------
> The SF.Net email is sponsored by EclipseCon 2004
> Premiere Conference on Open Tools Development and Integration
> See the breadth of Eclipse activity. February 3-5 in Anaheim, CA.
> http://www.eclipsecon.org/osdn
> _______________________________________________
> ht://Dig general mailing list: <[EMAIL PROTECTED]>
> ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
> List information (subscribe/unsubscribe, etc.)
> https://lists.sourceforge.net/lists/listinfo/htdig-general