Ciao Maurizio,

   I once went through the same steps you just described. As I remember, I
looked in the index database for the most frequent words, and also in a
custom database of the words users search for most. I roughly matched the
two lists and came up with this bad words (stop words) file for the Italian
language.
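A minimal sketch of that kind of matching, in Python. It assumes you already
have the indexed words as a plain-text dump (one entry per line, word first)
and a list of words taken from users' search queries. The file format, the
function names and the selection rule (keep the most frequent indexed words
that nobody searches for) are assumptions about how the two lists could be
matched, not a description of a specific tool.

```python
from collections import Counter
from typing import Iterable, List


def load_words(path: str, encoding: str = "latin-1") -> List[str]:
    """Read a plain-text word dump.

    Assumed format: the word is the first whitespace-separated token of each
    non-blank line; any further fields on the line are ignored.
    """
    with open(path, encoding=encoding) as f:
        return [line.split()[0] for line in f if line.strip()]


def candidate_stop_words(index_words: Iterable[str],
                         query_words: Iterable[str],
                         top_n: int = 200) -> List[str]:
    """Return the words among the top_n most frequent indexed words that
    never appear in the users' queries (hypothetical selection rule)."""
    index_counts = Counter(w.lower() for w in index_words)
    searched = {w.lower() for w in query_words}
    # most_common() orders by count; ties keep first-seen order.
    most_frequent = [w for w, _ in index_counts.most_common(top_n)]
    return sorted(w for w in most_frequent if w not in searched)
```

For example, candidate_stop_words(load_words("words.txt"),
load_words("queries.txt"), top_n=300), with hypothetical file names, gives a
sorted list to review by hand before it goes into the bad words file.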

   At the Comune di Prato we decided to publish it online, as part of the
help pages of our search engine:
http://search.po-net.prato.it/txt/noise.txt
You can also reach it from the main help page:
http://search.comune.prato.it/htm/help.htm
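If you want to start from that file and add your own words, a short Python
sketch like the following could merge the lists. It assumes each file holds
one word per line in ISO-8859-1 (latin-1); the function names are
hypothetical. The output is written one word per line, which is the layout
ht://Dig expects for the file named by its bad_word_list attribute.

```python
from pathlib import Path
from typing import Iterable, List


def merge_stop_lists(*word_lists: Iterable[str]) -> List[str]:
    """Union of several stop-word lists: lower-cased, de-duplicated, sorted."""
    merged = set()
    for words in word_lists:
        for line in words:
            word = line.strip().lower()
            if word:  # skip blank lines
                merged.add(word)
    return sorted(merged)


def read_stop_list(path: str, encoding: str = "latin-1") -> List[str]:
    # Assumption: one word per line, ISO-8859-1 encoded.
    return Path(path).read_text(encoding=encoding).splitlines()


def write_stop_list(words: Iterable[str], path: str,
                    encoding: str = "latin-1") -> None:
    # Write one word per line, each followed by a newline.
    Path(path).write_text("".join(w + "\n" for w in words), encoding=encoding)
```

For instance, write_stop_list(merge_stop_lists(read_stop_list("noise.txt"),
read_stop_list("my_words.txt")), "bad_words"), where "my_words.txt" is a
hypothetical file of your own additions.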

   Hope this helps. Let me know if you have any comments.

Ciao ciao,
-Gabriele

P.S.: If you want to contact me personally, feel free! :-)

On Tue, 2004-01-27 at 12:05, Maurizio wrote:
> Hi all,
> since I couldn't find an Italian bad_words file on the Net, I decided to create
> one on my own. To do that, I wrote down a simple list of steps to be
> taken:
> 1) run htdig to index my site
> 2) look at db.words.db to see which words have been gathered
> 3) from step 2) create a minimal bad_words file
> 4) repeat step 2) and step 3) until satisfied
> 
> I ran step 1) successfully, but I cannot go any further because the db file is
> not an ASCII file. So the question is: how can I get a list of the words
> indexed by ht://Dig?
> 
> TIA
> 
> Maurizio
> 
> 
> 
> -------------------------------------------------------
> The SF.Net email is sponsored by EclipseCon 2004
> Premiere Conference on Open Tools Development and Integration
> See the breadth of Eclipse activity. February 3-5 in Anaheim, CA.
> http://www.eclipsecon.org/osdn
> _______________________________________________
> ht://Dig general mailing list: <[EMAIL PROTECTED]>
> ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
> List information (subscribe/unsubscribe, etc.)
> https://lists.sourceforge.net/lists/listinfo/htdig-general