Re: Italian web sites

2002-04-29 Thread [EMAIL PROTECTED]

The first one.

Bye Laura


> What do you mean by "Italian website"? It can be:
>   - a site in the Italian language
>   - a site owned by an Italian organization
>   - a site hosted in Italy
> Each definition has a different solution.


Re: Italian web sites

2002-04-26 Thread Marco Ferrante

What do you mean by "Italian website"? It can be:
  - a site in the Italian language
  - a site owned by an Italian organization
  - a site hosted in Italy
Each definition has a different solution.

Date sent:  Wed, 24 Apr 2002 11:02:32 +0200
From:   "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
Subject:    Italian web sites
To: [EMAIL PROTECTED]
Send reply to:  Lucene Users List <[EMAIL PROTECTED]>

> Hi all,
>
> I'm using Jobo for spidering web sites and Lucene for indexing. The
> problem is that I'd like to spider only Italian web sites.
> How can I discover the country of a web site?
>
> Do you know of any method you could suggest?
>
> Thanks
>
>
> Laura
>


--
Marco Ferrante ([EMAIL PROTECTED])
CSITA (Centro Servizi Informatici e Telematici d'Ateneo)
Università degli Studi di Genova - Italy
Via Brigata Salerno, ponte - 16147 Genova
tel (+39) 0103532621 (internal ext. 2621)
--


--
To unsubscribe, e-mail:   <mailto:[EMAIL PROTECTED]>
For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>




Re: Italian web sites

2002-04-24 Thread Ype Kingma

Laura

>Hi all,
>
>I'm using Jobo for spidering web sites and Lucene for indexing. The
>problem is that I'd like to spider only Italian web sites.
>How can I discover the country of a web site?
>
>Do you know of any method you could suggest?

The best method I know is to use character n-grams and the
frequencies of the n-grams that occur most often:
http://citeseer.nj.nec.com/context/698873/68861
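One common way to do this is sketched below in Java. It uses the rank-order ("out-of-place") comparison of Cavnar and Trenkle: build a ranked list of a text's most frequent character n-grams, then pick the language whose ranked list differs least. The class name, the n-gram sizes (1 to 3), the 300-entry cut-off and the training sentences are illustrative assumptions only. Real profiles need far larger samples of each language.

```java
import java.util.*;

// Sketch of character n-gram language identification (rank-order comparison).
// Training texts in main() are placeholders, not real language profiles.
class NGramProfile {

    // Ranked list of the topK most frequent character n-grams, n = 1..maxN.
    static List<String> profile(String text, int maxN, int topK) {
        Map<String, Integer> counts = new HashMap<>();
        // Split on non-letters; pad each word with '_' to mark word boundaries.
        for (String word : text.toLowerCase().split("[^\\p{L}]+")) {
            if (word.isEmpty()) continue;
            String padded = "_" + word + "_";
            for (int n = 1; n <= maxN; n++) {
                for (int i = 0; i + n <= padded.length(); i++) {
                    counts.merge(padded.substring(i, i + n), 1, Integer::sum);
                }
            }
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        // Most frequent first; ties broken alphabetically for a deterministic ranking.
        entries.sort((a, b) -> {
            int c = Integer.compare(b.getValue(), a.getValue());
            return c != 0 ? c : a.getKey().compareTo(b.getKey());
        });
        List<String> ranked = new ArrayList<>();
        for (int i = 0; i < Math.min(topK, entries.size()); i++) {
            ranked.add(entries.get(i).getKey());
        }
        return ranked;
    }

    // Out-of-place distance: sum of rank differences; an n-gram missing
    // from the language profile costs maxPenalty.
    static int distance(List<String> doc, List<String> lang, int maxPenalty) {
        Map<String, Integer> langRank = new HashMap<>();
        for (int i = 0; i < lang.size(); i++) langRank.put(lang.get(i), i);
        int d = 0;
        for (int i = 0; i < doc.size(); i++) {
            Integer r = langRank.get(doc.get(i));
            d += (r == null) ? maxPenalty : Math.abs(r - i);
        }
        return d;
    }

    // Returns the key of the language profile closest to the text's profile.
    static String classify(String text, Map<String, List<String>> langs, int maxN, int topK) {
        List<String> doc = profile(text, maxN, topK);
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (Map.Entry<String, List<String>> e : langs.entrySet()) {
            int d = distance(doc, e.getValue(), topK);
            if (d < bestDistance) {
                bestDistance = d;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, List<String>> langs = new HashMap<>();
        langs.put("it", profile("la casa della nonna si trova vicino al mare e il giardino ha molti fiori", 3, 300));
        langs.put("en", profile("the house of the grandmother is near the sea and the garden has many flowers", 3, 300));
        System.out.println(classify("il mare vicino alla casa", langs, 3, 300));
    }
}
```

The language profiles only need to be built once. After that, a crawler could call `classify` on each page's extracted text and skip pages whose closest match is not `"it"`.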

Regards,
Ype





Re: Italian web sites

2002-04-24 Thread Karl Øie

Hm... this looks very interesting! If it is a Perl executable, you can just write the
text to a temp file, run the Perl script on that file and redirect the
output to another temp file. Then read that file and store the result in a Lucene
keyword field.
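Here is a Java sketch of this temp-file approach, since Jobo and Lucene are both Java. It writes the text to a temporary file, runs an external command with that file as its last argument, redirects the command's standard output to a second temporary file and reads the result back. The class and method names are hypothetical. Invoking TextCat as `perl text_cat <file>` is also an assumption, so check it against the TextCat documentation.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: run an external classifier on a page's text via temp files.
class ExternalClassifier {

    static String runOnText(List<String> command, String text)
            throws IOException, InterruptedException {
        Path input = Files.createTempFile("page", ".txt");
        Path output = Files.createTempFile("result", ".txt");
        try {
            Files.write(input, text.getBytes(StandardCharsets.UTF_8));
            List<String> cmd = new ArrayList<>(command);
            cmd.add(input.toString());                      // input file as last argument
            Process process = new ProcessBuilder(cmd)
                    .redirectOutput(output.toFile())        // stdout -> second temp file
                    .redirectError(ProcessBuilder.Redirect.INHERIT)
                    .start();
            int status = process.waitFor();
            if (status != 0) {
                throw new IOException("command exited with status " + status);
            }
            return new String(Files.readAllBytes(output), StandardCharsets.UTF_8).trim();
        } finally {
            Files.deleteIfExists(input);
            Files.deleteIfExists(output);
        }
    }

    public static void main(String[] args) throws Exception {
        // Command comes from the command line, e.g. (assumed invocation):
        //   java ExternalClassifier perl text_cat
        if (args.length == 0) {
            System.err.println("usage: ExternalClassifier <command> [args...]");
            return;
        }
        System.out.println(runOnText(Arrays.asList(args), "Il gatto dorme sulla sedia della cucina."));
    }
}
```

The returned string could then be stored as an untokenized keyword field. In the Lucene 1.x API of the time that is `Field.Keyword`, and in current Lucene it is `StringField`.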

Best regards, Karl Øie

On Wednesday 24 April 2002 13:46, [EMAIL PROTECTED] wrote:
> Hi all,
> 
> I have found a very interesting library, written in Perl.
> The problem now is how to use it.
> 
> The library is TextCat, and you can find it at:
> 
> http://odur.let.rug.nl/~vannoord/TextCat/
> 
> Bye
> 
> Laura
> 






Re: Italian web sites

2002-04-24 Thread [EMAIL PROTECTED]

Hi all,

I have found a very interesting library, written in Perl.
The problem now is how to use it.

The library is TextCat, and you can find it at:

http://odur.let.rug.nl/~vannoord/TextCat/

Bye

Laura

> Combined with that, you could use an Italian stop-word list to run statistics
> on a page :-) ?!?


Re: Italian web sites

2002-04-24 Thread Karl Øie

Combined with that, you could use an Italian stop-word list to run statistics
on a page :-) ?!?
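Here is a minimal Java sketch of the stop-word statistic. It measures the fraction of a page's words that appear in an Italian stop-word list. The short list and the 0.2 threshold are illustrative assumptions, not tuned values, and a real list would be much longer. Words that also occur in English, such as "in" or "a", are left out so they do not inflate the score on English pages.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Sketch: share of tokens that are Italian stop words (illustrative list).
class StopWordRatio {

    static final Set<String> ITALIAN_STOP_WORDS = new HashSet<>(Arrays.asList(
            "il", "lo", "la", "gli", "le", "un", "una", "di", "da", "con", "su",
            "per", "tra", "fra", "e", "che", "non", "del", "della", "dei", "al",
            "alla", "sono", "come", "anche"));

    // Fraction of letter-only tokens found in stopWords (0.0 for empty text).
    static double ratio(String text, Set<String> stopWords) {
        int total = 0;
        int hits = 0;
        for (String token : text.toLowerCase().split("[^\\p{L}]+")) {
            if (token.isEmpty()) continue;      // split may yield a leading ""
            total++;
            if (stopWords.contains(token)) hits++;
        }
        return total == 0 ? 0.0 : (double) hits / total;
    }

    // Assumed decision rule: the page "looks Italian" at or above the threshold.
    static boolean looksItalian(String text, double threshold) {
        return ratio(text, ITALIAN_STOP_WORDS) >= threshold;
    }

    public static void main(String[] args) {
        String page = "La casa della nonna si trova vicino al mare, e il giardino ha molti fiori.";
        System.out.printf("stop-word ratio: %.2f%n", ratio(page, ITALIAN_STOP_WORDS));
        System.out.println("looks Italian: " + looksItalian(page, 0.2));
    }
}
```

Stop words are very frequent in running text, so even a short list separates languages reasonably well on full pages. It is less reliable on short snippets such as titles.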

On Wednesday 24 April 2002 11:02, [EMAIL PROTECTED] wrote:
> Hi all,
> 
> I'm using Jobo for spidering web sites and Lucene for indexing. The 
> problem is that I'd like to spider only Italian web sites. 
> How can I discover the country of a web site?
> 
> Do you know of any method you could suggest?
> 
> Thanks
> 
> 
> Laura
> 






RE: Italian web sites

2002-04-24 Thread Nader S. Henein

Sniff the IP and then, using the database on the Internet topology website
http://netgeo.caida.org/perl/netgeo.cgi, you can find the country of origin
(use that to populate your own DB, so lookup time decreases as you
accumulate IPs). But that will give you websites located in Italy, not
Italian-language websites. Unfortunately, unless Italian pages use a different
encoding, picking the language up from the page itself (via JavaScript)
won't help much.
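Here is a Java sketch of the caching idea. It resolves the host name to an IP address, looks the country up once per IP and keeps the answer in a local map, so repeated IPs never reach the remote service. The `remoteLookup` function is a hypothetical stand-in for whatever geolocation database is used, and no real NetGeo API is assumed. An in-memory `HashMap` stands in for the persistent DB described above.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Sketch: host -> IP -> country, caching each IP's country after the first lookup.
class CountryCache {
    private final Function<String, String> remoteLookup; // hypothetical: IP string -> country code
    private final Map<String, String> cache = new HashMap<>();
    private int remoteCalls = 0;

    CountryCache(Function<String, String> remoteLookup) {
        this.remoteLookup = remoteLookup;
    }

    // Resolve the host, then return the cached country or ask the service once.
    String countryOf(String host) throws UnknownHostException {
        String ip = InetAddress.getByName(host).getHostAddress();
        return cache.computeIfAbsent(ip, key -> {
            remoteCalls++;
            return remoteLookup.apply(key);
        });
    }

    // Number of times the (slow) remote lookup was actually invoked.
    int remoteCalls() {
        return remoteCalls;
    }

    public static void main(String[] args) throws UnknownHostException {
        // Hypothetical placeholder lookup; replace with a real geolocation source.
        CountryCache cache = new CountryCache(ip -> "unknown");
        for (String host : args) {
            System.out.println(host + " -> " + cache.countryOf(host));
        }
    }
}
```

This only tells you where a site is hosted, not the language it is written in.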




-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, April 24, 2002 1:03 PM
To: [EMAIL PROTECTED]
Subject: Italian web sites


Hi all,

I'm using Jobo for spidering web sites and Lucene for indexing. The
problem is that I'd like to spider only Italian web sites.
How can I discover the country of a web site?

Do you know of any method you could suggest?

Thanks


Laura







Italian web sites

2002-04-24 Thread [EMAIL PROTECTED]

Hi all,

I'm using Jobo for spidering web sites and Lucene for indexing. The 
problem is that I'd like to spider only Italian web sites. 
How can I discover the country of a web site?

Do you know of any method you could suggest?

Thanks


Laura