----- Forwarded message from Brian Atkins <[EMAIL PROTECTED]> -----

From: Brian Atkins <[EMAIL PROTECTED]>
Date: Thu, 27 Jan 2005 12:13:30 -0600
To: transhumantech <[EMAIL PROTECTED]>
Subject: [>Htech] Using google for automatic meaning extraction
User-Agent: Mozilla Thunderbird 1.0 (Windows/20041206)
Reply-To: [EMAIL PROTECTED]


http://www.newscientist.com/channel/info-tech/mg18524846.100
http://www.arxiv.org/abs/cs.CL/0412098

Automatic Meaning Discovery Using Google
Authors: Rudi Cilibrasi (CWI), Paul M. B. Vitanyi (CWI, University of 
Amsterdam, National ICT of Australia)
Comments: 29 pages, 10 figures
Subj-class: Computation and Language; Artificial Intelligence; 
Databases; Information Retrieval; Learning
ACM-class: I.2.4; I.2.7

     We propose a new method to extract semantic knowledge from the 
world-wide-web for both supervised and unsupervised learning using the 
Google search engine in an unconventional manner. The approach is novel 
in its unrestricted problem domain, simplicity of implementation, and 
manifestly ontological underpinnings. We give evidence of elementary 
learning of the semantics of concepts, in contrast to most prior 
approaches. The method works as follows: The world-wide-web is the 
largest database on earth, and it induces a probability mass function, 
the Google distribution, via page counts for combinations of search 
queries. This distribution allows us to tap the latent semantic 
knowledge on the web. Shannon's coding theorem is used to establish a 
code-length associated with each search query. Viewing this mapping as a 
data compressor, we connect to earlier work on Normalized Compression 
Distance. We give applications in (i) unsupervised hierarchical 
clustering, demonstrating the ability to distinguish between colors and 
numbers, and to distinguish between 17th century Dutch painters; (ii) 
supervised concept-learning by example, using Support Vector Machines, 
demonstrating the ability to understand electrical terms, religious 
terms, emergency incidents, and by conducting a massive experiment in 
understanding WordNet categories; and (iii) matching of meaning, in an 
example of automatic English-Spanish translation.
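
The pipeline the abstract describes (page counts, then a probability, then a Shannon code length, then a normalized distance) can be sketched roughly as below. This is only an illustration under stated assumptions, not the authors' implementation. The abstract does not spell out the distance formula; the one used here is the normalized form commonly associated with this line of work. The function names, the page counts, and the index size are all hypothetical, and nothing here queries a real search engine.

```python
import math


def code_length(count, total):
    """Shannon code length in bits for an event of probability count/total.

    'count' stands in for a page count and 'total' for the number of
    indexed pages; both are assumed inputs, not values fetched from Google.
    """
    return -math.log2(count / total)


def normalized_distance(fx, fy, fxy, total):
    """Normalized distance between two search terms from page counts.

    fx, fy : page counts for each term alone (hypothetical inputs)
    fxy    : page count for pages containing both terms
    total  : assumed number of pages in the index

    Terms that always co-occur score near 0; terms that never co-occur
    are treated as infinitely far apart.
    """
    if fx <= 0 or fy <= 0 or fxy <= 0:
        return math.inf
    log_fx, log_fy = math.log(fx), math.log(fy)
    log_fxy, log_total = math.log(fxy), math.log(total)
    return (max(log_fx, log_fy) - log_fxy) / (log_total - min(log_fx, log_fy))


if __name__ == "__main__":
    # Illustrative numbers only: two terms, their joint count, and an index size.
    fx, fy, fxy, total = 46_700_000, 12_200_000, 2_630_000, 8_058_044_651
    print("code length of first term (bits):", round(code_length(fx, total), 2))
    print("normalized distance:", round(normalized_distance(fx, fy, fxy, total), 3))
```

The base of the logarithm cancels in the ratio, so the distance is the same whether code lengths are measured in bits or nats. Feeding a matrix of such pairwise distances to a hierarchical clustering routine or a Support Vector Machine would be the next step toward the applications listed above, though how the paper does that is not stated in the abstract.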
-- 
Brian Atkins
Singularity Institute for Artificial Intelligence
http://www.singinst.org/


----- End forwarded message -----
-- 
Eugen* Leitl <a href="http://leitl.org">leitl</a>
______________________________________________________________
ICBM: 48.07078, 11.61144            http://www.leitl.org
8B29F6BE: 099D 78BA 2FD3 B014 B08A  7779 75B0 2443 8B29 F6BE
http://moleculardevices.org         http://nanomachines.net
