Giovanni Tummarello wrote:
With respect to crawling and "scraping" or "sponging" or .. "trying to
guess" based on partial fragments of structured information, I can say
3 things:

a) No, we're not doing it at the moment; we are only covering those
who chose to publish structured semantics. Some book stuff shows up in
Sig.ma .. e.g. http://sig.ma/search?q=frank+van+harmelen&sources=100
bookfinder, our jerome digital library installation, but the triples
they provide are scarce and don't contribute much. It would take so
little for this to improve on their side, I believe.

b) No, we are not religious about this. We have talked about it
several times; it might make sense to try to understand as much of the
web as possible and index it. Maybe we'll do it in the future for
selected fractions of the web to show how it looks.

c) Crawling should be just one means of acquiring the semantic web. In
the case of BestBuy or other large retailers, where prices may change
every day, crawling as a means to emulate a simple call to a web
service really does not seem the smart thing to do. Will data providers
really support this with data dumps?

cheers
Giovanni

Juan,

I am hoping that the response above clarifies matters, esp. point C.

Crawling the old way is futile when the "change sensitivity" aspect of a given unit of data is high.

Georgi: even the count of German book authors, and the prices of their books across a plethora of retailers with a wide range of prices and availability, is very sensitive to change.

Georgi/Juan:

Mechanically, there is crawling, but it simply isn't the old-style approach (data warehousing) exemplified by Google, Yahoo!, ASK, and others.

Kingsley

On Sat, Oct 17, 2009 at 3:32 PM, Juan Sequeda <juanfeder...@gmail.com> wrote:
But Sindice could at least crawl Amazon.
It would be great to use sig.ma to create a "meshup" with the amazon data.


Juan Sequeda, Ph.D Student
Dept. of Computer Sciences
The University of Texas at Austin
www.juansequeda.com
www.semanticwebaustin.org


On Sat, Oct 17, 2009 at 9:28 AM, Martin Hepp (UniBW)
<h...@ebusiness-unibw.org> wrote:
I don't think so, because this would require that Sindice crawled the
whole regular web and checked the Spongers for each URL (sic!).

Juan Sequeda wrote:

Does Sindice (or any other semantic web search engine) crawl this?

Juan Sequeda, Ph.D Student
Dept. of Computer Sciences
The University of Texas at Austin
www.juansequeda.com
www.semanticwebaustin.org


On Sat, Oct 17, 2009 at 4:24 AM, Martin Hepp (UniBW)
<h...@ebusiness-unibw.org> wrote:

Dear all:

I just found out that the Virtuoso Sponger technology is even more
powerful than I thought.

Briefly: "Spongers" create rich GoodRelations (and other RDF) meta-data
for existing Web pages on-the-fly. Unlike traditional
screen-scraping approaches, Spongers reuse public APIs and other
techniques, so the data has an unprecedented degree of structure.

Now, this can be directly used in arbitrary queries... by simply using
the URI of the *existing* HTML Web page in the FROM clause of a SPARQL
query.

Example:

http://www.amazon.com/Semantic-Web-Real-World-Applications-Industry/dp/0387485309

is a Web page in plain HTML offering a book. Amazon does not yet produce
GoodRelations meta-data on their pages.

If you go to

   http://uriburner.com/sparql

and paste the URI in the "Default Graph URI" field and select "Retrieve
remote RDF for all missing source graphs", then a query like

  "SELECT * WHERE {?s ?p ?o} LIMIT 50"

returns a fully-fledged GoodRelations description for that page - as if
Amazon were already supporting GoodRelations for each of its > 4 million
items!
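
For anyone who would rather script this than use the form, here is a
minimal sketch of the same request. It puts the page URI in the FROM
clause of the query, as described above, and only builds the request
URL; it does not contact the endpoint. The "query" parameter is the
standard SPARQL protocol one. The "should-sponge" parameter and its
"soft" value are my assumption about how the endpoint exposes the
"Retrieve remote RDF for all missing source graphs" option, so please
check them against the endpoint before relying on them. The function
names are illustrative only.

```python
from urllib.parse import urlencode

# Endpoint and example page taken from the message above.
ENDPOINT = "http://uriburner.com/sparql"
PAGE_URI = (
    "http://www.amazon.com/"
    "Semantic-Web-Real-World-Applications-Industry/dp/0387485309"
)


def build_query(page_uri: str, limit: int = 50) -> str:
    """Return the SELECT query from the message, with the page URI in FROM."""
    return f"SELECT * FROM <{page_uri}> WHERE {{ ?s ?p ?o }} LIMIT {limit}"


def build_request_url(endpoint: str, page_uri: str, limit: int = 50) -> str:
    """Build a GET URL for the query; nothing is sent over the network."""
    params = {
        # Standard SPARQL protocol parameter carrying the query text.
        "query": build_query(page_uri, limit),
        # ASSUMPTION: endpoint-specific switch standing in for the form
        # option "Retrieve remote RDF for all missing source graphs".
        "should-sponge": "soft",
    }
    return endpoint + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_query(PAGE_URI))
    print(build_request_url(ENDPOINT, PAGE_URI))
```

Opening the printed URL in a browser, or fetching it with any HTTP
client, should be equivalent to filling in the form by hand, provided
the assumption about the sponging parameter holds.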

There are spongers for BestBuy, eBay, Zillow, and many other types of
resources.

Wow!

Congrats to Kingsley and his team!

Best wishes

Martin Hepp

--
--------------------------------------------------------------
martin hepp
e-business & web science research group
universitaet der bundeswehr muenchen

e-mail:  h...@ebusiness-unibw.org
phone:   +49-(0)89-6004-4217
fax:     +49-(0)89-6004-4620
www:     http://www.unibw.de/ebusiness/ (group)
        http://www.heppnetz.de/ (personal)
skype:   mfhepp
twitter: mfhepp

Check out GoodRelations for E-Commerce on the Web of Linked Data!
=================================================================

Webcast:
http://www.heppnetz.de/projects/goodrelations/webcast/

Recipe for Yahoo SearchMonkey:
http://www.ebusiness-unibw.org/wiki/GoodRelations_and_Yahoo_SearchMonkey

Talk at the Semantic Technology Conference 2009:
"Semantic Web-based E-Commerce: The GoodRelations Ontology"

http://www.slideshare.net/mhepp/semantic-webbased-ecommerce-the-goodrelations-ontology-1535287

Overview article on Semantic Universe:

http://www.semanticuniverse.com/articles-semantic-web-based-e-commerce-webmasters-get-ready.html

Project page:
http://purl.org/goodrelations/

Resources for developers:
http://www.ebusiness-unibw.org/wiki/GoodRelations

Tutorial materials:
CEC'09 2009 Tutorial: The Web of Data for E-Commerce: A Hands-on
Introduction to the GoodRelations Ontology, RDFa, and Yahoo! SearchMonkey

http://www.ebusiness-unibw.org/wiki/Web_of_Data_for_E-Commerce_Tutorial_IEEE_CEC%2709








--


Regards,

Kingsley Idehen       Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO OpenLink Software Web: http://www.openlinksw.com




