Re: Fusion Tables: Google's approach to sharing data on the Web

2009-07-03 Thread Kingsley Idehen

François Dongier wrote:

Kingsley,

Looks like you're imagining a scenario in which Wolfram Alpha, after 
having done its mathematical computation relevant to a particular user 
query, would expose its result in a format that would enrich the web 
of data. I agree that this would indeed be pretty nice but I wasn't 
asking for so much: I was more thinking of Alpha as an application at 
the end of the data processing pipeline (for instance, for data 
visualisation), not so much as an application that produces reusable 
output.


I really know of no application that doesn't produce some kind of output.

I also know of no kind of output that is devoid of representation :-)

In fact I have two basic questions about Wolfram|Alpha:
1. How can Alpha take advantage of the (not always "curated") data 
available on the web? This is the question I was asking, and it's not 
about data format but about data correctness: Wolfram insists that 
they must "curate" data to make sure it's reliable. I am worried that 
they won't be able to catch up, given the explosion of data that will 
soon be produced by projects such as Linked Data and Google Fusion Tables.

Of course they won't be able to catch up. I wonder if they've computed 
that reality yet.
2. Will Wolfram want to expose its curated data (ideally in RDF), 
enabling other applications (say, SPARQL queries) to merge it with 
other data? Here my question really is: will they want to share this 
data, or will they prefer to keep it private? If they want to share 
it, then I agree that the Linked Data format would be best.


They will share it, in due course. To their credit, they do have an API 
that is nearing release, and APIs are always the final step en route to 
Linked Data. By this I mean: APIs ultimately accelerate comprehension of 
why code is like fish and data is like wine :-)
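
For what it's worth, here is a minimal sketch (in Python, against a purely hypothetical JSON answer endpoint; the URL, result fields, and vocabulary are all invented, not Wolfram's actual API) of the step such an API makes possible: re-emitting a computed answer as RDF so that it joins the Web of data.

```python
import requests
from rdflib import Graph, Literal, Namespace, URIRef

# Hypothetical answer-engine endpoint and result shape -- illustrative only,
# not Wolfram's actual API.
ANSWERS = Namespace("http://answers.example.org/vocab#")

resp = requests.get(
    "http://api.example.org/query",           # placeholder endpoint
    params={"q": "population of Berlin"},
).json()  # e.g. {"subject": "Berlin", "population": 3431700}

g = Graph()
subject = URIRef("http://dbpedia.org/resource/" + resp["subject"])
g.add((subject, ANSWERS.population, Literal(resp["population"])))

# Re-emitting the computed answer as RDF is what turns an opaque API
# response into Linked Data that others can merge and reuse.
print(g.serialize(format="turtle"))
```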


Kingsley


Regards,
François

2009/7/3 Kingsley Idehen


François Dongier wrote:

I wonder how Wolfram|Alpha could take advantage of all this
data made available both by Google Fusion Tables and by the
Linked Data project. Will Alpha just try to slowly integrate
it through its "curation pipeline"? Wouldn't it be better to
introduce something like "curation coefficients" that would
allow computation to be done by Alpha on imperfect data? This
would make it possible to quickly catch up on the published
data, while introducing some uncertainty in the results Alpha
returns.

Francois,

Since the overall theme is Linked Data (HTTP URIs for data
objects), how does Wolfram|Alpha add any value if the end result is
an opaque HTML resource (one that lacks structured-data granularity
or pointers to structured data sources)?

Value comes if Google exposes its Dataspace GUIDs as HTTP URIs
and Wolfram|Alpha (or anyone else in the data processing
pipeline) does the same; then you get something that is truly
valuable, i.e.:

1. A computational answer engine that emits its results as Linked Data
(as per the Linked Data meme)
2. Google's contribution to the Linked Data Web realm via Data
Spaces / Virtual Database technology that also emits Linked Data.

The ultimate value of the Web remains the fundamental separation
of the following re. data:

1. Identity
2. Storage
3. Access
4. Representation
5. Presentation.

We cannot see, comprehend, and appreciate the Web via item #5
alone, which is always the case when the output representation
from a Web service lacks pointers (HTTP URIs) to RDF-model-based
structured and interlinked data, in line with the Linked Data meme.

To conclude, things will more than likely get better now that
Google, Yahoo!, and Microsoft (naturally) are beginning to see
alignment between their respective customer-driven technology
adoption strategies and the virtues of Linked Data, thanks to RDFa
and the GoodRelations vocabulary.


Kingsley


Cheers,
François


On Fri, Jul 3, 2009 at 2:28 PM, Chris Bizer <ch...@bizer.de> wrote:

    Hi all,

    I’m regularly following Alon Halevy’s blog as I really like his
    thoughts on dataspaces [1].

    Today, I discovered this post about Google Fusion Tables

    http://alonhalevy.blogspot.com/2009/06/fusion-tables-third-piece-of-puzzle.html

    “The main goal of Fusion Tables is to make it easier for people to
    create, manage and share structured data on the Web. Fusion
    Tables is a new kind of data management system that focuses on
    features that /enable collaboration/. […] In a nutshell, Fusion
    Tables enables you to upload tabular data (up to 100MB per table)
    from spreadsheets and CSV files. […]”

Re: Fusion Tables: Google's approach to sharing data on the Web

2009-07-03 Thread François Dongier
Kingsley,

Looks like you're imagining a scenario in which Wolfram Alpha, after having
done its mathematical computation relevant to a particular user query, would
expose its result in a format that would enrich the web of data. I agree
that this would indeed be pretty nice but I wasn't asking for so much: I was
more thinking of Alpha as an application at the end of the data processing
pipeline (for instance, for data visualisation), not so much as an
application that produces reusable output.
In fact I have two basic questions about Wolfram|Alpha:
1. How can Alpha take advantage of the (not always "curated") data available
on the web? This is the question I was asking, and it's not about data
format but about data correctness: Wolfram insists that they must "curate"
data to make sure it's reliable. I am worried that they won't be able to
catch up, given the explosion of data that will soon be produced by projects
such as Linked Data and Google Fusion Tables.
2. Will Wolfram want to expose its curated data (ideally in RDF), enabling
other applications (say, SPARQL queries) to merge it with other data? Here
my question really is: will they want to share this data, or will they
prefer to keep it private? If they want to share it, then I agree that the
Linked Data format would be best.

Regards,
François

2009/7/3 Kingsley Idehen 

> François Dongier wrote:
>
>> I wonder how Wolfram|Alpha could take advantage of all this data made
>> available both by Google Fusion Tables and by the Linked Data project. Will
>> Alpha just try to slowly integrate it through its "curation pipeline"?
>> Wouldn't it be better to introduce something like "curation coefficients"
>> that would allow computation to be done by Alpha on imperfect data? This
>> would make it possible to quickly catch up on the published data, while
>> introducing some uncertainty in the results Alpha returns.
>>
> Francois,
>
> Since the overall theme is Linked Data (HTTP URIs for data objects), how
> does Wolfram|Alpha add any value if the end result is an opaque HTML resource
> (one that lacks structured-data granularity or pointers to structured data
> sources)?
>
> Value comes if Google exposes its Dataspace GUIDs as HTTP URIs and
> Wolfram|Alpha (or anyone else in the data processing pipeline) does the same;
> then you get something that is truly valuable, i.e.:
>
> 1. A computational answer engine that emits its results as Linked Data (as
> per the Linked Data meme)
> 2. Google's contribution to the Linked Data Web realm via Data Spaces /
> Virtual Database technology that also emits Linked Data.
>
> The ultimate value of the Web remains the fundamental separation of the
> following re. data:
>
> 1. Identity
> 2. Storage
> 3. Access
> 4. Representation
> 5. Presentation.
>
> We cannot see, comprehend, and appreciate the Web via item #5 alone, which
> is always the case when the output representation from a Web service lacks
> pointers (HTTP URIs) to RDF-model-based structured and interlinked data,
> in line with the Linked Data meme.
>
> To conclude, things will more than likely get better now that Google,
> Yahoo!, and Microsoft (naturally) are beginning to see alignment between
> their respective customer-driven technology adoption strategies and the
> virtues of Linked Data, thanks to RDFa and the GoodRelations vocabulary.
>
>
> Kingsley
>
>>
>> Cheers,
>> François
>>
>>
>> On Fri, Jul 3, 2009 at 2:28 PM, Chris Bizer <ch...@bizer.de> wrote:
>>
>>Hi all,
>>
>>
>>I’m regularly following Alon Halevy’s blog as I really like his
>>thoughts on dataspaces [1].
>>
>>
>>Today, I discovered this post about Google Fusion Tables
>>
>>
>>
>> http://alonhalevy.blogspot.com/2009/06/fusion-tables-third-piece-of-puzzle.html
>>
>>
>>“The main goal of Fusion Tables is to make it easier for people to
>>create, manage and share structured data on the Web. Fusion
>>Tables is a new kind of data management system that focuses on
>>features that /enable collaboration/. […] In a nutshell, Fusion
>>Tables enables you to upload tabular data (up to 100MB per table)
>>from spreadsheets and CSV files. You can filter and aggregate the
>>data and visualize it in several ways, such as maps and time
>>lines. The system will try to recognize columns that represent
>>geographical locations and suggest appropriate visualizations. To
>>collaborate, you can share a table with a select set of
>>collaborators or make it public. One of the reasons to collaborate
>>is to enable /fusing/ data from multiple tables, which is a simple
>>yet powerful form of data integration. If you have a table about
>>water resources in the countries of the world, and I have data
>>about the incidence of malaria in various countries, we can fuse
>>our data on the country column, and see our data side by side.”
>>
>>
>>See also
>>
>>
>>Google announcement
>>http://googleresearch.blogspot.com/2009/06/google-fusion-tables.html
>>
>> 

Re: Fusion Tables: Google's approach to sharing data on the Web

2009-07-03 Thread Kingsley Idehen

François Dongier wrote:
I wonder how Wolfram|Alpha could take advantage of all this data made 
available both by Google Fusion Tables and by the Linked Data project. 
Will Alpha just try to slowly integrate it through its "curation 
pipeline"? Wouldn't it be better to introduce something like "curation 
coefficients" that would allow computation to be done by Alpha on 
imperfect data? This would make it possible to quickly catch up on the 
published data, while introducing some uncertainty in the results 
Alpha returns.

Francois,

Since the overall theme is Linked Data (HTTP URIs for data objects), how
does Wolfram|Alpha add any value if the end result is an opaque HTML
resource (one that lacks structured-data granularity or pointers to
structured data sources)?


Value comes if Google exposes its Dataspace GUIDs as HTTP URIs and
Wolfram|Alpha (or anyone else in the data processing pipeline) does the
same; then you get something that is truly valuable, i.e.:

1. A computational answer engine that emits its results as Linked Data
(as per the Linked Data meme)
2. Google's contribution to the Linked Data Web realm via Data Spaces / 
Virtual Database technology that also emits Linked Data.


The ultimate value of the Web remains the fundamental separation of the 
following re. data:


1. Identity
2. Storage
3. Access
4. Representation
5. Presentation.

We cannot see, comprehend, and appreciate the Web via item #5 alone,
which is always the case when the output representation from a Web
service lacks pointers (HTTP URIs) to RDF-model-based structured and
interlinked data, in line with the Linked Data meme.
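
To see that separation at work, here is a minimal sketch using Python's requests library against DBpedia's well-known URIs (assuming the endpoint still negotiates content the way it does today): one HTTP URI carries the identity, the client chooses the representation, and presentation is just one option among several.

```python
import requests

# One identifier for the thing itself (identity) ...
uri = "http://dbpedia.org/resource/Berlin"

# ... which can be dereferenced to a machine-readable representation ...
rdf = requests.get(uri, headers={"Accept": "text/turtle"})
print(rdf.url, "->", rdf.headers.get("Content-Type"))

# ... or to a human-oriented presentation, without changing the identity.
html = requests.get(uri, headers={"Accept": "text/html"})
print(html.url, "->", html.headers.get("Content-Type"))
```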


To conclude, things will more than likely get better now that Google,
Yahoo!, and Microsoft (naturally) are beginning to see alignment between 
their respective customer-driven technology adoption strategies and the 
virtues of Linked Data, thanks to RDFa and the GoodRelations vocabulary.
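
As a flavor of what that alignment means in data terms, here is a small rdflib sketch that builds a couple of GoodRelations triples (the shop URI and product are invented for illustration); the very same triples are what RDFa markup embeds directly in a product page for the crawlers to pick up.

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

GR = Namespace("http://purl.org/goodrelations/v1#")

g = Graph()
product = URIRef("http://shop.example.org/products/42")  # invented shop URI
g.add((product, RDF.type, GR.ProductOrService))
g.add((product, GR.name, Literal("Example widget")))

# The same triples, embedded as RDFa in the product page itself, are what
# the search engines' crawlers would pick up.
print(g.serialize(format="turtle"))
```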



Kingsley 


Cheers,
François

On Fri, Jul 3, 2009 at 2:28 PM, Chris Bizer <ch...@bizer.de> wrote:


 


Hi all,

 


I’m regularly following Alon Halevy’s blog as I really like his
thoughts on dataspaces [1].

 


Today, I discovered this post about Google Fusion Tables

 



http://alonhalevy.blogspot.com/2009/06/fusion-tables-third-piece-of-puzzle.html

 


“The main goal of Fusion Tables is to make it easier for people to
create, manage and share structured data on the Web. Fusion
Tables is a new kind of data management system that focuses on
features that /enable collaboration/. […] In a nutshell, Fusion
Tables enables you to upload tabular data (up to 100MB per table)
from spreadsheets and CSV files. You can filter and aggregate the
data and visualize it in several ways, such as maps and time
lines. The system will try to recognize columns that represent
geographical locations and suggest appropriate visualizations. To
collaborate, you can share a table with a select set of
collaborators or make it public. One of the reasons to collaborate
is to enable /fusing/ data from multiple tables, which is a simple
yet powerful form of data integration. If you have a table about
water resources in the countries of the world, and I have data
about the incidence of malaria in various countries, we can fuse
our data on the country column, and see our data side by side.”

 


See also

 


Google announcement
http://googleresearch.blogspot.com/2009/06/google-fusion-tables.html

Water data example

http://www.circleofblue.org/waternews/2009/world/google-brings-water-data-to-life/

 


Taking this together with Google Squared and the recent
announcement that Google is going to crawl microformats and RDFa,
it starts to look like the folks at Google are working in the same
direction as the Linking Open Data community, but as usual a bit
more centralized and less webish.

 


Cheers,

 


Chris

 

 


[1] http://www.cs.berkeley.edu/~franklin/Papers/dataspaceSR.pdf


 


--

Prof. Dr. Christian Bizer

Web-based Systems Group

Freie Universität Berlin

+49 30 838 55509

http://www.bizer.de

ch...@bizer.de 

 






--


Regards,

Kingsley Idehen   Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO 
OpenLink Software Web: http://www.openlinksw.com








Re: Fusion Tables: Google's approach to sharing data on the Web

2009-07-03 Thread François Dongier
I wonder how Wolfram|Alpha could take advantage of all this data made
available both by Google Fusion Tables and by the Linked Data project. Will
Alpha just try to slowly integrate it through its "curation pipeline"?
Wouldn't it be better to introduce something like "curation coefficients"
that would allow computation to be done by Alpha on imperfect data? This
would make it possible to quickly catch up on the published data, while
introducing some uncertainty in the results Alpha returns.
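
As a toy illustration of what such "curation coefficients" might look like, consider the following Python sketch; the facts, sources, and weights are invented for the example, and real coefficients would presumably come out of Wolfram's own curation pipeline.

```python
# Toy model of a "curation coefficient": a per-source reliability weight
# attached to each statement, so computation can proceed on imperfect data
# while propagating uncertainty instead of waiting for full manual curation.
facts = [
    # (subject, property, value, source, curation_coefficient)
    ("Germany", "population", 82_000_000, "curated-almanac", 0.99),
    ("Germany", "population", 81_500_000, "community-table", 0.70),
]

def weighted_estimate(rows):
    """Blend conflicting values, weighting each by its coefficient."""
    total = sum(w for *_, w in rows)
    return sum(v * w for (_, _, v, _, w) in rows) / total

print(weighted_estimate(facts))  # an answer plus implied uncertainty
```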
Cheers,
François

On Fri, Jul 3, 2009 at 2:28 PM, Chris Bizer  wrote:

>
>
> Hi all,
>
>
>
> I’m regularly following Alon Halevy’s blog as I really like his thoughts on
> dataspaces [1].
>
>
>
> Today, I discovered this post about Google Fusion Tables
>
>
>
>
> http://alonhalevy.blogspot.com/2009/06/fusion-tables-third-piece-of-puzzle.html
>
>
>
> “The main goal of Fusion Tables is to make it easier for people to create,
> manage and share structured data on the Web. Fusion Tables is a new kind
> of data management system that focuses on features that *enable
> collaboration*. […] In a nutshell, Fusion Tables enables you to upload
> tabular data (up to 100MB per table) from spreadsheets and CSV files. You
> can filter and aggregate the data and visualize it in several ways, such as
> maps and time lines. The system will try to recognize columns that represent
> geographical locations and suggest appropriate visualizations. To
> collaborate, you can share a table with a select set of collaborators or
> make it public. One of the reasons to collaborate is to enable *fusing* data
> from multiple tables, which is a simple yet powerful form of data
> integration. If you have a table about water resources in the countries of
> the world, and I have data about the incidence of malaria in various
> countries, we can fuse our data on the country column, and see our data side
> by side.”
>
>
>
> See also
>
>
>
> Google announcement
> http://googleresearch.blogspot.com/2009/06/google-fusion-tables.html
>
> Water data example
> http://www.circleofblue.org/waternews/2009/world/google-brings-water-data-to-life/
>
>
>
> Taking this together with Google Squared and the recent announcement that
> Google is going to crawl microformats and RDFa, it starts to look like the
> folks at Google are working in the same direction as the Linking Open Data
> community, but as usual a bit more centralized and less webish.
>
>
>
> Cheers,
>
>
>
> Chris
>
>
>
>
>
> [1] http://www.cs.berkeley.edu/~franklin/Papers/dataspaceSR.pdf
>
>
>
> --
>
> Prof. Dr. Christian Bizer
>
> Web-based Systems Group
>
> Freie Universität Berlin
>
> +49 30 838 55509
>
> http://www.bizer.de
>
> ch...@bizer.de
>
>
>


Re: Fusion Tables: Google's approach to sharing data on the Web

2009-07-03 Thread Kingsley Idehen

Sören Auer wrote:

Chris Bizer wrote:
I’m regularly following Alon Halevy’s blog as I really like his
thoughts on dataspaces [1].


I have the impression that DabbleDB [1] and others have already been doing
pretty much this for ages, even better than Google. Or am I wrong?


--Sören

[1] http://dabbledb.com/



Sören,

Think of DabbleDB as a webby EAV/CR model equivalent of Microsoft Access.

If they decide to imbibe the Linked Data meme, we would end up with 
something truly exciting for high level interaction with the Linked Data 
Web.


BTW, DabbleDB is written in Smalltalk, the inspiration for
Objective-C, which in turn also provided inspiration for HTTP.


All we want to do is put stuff in spaces that endow each data object
with HTTP-based identifiers, in line with the Linked Data meme; once this
is done, data fusion (or data meshing) becomes incidental and implicit.
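
A small rdflib sketch of that point, with hypothetical publisher namespaces: because both sources name the country with the same HTTP URI, fusing their data is nothing more than a union of triples.

```python
from rdflib import Graph, Literal, Namespace, URIRef

WATER = Namespace("http://water.example.org/vocab#")    # hypothetical publishers
HEALTH = Namespace("http://health.example.org/vocab#")
country = URIRef("http://dbpedia.org/resource/Kenya")   # shared HTTP identifier

g1, g2 = Graph(), Graph()
g1.add((country, WATER.renewableWaterPerCapita, Literal(647)))
g2.add((country, HEALTH.malariaIncidence, Literal(0.21)))

# With a shared URI there are no join keys to reconcile: "fusion" is
# just the set union of the two graphs.
fused = g1 + g2
for p, o in fused.predicate_objects(country):
    print(p, o)
```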


--


Regards,

Kingsley Idehen   Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO 
OpenLink Software Web: http://www.openlinksw.com








Re: .htaccess a major bottleneck to Semantic Web adoption / Was: Re: RDFa vs RDF/XML and content negotiation

2009-07-03 Thread Danny Ayers
2009/7/2 Bill Roberts:
> I thought I'd give the .htaccess approach a try, to see what's involved in
> actually setting it up.  I'm no expert on Apache, but I know the basics of
> how it works, I've got full access to a web server and I can read the online
> Apache documentation as well as the next person.

I've tried similar things, even stuff using PURLs - incredibly difficult to
get right. (My downtime overrides all, so I'm not even sure I got
it right in the end.)
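
One way to at least verify a setup is a few lines of Python: this sketch assumes a slash-style (303-redirect) Linked Data URI and uses DBpedia as a known-good reference to compare one's own server against.

```python
import requests

def check_303(uri: str) -> None:
    """Ask for RDF without following redirects; a correctly configured
    server should answer 303 See Other with a Location header."""
    resp = requests.get(
        uri,
        headers={"Accept": "application/rdf+xml"},
        allow_redirects=False,
    )
    print(uri, "->", resp.status_code, resp.headers.get("Location"))

# Known-good reference point to compare your own server against:
check_303("http://dbpedia.org/resource/Berlin")
```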

I really think we need a (copy & paste) cheat sheet.

Volunteers?

Cheers,
Danny.


-- 
http://danny.ayers.name



Re: Fusion Tables: Google's approach to sharing data on the Web

2009-07-03 Thread Sören Auer

Chris Bizer wrote:
I’m regularly following Alon Halevy’s blog as I really like his thoughts
on dataspaces [1].


I have the impression that DabbleDB [1] and others have already been doing
pretty much this for ages, even better than Google. Or am I wrong?


--Sören

[1] http://dabbledb.com/



Re: Fusion Tables: Google's approach to sharing data on the Web

2009-07-03 Thread Kingsley Idehen

Chris Bizer wrote:


Hi all,

I’m regularly following Alon Halevy’s blog as I really like his thoughts
on dataspaces [1].


Today, I discovered this post about Google Fusion Tables

http://alonhalevy.blogspot.com/2009/06/fusion-tables-third-piece-of-puzzle.html

“The main goal of Fusion Tables is to make it easier for people to
create, manage and share structured data on the Web. Fusion Tables
is a new kind of data management system that focuses on features that 
/enable collaboration/. […] In a nutshell, Fusion Tables enables you 
to upload tabular data (up to 100MB per table) from spreadsheets and 
CSV files. You can filter and aggregate the data and visualize it in 
several ways, such as maps and time lines. The system will try to 
recognize columns that represent geographical locations and suggest 
appropriate visualizations. To collaborate, you can share a table with 
a select set of collaborators or make it public. One of the reasons to 
collaborate is to enable /fusing/ data from multiple tables, which is 
a simple yet powerful form of data integration. If you have a table 
about water resources in the countries of the world, and I have data 
about the incidence of malaria in various countries, we can fuse our 
data on the country column, and see our data side by side.”


See also

Google announcement 
http://googleresearch.blogspot.com/2009/06/google-fusion-tables.html


Water data example 
http://www.circleofblue.org/waternews/2009/world/google-brings-water-data-to-life/


Taking this together with Google Squared and the recent announcement
that Google is going to crawl microformats and RDFa, it starts to look
like the folks at Google are working in the same direction as the
Linking Open Data community, but as usual a bit more centralized and
less webish.


Cheers,

Chris

[1] http://www.cs.berkeley.edu/~franklin/Papers/dataspaceSR.pdf

--

Prof. Dr. Christian Bizer

Web-based Systems Group

Freie Universität Berlin

+49 30 838 55509

http://www.bizer.de

ch...@bizer.de 


Chris,

A few questions:

1. What's the difference between a Dataspace and a Data Space?
2. What's the difference between either of the above and a Virtual
Database (platform for: Data Virtualization)?



I ask these questions because it's crystal clear to me that, in your view,
there must be differences; so please fill in the blanks for me, as I
profoundly believe the quest for knowledge always starts at knowing
what you don't know. Right now, there is clearly something I don't know
about Data Spaces, Dataspaces, and Virtual Databases.




--


Regards,

Kingsley Idehen   Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO 
OpenLink Software Web: http://www.openlinksw.com








Fusion Tables: Google's approach to sharing data on the Web

2009-07-03 Thread Chris Bizer
 

Hi all,

 

I’m regularly following Alon Halevy’s blog as I really like his thoughts on
dataspaces [1].

 

Today, I discovered this post about Google Fusion Tables

 

 

http://alonhalevy.blogspot.com/2009/06/fusion-tables-third-piece-of-puzzle.html

 

“The main goal of Fusion Tables is to make it easier for people to create,
manage and share structured data on the Web. Fusion Tables is a new kind
of data management system that focuses on features that enable
collaboration. […] In a nutshell, Fusion Tables enables you to upload
tabular data (up to 100MB per table) from spreadsheets and CSV files. You
can filter and aggregate the data and visualize it in several ways, such as
maps and time lines. The system will try to recognize columns that represent
geographical locations and suggest appropriate visualizations. To
collaborate, you can share a table with a select set of collaborators or
make it public. One of the reasons to collaborate is to enable fusing data
from multiple tables, which is a simple yet powerful form of data
integration. If you have a table about water resources in the countries of
the world, and I have data about the incidence of malaria in various
countries, we can fuse our data on the country column, and see our data side
by side.”
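
The country-column fusion described here maps directly onto a relational join; a toy pandas version of the water/malaria example (with invented figures, purely for illustration) makes that concrete:

```python
import pandas as pd

# Two independently maintained tables that share a country column
# (figures are invented for the example).
water = pd.DataFrame({
    "Country": ["Kenya", "Mali"],
    "RenewableWaterPerCapita": [647, 2517],
})
malaria = pd.DataFrame({
    "Country": ["Kenya", "Mali"],
    "MalariaIncidence": [0.21, 0.38],
})

# "Fusing" on the shared column: a simple yet powerful form of data integration.
print(water.merge(malaria, on="Country"))
```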

 

See also

 

Google announcement

http://googleresearch.blogspot.com/2009/06/google-fusion-tables.html

Water data example

http://www.circleofblue.org/waternews/2009/world/google-brings-water-data-to-life/

 

Taking this together with Google Squared and the recent announcement that
Google is going to crawl microformats and RDFa, it starts to look like the
folks at Google are working in the same direction as the Linking Open Data
community, but as usual a bit more centralized and less webish.

 

Cheers,

 

Chris

 

 

[1] http://www.cs.berkeley.edu/~franklin/Papers/dataspaceSR.pdf

 

--

Prof. Dr. Christian Bizer

Web-based Systems Group

Freie Universität Berlin

+49 30 838 55509

  http://www.bizer.de

  ch...@bizer.de

 



Re: DBpedia 3.3 - different versions of Geo data description

2009-07-03 Thread Kingsley Idehen

bluma...@punkt.at wrote:
Dear all,

there are now three different versions/properties to describe geo locations:

1. http://dbpedia.org/page/Leipzig uses dbpprop:latDeg, dbpprop:latMin, dbpprop:latSec, etc.

2. http://dbpedia.org/page/Berlin uses (good old) geo:lat, geo:long

3. http://dbpedia.org/page/Paris uses dbpprop:latLong, which redirects to dbpedia:Paris/latLong/coord


Our PoolParty [1] application made use of version 2.

How will the Linked Data community handle this kind of problem in the future?
Changes to already widely used schemata cause several problems.

Best wishes,
Andreas

[1] http://poolparty.punkt.at/


  

Andreas,

The solution has always been to make a set of purpose-specific named
rule sets in Virtuoso using rdfs:subPropertyOf mappings. Once the rules are
loaded, you simply use a pragma with your SPARQL queries which applies
these rules.


We should have a standard set of these mapping rules loaded as part of 
DBpedia in general.
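
For example (a sketch only; the rule-set name is hypothetical, and it assumes rdfs:subPropertyOf mappings onto geo:lat/geo:long have been loaded server-side under that name), a query can opt into the rules with Virtuoso's inference pragma:

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Hypothetical rule-set name; assumes rdfs:subPropertyOf mappings onto
# geo:lat / geo:long were loaded server-side under this name.
query = """
DEFINE input:inference "urn:dbpedia:geo:rules"
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
SELECT ?lat ?long WHERE {
  <http://dbpedia.org/resource/Berlin> geo:lat ?lat ; geo:long ?long .
}
"""

sparql = SPARQLWrapper("http://dbpedia.org/sparql")
sparql.setQuery(query)
sparql.setReturnFormat(JSON)
print(sparql.query().convert())
```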


Georgi: have you done anything re. the above based on the DBpedia ontology?



--


Regards,

Kingsley Idehen   Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO 
OpenLink Software Web: http://www.openlinksw.com








[ANN] DBpedia 3.3

2009-07-03 Thread Georgi Kobilarov
Dear all,

we are pleased to announce the release of DBpedia 3.3. This release is
based on Wikipedia dumps of May 2009.

The new release includes the following improvements over DBpedia 3.2:

1. more accurate abstract extraction
2. labels and abstracts in 80 languages (see [1])
3. several infobox extraction bugfixes
4. new links to Dailymed, Diseasome, Drugbank, Sider, TCM 
5. updated Open Cyc links

You can find the datasets at [2], and the RDF files at [1]. The dataset
can be queried at our SPARQL endpoint [3].
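
As a quick way to try the new release from Python (a sketch; it assumes only that the public endpoint is reachable), the new multilingual labels can be pulled like this:

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Fetch a handful of the new multilingual labels for one resource.
sparql = SPARQLWrapper("http://dbpedia.org/sparql")
sparql.setQuery("""
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?label WHERE {
  <http://dbpedia.org/resource/Leipzig> rdfs:label ?label .
  FILTER (lang(?label) IN ("en", "de", "fr"))
}
""")
sparql.setReturnFormat(JSON)
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["label"]["value"])
```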

After eight long months without a DBpedia release (due to a lack of
Wikipedia dumps), today's release will bring us up to speed again, and
we will release DBpedia datasets much more often in the future.


On behalf of the DBpedia Team of Freie Universität Berlin, Universität
Leipzig and Openlink Software,

Georgi

[1] http://downloads.dbpedia.org/3.3/
[2] http://wiki.dbpedia.org/Downloads
[3] http://dbpedia.org/sparql/

--
Georgi Kobilarov
Freie Universität Berlin
www.georgikobilarov.com





DBpedia 3.3 - different versions of Geo data description

2009-07-03 Thread blumauer
Dear all,

there are now three different versions/properties to describe geo locations:

1. http://dbpedia.org/page/Leipzig uses dbpprop:latDeg, dbpprop:latMin, dbpprop:latSec, etc.

2. http://dbpedia.org/page/Berlin uses (good old) geo:lat, geo:long

3. http://dbpedia.org/page/Paris uses dbpprop:latLong, which redirects to dbpedia:Paris/latLong/coord

Our PoolParty [1] application made use of version 2.

How will the Linked Data community handle this kind of problem in the future?
Changes to already widely used schemata cause several problems.

Best wishes,
Andreas

[1] http://poolparty.punkt.at/