FW: Failed to port datastore to RDF, will go Mongo

2010-11-24 Thread William Waites
Friedrich, I'm forwarding your message to one of the W3 lists.

Some of your questions could be easily answered (e.g. for the euro: in
your context you don't have a predicate for that, you have an Observation
whose unit is a currency; the predicate is the unit of measure, and the
currency itself you could take from dbpedia).
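To make that concrete, here is a rough sketch in Turtle, loaded with Python's
rdflib; the vocabulary (Data Cube's qb:Observation, sdmx-attribute:unitMeasure,
dbpedia:Euro) is one plausible choice rather than a prescription, and the
example.org URIs are placeholders:

# Sketch only: a spending entry modelled as a Data Cube Observation whose
# unit of measure is the DBpedia resource for the Euro. Namespaces and
# property choices are illustrative, not prescribed.
from rdflib import Graph

TTL = """
@prefix qb:        <http://purl.org/linked-data/cube#> .
@prefix sdmx-attr: <http://purl.org/linked-data/sdmx/2009/attribute#> .
@prefix dbpedia:   <http://dbpedia.org/resource/> .
@prefix ex:        <http://example.org/wdmmg/> .
@prefix xsd:       <http://www.w3.org/2001/XMLSchema#> .

ex:obs1 a qb:Observation ;
    ex:amount "1500000.00"^^xsd:decimal ;
    sdmx-attr:unitMeasure dbpedia:Euro .
"""

g = Graph()
g.parse(data=TTL, format="turtle")
print(g.serialize(format="turtle"))  # round-trips; returns a str in rdflib >= 6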

But I think your concerns are generally quite valid, and your
experience reflects that of most web site developers who encounter
RDF.

LOD list, Friedrich is a clueful developer, responsible for
http://bund.offenerhaushalt.de/ amongst other things. What can we
learn from this? How do we make this better?

-w


- Forwarded message from Friedrich Lindenberg friedr...@pudo.org -

From: Friedrich Lindenberg friedr...@pudo.org
Date: Wed, 24 Nov 2010 11:56:20 +0100
Message-Id: a9089567-6107-4b43-b442-d09dcc0c3...@pudo.org
To: wdmmg-discuss wdmmg-disc...@lists.okfn.org
Subject: [wdmmg-discuss] Failed to port datastore to RDF, will go Mongo

(reposting to list):

Hi all, 

As an action from OGDCamp, Rufus and I agreed that we should resume porting 
WDMMG to RDF in order to make the data model more flexible and to allow a 
merger between WDMMG, OffenerHaushalt and similar other projects. 

After a few days, I'm now over the whole idea of porting WDMMG to RDF. Having 
written a long technical pro/con email before (that I assume contained nothing 
you don't already know), I think the net effect of using RDF would be the 
following: 

* Lots of coolness, sucking up to linked data people.
* Further research regarding knowledge representation.

vs.

* Unstable and outdated technological base. No triplestore I have seen so far 
seemed on par with MySQL 4. 
* No freedom wrt schema; modelling overhead instead. Spent 30 minutes trying 
to find a predicate for Euro.
* Scares off developers. Invested 2 days researching this, which is how long it 
took me to implement OH's backend the first time around. The project would need 
to be sustained by linked data grad students.
* Less flexibility wrt analytics, querying and aggregation. SPARQL not so hot.
* Good chance of chewing up the UI; editing much harder to implement.

I normally enjoy learning new stuff. This is just painful. Most of the above 
points are probably based on my ignorance, but it really shouldn't take a PhD 
to process some gov spending tables. 

I'll now start a mongo effort because I really think this should go schema-free 
+ I want to get stuff moving. If you can hold off loading Uganda and Israel for 
a week that would of course be very cool, we could then try to evaluate how far 
this went. Progress will be at: http://bitbucket.org/pudo/wdmmg-core 

Friedrich



___
wdmmg-discuss mailing list
wdmmg-disc...@lists.okfn.org
http://lists.okfn.org/mailman/listinfo/wdmmg-discuss

- End forwarded message -

-- 
William Waites
http://eris.okfn.org/ww/foaf#i
9C7E F636 52F6 1004 E40A  E565 98E3 BBF3 8320 7664



Re: FW: Failed to port datastore to RDF, will go Mongo

2010-11-24 Thread William Waites
... on the plus side, Friedrich wrote:

] * Lots of coolness, sucking up to linked data people.

I don't see these as particularly good things in themselves. The
solutions have to be obviously technically sound and convenient to
use. Drinking the kool-aid is not helpful.

* [2010-11-24 08:05:08 -0500] Kingsley Idehen kide...@openlinksw.com wrote:
] 
] Is your data available as a dump?

UK data for 2009 that I made is available at:

   http://semantic.ckan.net/dataset/cra/2009/dump.nt.bz2
   http://semantic.ckan.net/dataset/cra/2009/dump.nq.bz2

But this was done more or less by hand, and repurposing the CSV-to-SDMX
scripts (this was done before QB became best practice) is not easy.
Still, from a modelling perspective they might be a good starting
point.

But having to ask a question in the right place and getting an answer
that is a good starting point is maybe different from doing a google
search and finding easy-to-follow recipes that can immediately be
plugged into some web app.

Cheers,
-w
-- 
William Waites
http://eris.okfn.org/ww/foaf#i
9C7E F636 52F6 1004 E40A  E565 98E3 BBF3 8320 7664



Re: FW: Failed to port datastore to RDF, will go Mongo

2010-11-24 Thread Aldo Bucchi
Hi William, Friedrich.

This is an excellent email. My replies inlined. Hope I can help.

On Wed, Nov 24, 2010 at 9:47 AM, William Waites w...@styx.org wrote:
 Friedrich, I'm forwarding your message to one of the W3 lists.

 Some of your questions could be easily answered (e.g. for euro in your
 context, you don't have a predicate for that, you have an Observation
 with units of a currency and you could take the currency from
 dbpedia, the predicate is units).

 But I think your concerns are quite valid generally and your
 experience reflects that of most web site developers that encounter
 RDF.

 LOD list, Friedrich is a clueful developer, responsible for
 http://bund.offenerhaushalt.de/ amongst other things. What can we
 learn from this? How do we make this better?

 -w


 - Forwarded message from Friedrich Lindenberg friedr...@pudo.org -

 From: Friedrich Lindenberg friedr...@pudo.org
 Date: Wed, 24 Nov 2010 11:56:20 +0100
 Message-Id: a9089567-6107-4b43-b442-d09dcc0c3...@pudo.org
 To: wdmmg-discuss wdmmg-disc...@lists.okfn.org
 Subject: [wdmmg-discuss] Failed to port datastore to RDF, will go Mongo

 (reposting to list):

 Hi all,

 As an action from OGDCamp, Rufus and I agreed that we should resume porting 
 WDMMG to RDF in order to make the data model more flexible and to allow a 
 merger between WDMMG, OffenerHaushalt and similar other projects.

 After a few days, I'm now over the whole idea of porting WDMMG to RDF. Having 
 written a long technical pro/con email before (that I assume contained 
 nothing you don't already know), I think the net effect of using RDF would be 
 the following:

 * Lots of coolness, sucking up to linked data people.
 * Further research regarding knowledge representation.

I will quickly outline some points that I think are advantages from a
developer POV. ( once you tackle the problems you outline below, of
course ).
* A highly expressive query language ( SPARQL )
* Ease of creating workflows where data moves from one app to another.
And this is not just buzz. The self-contained nature of triples and
IDs makes it so that you can SPARQL select on one side and SPARQL
insert on another. I do this all the time, creating data pipelines.
I admit it has taken some time to master, but I can perform magic
from my customer's point of view.
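A minimal sketch of such a pipeline, using Python's rdflib and two in-memory
graphs; a real setup would talk to SPARQL endpoints instead, and the
example.org names are placeholders:

# Sketch of a tiny pipeline: SELECT from one graph, insert reshaped
# triples into another. Vocabulary and URIs are placeholders.
from rdflib import Graph, Namespace, Literal

EX = Namespace("http://example.org/id/")

source = Graph()
source.add((EX["entry/1"], EX["amount"], Literal(100)))
source.add((EX["entry/2"], EX["amount"], Literal(250)))

target = Graph()
rows = source.query("""
    PREFIX ex: <http://example.org/id/>
    SELECT ?entry ?amount WHERE { ?entry ex:amount ?amount }
""")
for entry, amount in rows:
    # reshape on the way through; here we simply re-key under a new predicate
    target.add((entry, EX["spentAmount"], amount))

print(target.serialize(format="turtle"))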


 vs.

 * Unstable and outdated technological base. No triplestore I have seen so far 
 seemed on par with MySQL 4.

* You definitely need to give Virtuoso a try. It is a mature SQL
database that grew into RDF. I strongly disagree with this point, as I
have personally created highly demanding projects for large companies
using Virtuoso's Quad Store. To give you a real-life case, the recent
Brazilian election portal by Globo.com (
http://g1.globo.com/especiais/eleicoes-2010/ ) has Virtuoso under the
hood and, being a highly important, mission-critical app in a major (
4th ) media company, it is not a toy application.
I know of many others, but this is one I participated in, so I can tell
you without fear of mistake that it is Virtuoso.

 * No freedom wrt schema; modelling overhead instead. Spent 30 minutes 
 trying to find a predicate for Euro.

Yes!
This is a major problem and we as a community need to tackle it.
I am intrigued to see what ideas come up in this thread. Thanks for
bringing it up.

As an alternative, you can initially model everything using a simple
urn:foo:xxx or http://mydomain.com/id/xxx schema ( this is what I do )
and as you move fwd you can refactor the model. Or not.

You can leave it as is and it will still be integratable ( able to
live alongside other datasets in the same store ).

Deploying the Linked part of Linked Data ( the dereferencing
protocols ) later on is another game.
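A sketch of that approach, with placeholder names throughout - throwaway
identifiers first, and a purely mechanical namespace refactor later if you
ever want one:

# Sketch: start with throwaway URIs, refactor the namespace later (or not).
from rdflib import Graph, Namespace, URIRef, Literal

OLD = Namespace("urn:foo:")
NEW = Namespace("http://mydomain.com/id/")

g = Graph()
g.add((OLD["budget-line-42"], OLD["amount"], Literal(1000)))

def move(term):
    # map a term from the throwaway namespace into the permanent one
    if isinstance(term, URIRef) and term.startswith(str(OLD)):
        return NEW[term[len(str(OLD)):]]
    return term

refactored = Graph()
for s, p, o in g:
    refactored.add((move(s), move(p), move(o)))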

 * Scares off developers. Invested 2 days researching this, which is how long 
 it took me to implement OH's backend the first time around. The project would 
 need to be sustained by linked data grad students.
 * Less flexibility wrt analytics, querying and aggregation. SPARQL not so 
 hot.

Did you try Virtuoso? Seriously.
It provides common aggregates out of the box and is highly extensible.
You basically have a development platform at your disposal.
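For what it's worth, SPARQL 1.1 aggregates cover the usual roll-ups; here is a
sketch against a made-up spending vocabulary (run with rdflib below, but the
query itself is plain SPARQL 1.1 and not rdflib-specific):

# Sketch: sum spending per department with SPARQL 1.1 GROUP BY / SUM.
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import XSD

EX = Namespace("http://example.org/spending/")

g = Graph()
g.add((EX["e1"], EX["department"], Literal("Health")))
g.add((EX["e1"], EX["amount"], Literal("100.0", datatype=XSD.decimal)))
g.add((EX["e2"], EX["department"], Literal("Health")))
g.add((EX["e2"], EX["amount"], Literal("50.0", datatype=XSD.decimal)))
g.add((EX["e3"], EX["department"], Literal("Defence")))
g.add((EX["e3"], EX["amount"], Literal("75.0", datatype=XSD.decimal)))

totals = g.query("""
    PREFIX ex: <http://example.org/spending/>
    SELECT ?department (SUM(?amount) AS ?total)
    WHERE { ?entry ex:department ?department ; ex:amount ?amount . }
    GROUP BY ?department
""")
for department, total in totals:
    print(department, total)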

 * Good chance of chewing up the UI, much harder to implement editing.

Definitely hard. This is something I hope will be alleviated once we
start getting more demos into the wild. But take note: the Active
Record + MVC pattern works. This is not as alien as it seems.

Also, SPARQL removes the joins, much as some of the major NoSQL
offerings do. I find it terribly easy to create UIs over RDF, but I
have been doing it for a while already.
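To illustrate the Active Record analogy, a very rough sketch of a model
object backed by a graph (every name here is invented for the example):

# Sketch: a thin Active Record-style wrapper around an rdflib Graph.
from rdflib import Graph, Namespace, Literal

EX = Namespace("http://example.org/model/")

class Entry:
    def __init__(self, graph, uri):
        self.graph, self.uri = graph, uri

    @property
    def amount(self):
        # read the single ex:amount value for this resource
        return self.graph.value(self.uri, EX["amount"])

    @amount.setter
    def amount(self, value):
        # replace any existing ex:amount triple for this resource
        self.graph.set((self.uri, EX["amount"], Literal(value)))

g = Graph()
e = Entry(g, EX["entry/1"])
e.amount = 99
print(e.amount)  # prints the stored rdflib Literal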


 I normally enjoy learning new stuff. This is just painful. Most of the above 
 points are probably based on my ignorance, but it really shouldn't take a PhD 
 to process some gov spending tables.

 I'll now start a mongo effort because I really think this should go 
 schema-free + I want to get stuff moving. If you can hold off loading Uganda 
 and Israel for a week that would of course be very cool, we could then try 
 to evaluate how far this went. Progress will be at: http://bitbucket.org/pudo/wdmmg-core

Re: FW: Failed to port datastore to RDF, will go Mongo

2010-11-24 Thread Aldo Bucchi
Sorry, I forgot to add something critical.

Ease of integration ( moving triples ) is just the beginning. Once you
get a hold of the power of ontologies and of inference as views, your
data starts becoming more and more useful.

But the first step is getting your data into RDF, and the return on
that investment is SPARQL and the ease of integration.

I usually end up with several transformation pipelines and accessory
TTL files which all get combined into one dataset. TTLs are easily
editable by hand and collaboratively versioned, while giving you full
expressivity.

TTL files alone are why some developers fall in love with Linked Data.
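For instance, a hand-maintained TTL file of labels can be merged with a
pipeline-generated dump in a couple of lines (both inlined as strings here
for the sake of the example; the URIs are placeholders):

# Sketch: combine generated triples with a small hand-edited TTL file.
from rdflib import Graph

generated_ttl = """
@prefix ex: <http://example.org/id/> .
ex:entry1 ex:department ex:dept-health .
"""

hand_edited_ttl = """
@prefix ex:   <http://example.org/id/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
ex:dept-health rdfs:label "Department of Health" .
"""

combined = Graph()
combined.parse(data=generated_ttl, format="turtle")    # pipeline output
combined.parse(data=hand_edited_ttl, format="turtle")  # hand-maintained extras
print(combined.serialize(format="turtle"))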


On Wed, Nov 24, 2010 at 10:33 AM, Aldo Bucchi aldo.buc...@gmail.com wrote:
 Hi William, Friedrich.

 This is an excellent email. My replies inlined. Hope I can help.

 On Wed, Nov 24, 2010 at 9:47 AM, William Waites w...@styx.org wrote:
 Friedrich, I'm forwarding your message to one of the W3 lists.

 Some of your questions could be easily answered (e.g. for euro in your
 context, you don't have a predicate for that, you have an Observation
 with units of a currency and you could take the currency from
 dbpedia, the predicate is units).

 But I think your concerns are quite valid generally and your
 experience reflects that of most web site developers that encounter
 RDF.

 LOD list, Friedrich is a clueful developer, responsible for
 http://bund.offenerhaushalt.de/ amongst other things. What can we
 learn from this? How do we make this better?

 -w


 - Forwarded message from Friedrich Lindenberg friedr...@pudo.org -

 From: Friedrich Lindenberg friedr...@pudo.org
 Date: Wed, 24 Nov 2010 11:56:20 +0100
 Message-Id: a9089567-6107-4b43-b442-d09dcc0c3...@pudo.org
 To: wdmmg-discuss wdmmg-disc...@lists.okfn.org
 Subject: [wdmmg-discuss] Failed to port datastore to RDF, will go Mongo

 (reposting to list):

 Hi all,

 As an action from OGDCamp, Rufus and I agreed that we should resume porting 
 WDMMG to RDF in order to make the data model more flexible and to allow a 
 merger between WDMMG, OffenerHaushalt and similar other projects.

 After a few days, I'm now over the whole idea of porting WDMMG to RDF. 
 Having written a long technical pro/con email before (that I assume 
 contained nothing you don't already know), I think the net effect of using 
 RDF would be the following:

 * Lots of coolness, sucking up to linked data people.
 * Further research regarding knowledge representation.

 I will quickly outline some points that I think are advantages from a
 developer POV. ( once you tackle the problems you outline below, of
 course ).
 * A highly expressive language ( SPARQL )
 * Ease of creating workflows where data moves from one app to another.
 And this is not just buzz. The self-contained nature of triples and
 IDs makes it so that you can SPARQL select on one side and SPARQL
 insert on another. I do this all the time, creating data pipelines.
 I admit it has taken some time to master, but I can perform magic
 from my customer's point of view.


 vs.

 * Unstable and outdated technological base. No triplestore I have seen so 
 far seemed on par with MySQL 4.

 * You definitely need to give Virtuoso a try. It is a mature SQL
 database that grew into RDF. I strongly disagree with this point, as I
 have personally created highly demanding projects for large companies
 using Virtuoso's Quad Store. To give you a real-life case, the recent
 Brazilian election portal by Globo.com (
 http://g1.globo.com/especiais/eleicoes-2010/ ) has Virtuoso under the
 hood and, being a highly important, mission-critical app in a major (
 4th ) media company, it is not a toy application.
 I know of many others, but this is one I participated in, so I can tell
 you without fear of mistake that it is Virtuoso.

 * No freedom wrt schema; modelling overhead instead. Spent 30 minutes 
 trying to find a predicate for Euro.

 Yes!
 This is a major problem and we as a community need to tackle it.
 I am intrigued to see what ideas come up in this thread. Thanks for
 bringing it up.

 As an alternative, you can initially model everything using a simple
 urn:foo:xxx or http://mydomain.com/id/xxx schema ( this is what I do )
 and as you move fwd you can refactor the model. Or not.

 You can leave it as is and it will still be integratable ( able to
 live alongside other datasets in the same store ).

 Deploying the Linked part of Linked Data ( the dereferencing
 protocols ) later on is another game.

 * Scares off developers. Invested 2 days researching this, which is how long 
 it took me to implement OH's backend the first time around. The project would 
 need to be sustained by linked data grad students.
 * Less flexibility wrt analytics, querying and aggregation. SPARQL not so 
 hot.

 Did you try Virtuoso? Seriously.
 It provides out of the box common aggregates and is highly extensible.
 You basically have a development platform at your disposal.

 * Good chance of chewing up the UI, much harder to implement editing.

 

Re: FW: Failed to port datastore to RDF, will go Mongo

2010-11-24 Thread Kingsley Idehen

On 11/24/10 8:29 AM, William Waites wrote:

... on the plus side, Friedrich wrote:

] * Lots of coolness, sucking up to linked data people.

I don't see these as particularly good things in themselves. The
solutions have to be obviously technically sound and convenient to
use. Drinking the kool-aid is not helpful.

* [2010-11-24 08:05:08 -0500] Kingsley Idehen kide...@openlinksw.com wrote:
]
] Is your data available as a dump?

UK data for 2009 that I made is available at:

http://semantic.ckan.net/dataset/cra/2009/dump.nt.bz2
http://semantic.ckan.net/dataset/cra/2009/dump.nq.bz2

But this was done more or less by hand, and repurposing the CSV-to-SDMX
scripts (this was done before QB became best practice) is not easy.
Still, from a modelling perspective they might be a good starting
point.

But having to ask a question in the right place and getting an answer
that is a good starting point is maybe different from doing a google
search and finding easy-to-follow recipes that can immediately be
plugged into some web app.

Cheers,
-w

William,

What does MySQL 4 do with this data that can't be done with a moderately 
capable RDF quad / triplestore?


If I am going to run rings around this thing, I need a starting point :-)

--

Regards,

Kingsley Idehen 
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen







Re: FW: Failed to port datastore to RDF, will go Mongo

2010-11-24 Thread Ben O'Steen
On Wed, 2010-11-24 at 12:51 -0500, Kingsley Idehen wrote:
 What does MySQL 4 do with this data that can't be done with a
 moderately capable RDF quad / triplestore? 
 
 If I am going to run rings around this thing, I need a starting
 point :-)

That's not the point that is being made. A competent developer, using
all the available links and documentation, spending days researching and
learning and trying to implement, is unable to make an app using a
triplestore that is on a par with one they can create very quickly using
a relational database.

This is about the 1000th time I have heard this story, and the ability
range of those saying the same thing is huge - from 9-5 devs who learn
what they need, to people who research and teach artificial intelligence
and other cutting-edge areas and who actively learn new, complex skills
just because they can.

The point is not whether someone who (co?)developed the Virtuoso
triplestore can make RDF work; it's whether someone with the time,
current documentation and inclination can.


Ben


 
 -- 
 
 Regards,
 
 Kingsley Idehen 
 President & CEO 
 OpenLink Software 
 Web: http://www.openlinksw.com
 Weblog: http://www.openlinksw.com/blog/~kidehen
 Twitter/Identi.ca: kidehen 
 
 
 
 





Re: FW: Failed to port datastore to RDF, will go Mongo

2010-11-24 Thread Juan Sequeda
Ben,

Just like we can

1) download xampp and install php, apache, mysql with one click
2) open a browser, open phpmyadmin, create my db
3) copy paste any snippet of code I can find on the web about connecting
php/java etc to a mysql database
4) write code to select/insert/update my db

... you are asking for these same 4 simple steps but for an RDF database?
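For comparison, steps 2-4 against an embedded RDF store take only a few lines
of Python with rdflib; this is a hedged sketch (a server-backed store exposing
a SPARQL endpoint is the closer parallel to mysql, and the example.org URIs
are placeholders):

# Sketch of steps 2-4 with an embedded store: create, insert, update, select.
from rdflib import Graph, Namespace, Literal

EX = Namespace("http://example.org/app/")

g = Graph()                                       # 2) "create my db"
g.add((EX["user1"], EX["name"], Literal("Ada")))  # 4) insert

# 4) update via SPARQL 1.1 Update
g.update("""
    PREFIX ex: <http://example.org/app/>
    DELETE { ex:user1 ex:name ?old }
    INSERT { ex:user1 ex:name "Ada Lovelace" }
    WHERE  { ex:user1 ex:name ?old }
""")

# 4) select
for (name,) in g.query(
        "PREFIX ex: <http://example.org/app/> "
        "SELECT ?name WHERE { ex:user1 ex:name ?name }"):
    print(name)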

Juan Sequeda
+1-575-SEQ-UEDA
www.juansequeda.com


On Wed, Nov 24, 2010 at 12:12 PM, Ben O'Steen bost...@gmail.com wrote:

 On Wed, 2010-11-24 at 12:51 -0500, Kingsley Idehen wrote:
  What does MySQL 4 do with this data that can't be done with a
  moderately capable RDF quad / triplestore?
 
  If I am going to run rings around this thing, I need a starting
  point :-)

 That's not the point that is being made. A competent developer, using
 all the available links and documentation, spending days researching and
 learning and trying to implement, is unable to make an app using a
 triplestore that is on a par with one they can create very quickly using
 a relational database.

 This is about the 1000th time I have heard this story, and the ability
 range of those saying the same thing is huge - from 9-5 devs who learn
 what they need to people who research and teach artificial intelligence
 and other cutting edge areas and who actively learn new, complex skills
 just because they can.

 The point is not whether someone who (co?)developed the virtuoso
 triplestore can make RDF work, it's whether someone with the time,
 current documentation and inclination can.


 Ben


 
  --
 
  Regards,
 
  Kingsley Idehen
  President & CEO
  OpenLink Software
  Web: http://www.openlinksw.com
  Weblog: http://www.openlinksw.com/blog/~kidehen
  Twitter/Identi.ca: kidehen
 
 
 
 






Re: FW: Failed to port datastore to RDF, will go Mongo

2010-11-24 Thread Ben O'Steen
On Wed, 2010-11-24 at 14:40 -0600, Juan Sequeda wrote:
 Ben,
 
 
 Just like we can
 
 
 1) download xampp and install php, apache, mysql with one click
 2) open a browser, open phpmyadmin, create my db
 3) copy paste any snippet of code I can find on the web about
 connecting php/java etc to a mysql database
 4) write code to select/insert/update my db
 
 
 ... you are asking for these same 4 simple steps but for an RDF
 database?

Not me personally, but in my experience of talking to developers in the
HE/FE sector as well as commercial devs through JISC, running Dev8D and
so on, being able to achieve those steps in the manner you have
suggested is crucial.

Yes, I exaggerated about hearing the same tale a thousand times, but
I have heard that perception of RDF/triplestores many, many times, as
unfounded as some may argue it is.

This will sound like heresy, but the closest parallel I've found to step
1) is with mulgara (excepting that a Java runtime of some sort is
required). Run the jar, open a browser, and run through the web-based
examples that cover input, update and query.

Ben


 
 Juan Sequeda
 +1-575-SEQ-UEDA
 www.juansequeda.com
 
 
 On Wed, Nov 24, 2010 at 12:12 PM, Ben O'Steen bost...@gmail.com wrote:
 On Wed, 2010-11-24 at 12:51 -0500, Kingsley Idehen wrote:
  What does MySQL 4 do with this data that can't be done with a
  moderately capable RDF quad / triplestore?
 
  If I am going to run rings around this thing, I need a starting
  point :-)
 
 That's not the point that is being made. A competent developer, using
 all the available links and documentation, spending days researching and
 learning and trying to implement, is unable to make an app using a
 triplestore that is on a par with one they can create very quickly using
 a relational database.
 
 This is about the 1000th time I have heard this story, and the ability
 range of those saying the same thing is huge - from 9-5 devs who learn
 what they need to people who research and teach artificial intelligence
 and other cutting edge areas and who actively learn new, complex skills
 just because they can.
 
 The point is not whether someone who (co?)developed the virtuoso
 triplestore can make RDF work, it's whether someone with the time,
 current documentation and inclination can.
 
 
 Ben
 
 
  --
 
  Regards,
 
  Kingsley Idehen
  President & CEO
  OpenLink Software
  Web: http://www.openlinksw.com
  Weblog: http://www.openlinksw.com/blog/~kidehen
  Twitter/Identi.ca: kidehen
 
 
 
 
 
 
 
 
 
 





Re: FW: Failed to port datastore to RDF, will go Mongo

2010-11-24 Thread Kingsley Idehen

On 11/24/10 3:57 PM, Ben O'Steen wrote:

On Wed, 2010-11-24 at 14:40 -0600, Juan Sequeda wrote:

Ben,


Just like we can


1) download xampp and install php, apache, mysql with one click
2) open a browser, open phpmyadmin, create my db
3) copy paste any snippet of code I can find on the web about
connecting php/java etc to a mysql database
4) write code to select/insert/update my db


... you are asking for these same 4 simple steps but for an RDF
database?

Not me personally, but in my experience of talking to developers in the
HE/FE sector as well as commercial devs through JISC, running Dev8D and
so on, being able to achieve those steps in the manner you have
suggested is crucial.

Yes, I exaggerated about my hearing the same tale a thousand times, but
I have heard that perception of RDF/triplestores many, many times as
unfounded as some may argue it is.

This will sound like heresy, but the closest parallel I've found to step
1) is with mulgara (excepting that a Java runtime of some sort is
required.) Run the jar, open browser, and run through the web-based
examples that cover input, update and query.

Ben


You should make a basic RDBMS your yardstick, e.g. FoxPro, Access, 
FileMaker, SQL Server, etc.


If you have enterprise DBMS experience, then: Oracle, SQL Server, Sybase, 
Ingres, Informix, Progress (OpenEdge), Firebird (once Interbase), 
PostgreSQL, MySQL.


In all cases, this is what developers do:

1. Install DBMS
2. Load (using various data loaders and import utilities) and Create Data
3. Create Views and Queries
4. Put Forms and Reports atop Views (or Tables)
5. Enjoy power of RDBMS apps.

This is what end-users (basic or power users) do:

1. Get a productivity tool of choice (Word Processor, Spreadsheet, 
Report Writer etc.)
2. Connect to RDBMS via an ODBC Data Source Name (which, via the ODBC Driver 
Manager, is bound to Drivers for each DBMS)
3. Enjoy power of RDBMS via their preferred Desktop tool.

Here is what so-called Web Developers do:

1. Find an Open Source DBMS
2. Compile it
3. Work through LAMP stack to PHP, Python, Ruby, TCL, others
4. Ignore the DBMS-independent API of ODBC (available via the iODBC or 
unixODBC efforts) and couple HTML pages directly to the DBMS
5. Ignore the DBMS for user account management and stick that in the HTML 
page layer too.



Irrespective of where you fit in re. the above, this is what you should 
be able to do with Relational Property Graph Databases that support 
resolvable URIs as Unique Keys:


1. Load data - there are a myriad of paths, including transient and 
materialized views over ODBC- or JDBC-accessible RDBMS data sources, Web 
Services, and many other data container formats (spreadsheets, CSV files, 
etc.) - see the sketch after this list
2. Use HTML+RDFa (or basic HTML) pages as Forms and a Report Writer tool 
re. data browsing

3. Enjoy power of Linked Data.

Note re. above:

1. No re-write rules coding
2. No 303 debates re. how to make Unique Keys resolve
3. No exposure to Name / Address disambiguation re. Unique Keys.
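As a sketch of the "Load data" step above: turning a CSV of spending rows 
into triples with Python's rdflib (the column names, URIs and vocabulary are 
placeholders, and rdflib is just one convenient loader among many):

# Sketch: load a CSV of spending rows into an RDF graph.
import csv
import io
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import XSD

EX = Namespace("http://example.org/spending/")

CSV = "id,department,amount\n1,Health,100.0\n2,Defence,75.0\n"

g = Graph()
for row in csv.DictReader(io.StringIO(CSV)):
    entry = EX["entry/" + row["id"]]
    g.add((entry, EX["department"], Literal(row["department"])))
    g.add((entry, EX["amount"], Literal(row["amount"], datatype=XSD.decimal)))

print(g.serialize(format="turtle"))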

Mulgara already mandates Java. Java != Platform Independent either. I am 
mandating nothing bar installing a DBMS and then simply leveraging HTTP, 
EAV Data Model, and the power of a Relational Property Graph Database 
that may or may not provide output in RDF format.



Kingsley



Juan Sequeda
+1-575-SEQ-UEDA
www.juansequeda.com


 On Wed, Nov 24, 2010 at 12:12 PM, Ben O'Steen bost...@gmail.com wrote:
  On Wed, 2010-11-24 at 12:51 -0500, Kingsley Idehen wrote:
   What does MySQL 4 do with this data that can't be done with a
   moderately capable RDF quad / triplestore?
 
   If I am going to run rings around this thing, I need a starting
   point :-)
 
  That's not the point that is being made. A competent developer, using
  all the available links and documentation, spending days researching and
  learning and trying to implement, is unable to make an app using a
  triplestore that is on a par with one they can create very quickly using
  a relational database.
 
  This is about the 1000th time I have heard this story, and the ability
  range of those saying the same thing is huge - from 9-5 devs who learn
  what they need to people who research and teach artificial intelligence
  and other cutting edge areas and who actively learn new, complex skills
  just because they can.
 
  The point is not whether someone who (co?)developed the virtuoso
  triplestore can make RDF work, it's whether someone with the time,
  current documentation and inclination can.
 
 
  Ben



 
   --
 
   Regards,
 
   Kingsley Idehen
   President & CEO
   OpenLink Software
   Web: http://www.openlinksw.com
   Weblog: http://www.openlinksw.com/blog/~kidehen
   Twitter/Identi.ca: kidehen