Re: Resolving DC PURLs

2010-03-29 Thread Peter Ansell
On 30 March 2010 09:22, David Wood  wrote:
> Hi Tom,
>
> On Friday, 26 March, Ian Davis (CTO of Talis) reported that purl.org was 
> rejecting some requests to Dublin Core PURLs [1].  I asked OCLC to increase 
> the number of threads used by their PURL server and the maximum number of 
> concurrent connections.  They complied this afternoon.
>
> At the time of the failures, OCLC reported that purl.org was rejecting 
> between 20 and 60 requests per second for DC terms [2].
>
> It would seem that Linked Data clients are becoming more prevalent and that 
> some of them are not particularly well behaved.  Perhaps we need to raise 
> awareness of the importance of caching.
>
> This incident points out the criticality of DC terms to the Linked Data 
> community and the fragility of a single point of failure such as purl.org.  
> The PURL Federation development recently announced by NCBO and Zepheira may 
> eventually serve to remove the single point of failure, but the criticality 
> of service is likely to get worse with time.
>
> This message is simply an advisory to the DC community of some practical 
> issues arising from the use of DC terms by the Linked Data community and 
> requires no immediate action.  I would ask, though, that awareness of these 
> issues be kept in mind as Dublin Core considers the management of its 
> identifiers.
>
> [1]  http://twitter.com/IanD
> [2]  Personal correspondence from Tom Dehn at OCLC
> [3]  http://zepheira.com/publications/news/#PURLFederationDevelopment
>

For popular ontologies, Linked Data browsers could be distributed with
embedded copies of this information so they basically never need to
retrieve it on the fly. Ontologies aren't the only place this issue will
arise: Linked Data is designed to distribute information at the finest
granularity possible, which necessarily increases the bandwidth required
to transport many pieces of information about different things.

Redirection services are the lowest-bandwidth, lowest-processing services
in the Linked Data chain. What happens when one of the actual data
services experiences this level of popularity?
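Peter's suggestion of shipping clients with embedded copies of popular vocabularies can be sketched as follows (a minimal illustration; the namespace-to-file mapping and the `resolve` helper are hypothetical, not from any actual browser):

```python
# Sketch of a resolver that serves bundled copies of popular
# vocabularies instead of dereferencing them over HTTP on every use.
# The namespace-to-file mapping below is hypothetical.

BUNDLED_VOCABULARIES = {
    "http://purl.org/dc/terms/": "vocab/dcterms.rdf",
    "http://xmlns.com/foaf/0.1/": "vocab/foaf.rdf",
}

def resolve(uri, fetch_remote):
    """Return ("local", path) for a bundled vocabulary, otherwise fall
    back to the supplied fetch_remote callable. Popular terms such as
    dcterms:title never generate a request to purl.org at all."""
    for namespace, path in BUNDLED_VOCABULARIES.items():
        if uri.startswith(namespace):
            return ("local", path)
    return ("remote", fetch_remote(uri))

# Resolving a Dublin Core term touches no network:
source, location = resolve("http://purl.org/dc/terms/title",
                           fetch_remote=lambda u: u)
# source == "local", location == "vocab/dcterms.rdf"
```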

Cheers,

Peter



Re: Resolving DC PURLs

2010-03-29 Thread Nathan
David Wood wrote:
> Hi Tom,
> 
> On Friday, 26 March, Ian Davis (CTO of Talis) reported that purl.org was 
> rejecting some requests to Dublin Core PURLs [1].  I asked OCLC to increase 
> the number of threads used by their PURL server and the maximum number of 
> concurrent connections.  They complied this afternoon.
> 
> At the time of the failures, OCLC reported that purl.org was rejecting 
> between 20 and 60 requests per second for DC terms [2].
> 
> It would seem that Linked Data clients are becoming more prevalent and that 
> some of them are not particularly well behaved.  Perhaps we need to raise 
> awareness of the importance of caching.

This is a big issue with linked data in general (specifically when it is
generated by scripts). Caching is at an all-time low, and that is having
serious effects.

One particular suspect is PHP (and thus the scripts created with it),
which has little to no support for anything HTTP-related, let alone caching.

Raised awareness all round would be most beneficial to all.
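The caching being asked for here mostly amounts to honoring standard HTTP cache metadata (Cache-Control max-age for freshness, ETag for cheap revalidation). A minimal sketch of the client-side decision, in Python for illustration (the `CacheEntry` class is illustrative, not from any particular library):

```python
import time

class CacheEntry:
    """One cached response, with just enough metadata for a well-behaved
    Linked Data client to decide whether to contact the server again."""

    def __init__(self, body, etag=None, max_age=3600, fetched_at=None):
        self.body = body
        self.etag = etag              # opaque validator for conditional GETs
        self.max_age = max_age        # seconds, from Cache-Control: max-age
        self.fetched_at = fetched_at if fetched_at is not None else time.time()

    def is_fresh(self, now=None):
        """Fresh entries are served locally, with no request at all."""
        now = time.time() if now is None else now
        return (now - self.fetched_at) < self.max_age

    def revalidation_headers(self):
        """When stale, send a conditional GET; a 304 Not Modified costs
        the server far less than re-serving the whole vocabulary file."""
        return {"If-None-Match": self.etag} if self.etag else {}

entry = CacheEntry("...dcterms RDF/XML...", etag='"abc123"',
                   max_age=3600, fetched_at=0)
entry.is_fresh(now=10)        # True: answer from the cache
entry.is_fresh(now=7200)      # False: time to revalidate
entry.revalidation_headers()  # {'If-None-Match': '"abc123"'}
```

A client that followed even this much discipline would have hit purl.org once per hour per vocabulary instead of once per term lookup.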

Regards!

> This incident points out the criticality of DC terms to the Linked Data 
> community and the fragility of a single point of failure such as purl.org.  
> The PURL Federation development recently announced by NCBO and Zepheira may 
> eventually serve to remove the single point of failure, but the criticality 
> of service is likely to get worse with time.
> 
> This message is simply an advisory to the DC community of some practical 
> issues arising from the use of DC terms by the Linked Data community and 
> requires no immediate action.  I would ask, though, that awareness of these 
> issues be kept in mind as Dublin Core considers the management of its 
> identifiers.
> 
> [1]  http://twitter.com/IanD
> [2]  Personal correspondence from Tom Dehn at OCLC
> [3]  http://zepheira.com/publications/news/#PURLFederationDevelopment
> 
> 
> Regards,
> Dave
> --
> David Wood, Ph.D.
> Partner
> Zepheira - The Art of Data
> http://zepheira.com/team/dave/



Resolving DC PURLs

2010-03-29 Thread David Wood
Hi Tom,

On Friday, 26 March, Ian Davis (CTO of Talis) reported that purl.org was 
rejecting some requests to Dublin Core PURLs [1].  I asked OCLC to increase the 
number of threads used by their PURL server and the maximum number of 
concurrent connections.  They complied this afternoon.

At the time of the failures, OCLC reported that purl.org was rejecting between 
20 and 60 requests per second for DC terms [2].

It would seem that Linked Data clients are becoming more prevalent and that 
some of them are not particularly well behaved.  Perhaps we need to raise 
awareness of the importance of caching.

This incident points out the criticality of DC terms to the Linked Data 
community and the fragility of a single point of failure such as purl.org.  The 
PURL Federation development recently announced by NCBO and Zepheira may 
eventually serve to remove the single point of failure, but the criticality of 
service is likely to get worse with time.

This message is simply an advisory to the DC community of some practical issues 
arising from the use of DC terms by the Linked Data community and requires no 
immediate action.  I would ask, though, that awareness of these issues be kept 
in mind as Dublin Core considers the management of its identifiers.

[1]  http://twitter.com/IanD
[2]  Personal correspondence from Tom Dehn at OCLC
[3]  http://zepheira.com/publications/news/#PURLFederationDevelopment


Regards,
Dave
--
David Wood, Ph.D.
Partner
Zepheira - The Art of Data
http://zepheira.com/team/dave/









Re: Nice Data Cleansing Tool Demo

2010-03-29 Thread David Huynh

Hi Aldo,

On Mar/30/10 1:46 am, Aldo Bucchi wrote:

Hi David,

I love it and I NEED it ;)
Awesome work, really.

I heard it will be opensource so I will probably be able to extend it
myself,
Yup, it'll be open source. Clean data sets are all clean the same way, 
but each dirty data set is dirty in its own way, which is why Gridworks 
needs open source contributions in order to cover as many different 
kinds of data dirtiness as possible. :-)
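One classic way to attack per-dataset dirtiness is key-collision clustering with a fingerprint function, along the lines of what the Gridworks demo shows (the exact normalization below is a simplified assumption, not Gridworks' actual algorithm):

```python
import re
from collections import defaultdict

def fingerprint(value):
    """Normalize a messy value: lowercase, strip punctuation, then sort
    and deduplicate tokens so reordered or respaced variants collide."""
    tokens = re.sub(r"[^\w\s]", "", value.lower()).split()
    return " ".join(sorted(set(tokens)))

def cluster(values):
    """Group values whose fingerprints collide: candidates for merging."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return [vs for vs in groups.values() if len(vs) > 1]

messy = ["Dublin Core", "dublin core", "Core, Dublin", "FOAF"]
cluster(messy)  # [['Dublin Core', 'dublin core', 'Core, Dublin']]
```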



but here are some ideas for (missing?) features:
* Importing custom Lookups/Dictionaries ( to go from text to IDs or
the other way around ). Maybe this is possible using a different hook
for the reconciliation mechanism.
* Related: Plug in other reconciliation services ( not sure how this
stands up to freebase biz alignment )
   
Definitely. Right now Gridworks is hooked up to 2 services: the Freebase 
text search service (called "relevance") and the experimental proper 
reconciliation service. It makes sense to be able to plug in other 
services as well.
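A pluggable reconciliation hook can be as simple as a function from text to an identifier. Here is a toy local stand-in using stdlib fuzzy matching (the lookup dictionary and the matching strategy are illustrative; a real pluggable service would be invoked over HTTP):

```python
import difflib

# Hypothetical local dictionary mapping canonical labels to IDs, the
# kind of custom lookup Aldo asks to import.
LOOKUP = {
    "Dublin Core Metadata Initiative": "org/dcmi",
    "World Wide Web Consortium": "org/w3c",
}

def reconcile(text, cutoff=0.6):
    """Return (id, canonical_label) for the closest label, or None.
    The matching strategy here (difflib similarity over the label set)
    is just a stand-in for a remote reconciliation call."""
    match = difflib.get_close_matches(text, LOOKUP, n=1, cutoff=cutoff)
    if not match:
        return None
    return LOOKUP[match[0]], match[0]

reconcile("dublin core metadata initiative")
# ('org/dcmi', 'Dublin Core Metadata Initiative')
```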



* Command line engine. To add a GW project as a step in a traditional
transformation job and execute steps sequentially.
   
We've thought of that, too, but haven't implemented it. That shouldn't 
be too hard.



* Expose Gazetteers ( dictionaries ) generated within the tool ( when
equating facets )
   

That makes sense. I'll think more about how to support that.

David




Re: Nice Data Cleansing Tool Demo

2010-03-29 Thread David Huynh

On Mar/29/10 9:10 pm, François Scharffe wrote:

Hi David,

Great work !

When will the tool be released? I can't wait to try it.

Hi François, we're aiming for about 1 more month of development and testing.

David




Re: AW: Contd: Nice Data Cleansing Tool Demo

2010-03-29 Thread Kingsley Idehen

Peter Haase wrote:

Hi,

  

[SNIP]

<<
What is needed is Top-k plus the right pivot/refinement
operators (which link to new dynamic collections).
>>

Yes, and I am sure you know that the above isn't in any way 
insurmountable (for Microsoft to implement), bearing in mind the server 
simply has to handle the URL requests it receives as part of its dynamic 
collection assembly process.


Pivot is a game-changing client for Linked Data (RDF or OData variants); 
it simply makes life a lot easier for people to comprehend the virtues 
of Faceted Search & Find courtesy of EAV graph models.





--

Regards,

Kingsley Idehen	  
President & CEO 
OpenLink Software 
Web: http://www.openlinksw.com

Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen 









Re: Contd: Nice Data Cleansing Tool Demo

2010-03-29 Thread Kingsley Idehen

Georgi Kobilarov wrote:

Kingsley,

  

>>>>> So by the time you can use Pivot on SW/linked data, you will already
>>>>> have solved all the interesting and challenging problems.
>>>>
>>>> This part is what I call an innovation slot since we have hooked it into
>>>> our DBMS hosted faceted engine and successfully used it over very large
>>>> data sets.
>>>
>>> Kingsley, I'm wondering: How did you do that? I tried it myself, and
>>> it doesn't work.
>>
>> Did I indicate that my demo instance was public? How did you come to
>> overlook that?
>
> I wasn't referring to a demo of yours, but to the general task of using
> Pivot as a frontend to a faceted browsing backend engine.

Re. the general task, it can complement a back-end.

Have you ever encountered an old concept from the tabular data 
representation realm (e.g., RDBMS) called "Mirrored Cursors"? Maybe 
you've encountered "Detached Rowsets" and schemes that also include 
delta handling between the client and the server.


The fundamental point I am making to you is simply this: Pivot is a 
powerful complement to an HTTP server that can deliver faceted 
navigation natively (like Virtuoso). The end result is this: you can 
get the server to do some work (localize the first phase of the Faceted 
Search and Find against a massive data corpus) and then have the client 
handle the remainder (a nice visual UX for insight discovery).
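Kingsley's division of labour (the server localizes the first phase, the client receives only the selected subset) can be sketched as follows (the `facet_filter` helper and toy corpus are illustrative, not Virtuoso's actual mechanism):

```python
def facet_filter(items, selections):
    """Server side: reduce a large corpus to the members of the currently
    selected facets, so only that subset ever travels to the client.
    `items` is a list of dicts; `selections` maps a facet name to its
    required value."""
    return [
        item for item in items
        if all(item.get(facet) == value for facet, value in selections.items())
    ]

# Toy corpus standing in for a server-hosted store such as Virtuoso:
corpus = [
    {"name": "Empire State", "type": "skyscraper", "city": "New York"},
    {"name": "Chrysler",     "type": "skyscraper", "city": "New York"},
    {"name": "Sears Tower",  "type": "skyscraper", "city": "Chicago"},
]
subset = facet_filter(corpus, {"city": "New York"})
# Only the two New York items would then be serialized for the client.
```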


  

>>> Pivot can't make use of server-side faceted browsing engines.
>>
>> Why do you speculate? You are incorrect and Virtuoso *doing* what you
>> claim is impossible will be emphatic proof, nice and simple.
>>
>> Pivot consumes data from HTTP accessible collections (which may be static
>> or dynamic [1]). A dynamic collection is comprised of CXML resources
>> (basically XML).
>
> I don't speculate. Which parts of my "does not work" and "can't use" did
> sound like a speculation?

You explicitly said: "Pivot can't make use of server-side faceted 
browsing engines".

I am saying, based on my earlier comments (clarified further above re. 
the mirrored cursor anecdote): it can, it will, and you shall see re. 
Virtuoso.


 
  

>>> You need to send *all* the data to the Pivot client, and it computes
>>> the facets and performs any filtering operation client-side.
>>
>> You make a collection from a huge corpus of data (what I demonstrate),
>> then you "Save As" (which I demonstrate as the generation point re. the
>> CXML resource) and then Pivot consumes. All the data is Virtuoso hosted.
>>
>> There are two things you are overlooking:
>>
>> 1. The dynamic collection is produced at the conclusion of Virtuoso-based
>> faceted navigation (the interactions basically describe the Facet
>> membership to Virtuoso).
>> 2. Pivot works with static and dynamic collections.
>>
>> *I specifically state, this is about using both products together to solve
>> a major problem: #1 Faceted Browsing UX, #2 Faceting over a huge data
>> corpus.*
>>
>> Virtuoso is an HTTP server; it can serve a myriad of representations of
>> data to user agents (it has its own DBMS-hosted XSLT Processor and XML
>> Schema Validator, with XQuery/XPath to boot, all very old stuff).
>
> Yes, you make a collection and "save as" that to CXML, exactly! That is
> not "using Pivot as a frontend to Virtuoso".

I am starting from the Server, not the Client.

I am starting from the Server because the Client can't handle the data 
corpus and wasn't built with that in mind. It was built to consume a 
specific type of resource collection (static or dynamic) via HTTP, end 
of story.


Where I start from doesn't invalidate Pivot as a front-end to Virtuoso, 
the entire operation can take place within the  "Pivot Browser" (Pivot 
is an HTTP user agent that operates on a specific data representation 
format).

> Sure, you can construct a small dataset from a huge dataset using SPARQL,
> or your Virtuoso facet engine or whatever. And then export that resulting
> dataset to Pivot collection XML and load that CXML into Pivot.

I am not talking about "Export" in the manner you characterize. I am 
talking about an HTTP conversation that results in a CXML-based resource 
being dispatched from a Server to a User Agent, REST-fully.




> But that is very different to using Pivot as a frontend to a huge data
> set.

In your world view and eyes, maybe. Absolutely not the case in mine.

I can interact with Virtuoso from start to finish from within Pivot 
(never leaving Pivot). I start by making HTTP requests from Pivot, and 
the entire exercise concludes with a CXML representation of the 
collection assembled by Virtuoso (dynamically).
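The final step of such an HTTP conversation, serializing the server-assembled result set for the Pivot client, might look roughly like this (the element names follow the general Collection/Items/Item shape of CXML, but the attribute details are illustrative and not checked against the actual Pivot schema):

```python
import xml.etree.ElementTree as ET

def to_cxml(name, items):
    """Serialize a server-assembled result set as a collection document
    for the Pivot client. Element names follow the general
    Collection/Items/Item shape of CXML; attributes are illustrative."""
    collection = ET.Element("Collection", Name=name)
    container = ET.SubElement(collection, "Items")
    for i, item in enumerate(items):
        ET.SubElement(container, "Item", Id=str(i),
                      Name=item["name"], Href=item["href"])
    return ET.tostring(collection, encoding="unicode")

doc = to_cxml("dc-results", [{"name": "dcterms:title",
                              "href": "http://purl.org/dc/terms/title"}])
```

In the dynamic-collection case this document would be generated per request, at the conclusion of the server-side faceted navigation, rather than saved out once.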




  

>> BTW -- how do you think Peter Haase got his variant working? I am sure
>> he will shed identical light on the matter for you.
>
> Yes, Peter, please do. From what I saw in the Fluidops demo, it works
> exactly as I wrote above: A sparql-query constructs a small dataset from
> the sparql endpoint, converts that via a proxy to CXML and loads it into
> Pivot.
>
> I don't say Pivot doesn't make a nice demo, or a useful tool to explore a
> small dataset via faceted filtering. But it's not a frontend that can be
> put on top of a faceted browsing engine like
> http://developer.nytimes.com/docs/article_search_api

AW: Contd: Nice Data Cleansing Tool Demo

2010-03-29 Thread Peter Haase
Hi,

> -Ursprüngliche Nachricht-
> Von: public-lod-requ...@w3.org [mailto:public-lod-requ...@w3.org] Im
> Auftrag von Kingsley Idehen
> Gesendet: Monday, March 29, 2010 8:27 PM
> An: public-lod@w3.org; Georgi Kobilarov
> Betreff: Contd: Nice Data Cleansing Tool Demo
> 
> Georgi Kobilarov wrote:
> > Hello,
> >
> >
>  Now here is the obvious question, re. broader realm of faceted
> data
>  navigation, have you guys digested the underlying concepts
>  demonstrated by Microsoft Pivot?
> 
> 
> >>> I've seen the TED talk on Pivot. It's a very well polished
> >>> implementation of faceted browsing. The Seadragon technology
> >>> integration and animations are well executed. As far as "underlying
> >>> concepts" in faceted browsing go, I haven't noticed anything novel
> >>>
> > there.
> >
> > I agree with David here, nothing novel about the underlying concept.
> > One thing I found quite nice and haven't seen before is grouping
> results
> > along one facet dimension (the bar-graph representation of results).
> I
> > think
> > that is a neat idea.
> > The integration of Seadragon and deep-zooming looks nice, but little
> more
> > than that. Not all objects render into nice pictures, and the
> > interaction of zooming in
> > and out isn't a helpful one in my opinion. The zooming gives the
> > impression
> > at first that the position of objects in that 2D space is meaningful,
> > but it
> > is not.  It's an eye-catcher, not more.
> >
> >
> >
> >>> One thing to note: in each Pivot demo example, there is data of
> >>> exactly one type only--say, type people. So it seems, using
> Microsoft
> >>> Pivot, you can't pivot from one type to another, say, from people
> to
> >>> their companies. You can't do that example I used for Parallax: US
> >>> presidents -> children -> schools. Or skyscrapers -> architects ->
> >>> other buildings. So from what I've seen, as it currently is,
> Microsoft
> >>> Pivot cannot be used for browsing graphs because it cannot pivot
> (over
> >>> graph links).
> >>>
> >> Yes, this is a limitation re. general faceted browsing concepts.
> >>
> >
> > No, it's a limitation of the current implementations of faceted
> browsing.
> > Not a general problem with faceted browsing.
> >

Using dynamic collections you can essentially implement any pivot/query
refinement/filter operator you like, including the ones mentioned above.
It is true that the demo collections from Microsoft do not show this (yet),
but we have some of them in our system at
http://iwb.fluidops.com/pivot


> >
> >> The most interesting part to me is the use of an alternative symbol
> >> mechanism for the human interaction aspect i.e., deep zoom images
> where
> >> you would typically see a long human unfriendly URI.
> >>
> >
> > "Where you would typically see URIs"? Really?
> 
> **clean up post re. some critical typos **
> 
> Where would you see URIs? What do you see when you use:
> http://lod.openlinksw.com ?
> 
> And when you don't see URIs (human or machine, the typical case re.
> Faceted Browsing over RDF) what do you have re. HTTP based Linked Data?
> Zilch!
> >
> >
> >>> Furthermore, I believe that to get Pivot to perform well, you need
> a
> >>> cleaned up, *homogeneous* data set, presumably of small size (see
> >>> their Wikipedia example in which they picked only the top 500 most
> >>> visited articles). SW/linked data in their natural habitat,
> however,
> >>> is rarely that cleaned up and homogeneous ...

Yes, ideally you have clean, homogeneous data. However, in our demonstrator
we do operate on a larger, un-cleaned LOD data set, incl. DBpedia (>3 million
entities) and several others (around 200 million triples in total). Clearly,
you see the problems in the data (missing images, wrong images, duplicate
values, ...). Still, I see it from a positive side: I believe that for many
information needs, visual exploration is a very effective paradigm, and with
a great tool like Pivot one can achieve a phenomenal user experience. And it
is possible to show that with real LOD data already today.
As Georgi said, the data quality will improve over time. Visual exploration
tools like Pivot - where you actually *see* the problems - might help on
this front.



> > Is  that really a problem of Linked Data Web as such? I don't think
> so.
> > There is a lot of badly structured, not well cleaned up data on the
> > current
> > Linked Data Web. Because there was so much excitement about
> publishing
> > anything in the early day, and so little attention to the actual data
> > that's
> > getting published. That is going to change.
> >
> >>> So by the time you can
> >>> use Pivot on SW/linked data, you will already have solved all the
> >>> interesting and challenging problems.
> >>>
> >> This part is what I call an innovation slot since we have hooked it
> into
> >>
> > our
> >
> >> DBMS hosted faceted engine and successfully used it over very large
> data
> >> sets.
> >
> > Kingsley, I'm wondering: How did you do that? I tried it myself, and
> > it doesn't work.

Re: Contd: Nice Data Cleansing Tool Demo

2010-03-29 Thread Aldo Bucchi
Hi,

On Mon, Mar 29, 2010 at 3:22 PM, Nathan  wrote:
> Georgi Kobilarov wrote:
>> Kingsley,
>>
>> So by the time you can
>> use Pivot on SW/linked data, you will already have solved all the
>> interesting and challenging problems.
>>
> This part is what I call an innovation slot since we have hooked it
> into
>
 our

> DBMS hosted faceted engine and successfully used it over very large
>> data
> sets.
 Kingsley, I'm wondering: How did you do that? I tried it myself, and
 it doesn't work.
>>> Did I indicate that my demo instance was public? How did you come to
>>> overlook that?
>>
>> I wasn't referring to a demo of yours, but to the general task of using
>> Pivot as a frontend to a faceted browsing backend engine.
>>
>>
 Pivot can't make use of server-side faceted browsing engines.

>>> Why do you speculate? You are incorrect and Virtuoso *doing* what you
>>> claim is impossible will be emphatic proof, nice and simple.
>>>
>>> Pivot consumes data from HTTP accessible collections (which may be static
>> or
>>> dynamic [1]). A dynamic collection is comprised of CXML resources
>> (basically
>>> XML) .
>>
>> I don't speculate. Which parts of my "does not work" and "can't use" did
>> sound like a speculation?
>>
>>
 You need to send *all* the data to the Pivot client, and it computes
 the facets and performs any filtering operation client-side.
>>> You make a collection from a huge corpus of data (what I demonstrate) then
>>> you "Save As" (which I demonstrate as the generation point re. CXML
>>> resource) and then Pivot consumes. All the data is Virtuoso hosted.
>>>
>>> There are two things you are overlooking:
>>>
>>> 1. The dynamic collection is produced at the conclusion of Virtuoso based
>>> faceted navigation (the interactions basically describes the Facet
>>> membership to Virtuoso) 2. Pivot works with static and dynamic collections
>> .
>>> *I specifically state, this is about using both products together to solve
>> a
>>> major problem. #1 Faceted Browsing UX #2 Faceting over a huge data
>>> corpus.*
>>>
>>> Virtuoso is an HTTP server, it can serve a myriad of representations of
>> data to
>>> user agents (it has its own DBMS hosted XSLT Processor and XML Schema
>>> Validator with XQuery/XPath to boot, all very old stuff).
>>
>> Yes, you make a collection and "save as" that to CXML, exactly! That is not
>> "using Pivot as a frontend to Virtuoso". Sure, you can construct a small
>> dataset from a huge dataset using SPARQL, or your Virtuoso facet engine or
>> whatever. And then export that resulting dataset to Pivot collection XML and
>> load that CXML into Pivot. But that is very different to using Pivot as a
>> frontend to a huge data set.
>>
>>
>>> BTW -- how do you think Peter Haase got his variant working? I am sure he
>>> will shed identical light on the matter for you.
>>
>> Yes, Peter, please do. From what I saw in the Fluidops demo, it works
>> exactly as I wrote above: A sparql-query constructs a small dataset from the
>> sparql endpoint, converts that via a proxy to CXML and loads it into Pivot.
>>
>> I don't say Pivot doesn't make a nice demo, or a useful tool to explore a
>> small dataset via faceted filtering. But it's not a frontend that can be put
>> on top of a faceted browsing engine like
>> http://developer.nytimes.com/docs/article_search_api
>>
>
> The last thing I want is an argument about this; but surely virtually
> every service in the world, faceted browsing included, works by querying
> a large dataset to get a smaller set of results, transforming it into
> the needed format and then displaying it? Sounds like every system I've
> ever seen, from the simple HTML view of an SQL query right up to the
> mighty Google itself.
>
> Maybe I'm being naive here; what am I missing?

Nathan,

You're not missing much. From what I see:
Georgi's point is that the level of integration is not ideal: it is
basically a "load"-style integration, not a "connect"-style
integration.
Kingsley's point is that they *can* be integrated, and he has a demo
to prove it.

Both are right ;)

I can relate to both, but I lean towards Kingsley's because he is, as
usual, projecting. He knows that this integration is enough to make a
point, and that the rest will happen.
Show the value! The architecture will follow. (This is what M$ does
all the time.) Plus, they already have a lock-in on the runtime side
and the Seadragon tech, so I think they can afford to open the platform
up some more on the integration side of things.

Regards,
A

>
> Many Regards,
>
> Nathan
>
>



-- 
Aldo Bucchi
skype:aldo.bucchi
http://www.univrz.com/
http://aldobucchi.com/


Re: Contd: Nice Data Cleansing Tool Demo

2010-03-29 Thread Nathan
Georgi Kobilarov wrote:
> Kingsley,
> 
> So by the time you can
> use Pivot on SW/linked data, you will already have solved all the
> interesting and challenging problems.
>
 This part is what I call an innovation slot since we have hooked it
 into

>>> our
>>>
 DBMS hosted faceted engine and successfully used it over very large
> data
 sets.
>>> Kingsley, I'm wondering: How did you do that? I tried it myself, and
>>> it doesn't work.
>> Did I indicate that my demo instance was public? How did you come to
>> overlook that?
> 
> I wasn't referring to a demo of yours, but to the general task of using
> Pivot as a frontend to a faceted browsing backend engine. 
> 
> 
>>> Pivot can't make use of server-side faceted browsing engines.
>>>
>> Why do you speculate? You are incorrect and Virtuoso *doing* what you
>> claim is impossible will be emphatic proof, nice and simple.
>>
>> Pivot consumes data from HTTP accessible collections (which may be static
> or
>> dynamic [1]). A dynamic collection is comprised of CXML resources
> (basically
>> XML) .
> 
> I don't speculate. Which parts of my "does not work" and "can't use" did
> sound like a speculation?  
> 
>  
>>> You need to send *all* the data to the Pivot client, and it computes
>>> the facets and performs any filtering operation client-side.
>> You make a collection from a huge corpus of data (what I demonstrate) then
>> you "Save As" (which I demonstrate as the generation point re. CXML
>> resource) and then Pivot consumes. All the data is Virtuoso hosted.
>>
>> There are two things you are overlooking:
>>
>> 1. The dynamic collection is produced at the conclusion of Virtuoso based
>> faceted navigation (the interactions basically describes the Facet
>> membership to Virtuoso) 2. Pivot works with static and dynamic collections
> .
>> *I specifically state, this is about using both products together to solve
> a
>> major problem. #1 Faceted Browsing UX #2 Faceting over a huge data
>> corpus.*
>>
>> Virtuoso is an HTTP server, it can serve a myriad of representations of
> data to
>> user agents (it has its own DBMS hosted XSLT Processor and XML Schema
>> Validator with XQuery/XPath to boot, all very old stuff).
> 
> Yes, you make a collection and "save as" that to CXML, exactly! That is not
> "using Pivot as a frontend to Virtuoso". Sure, you can construct a small
> dataset from a huge dataset using SPARQL, or your Virtuoso facet engine or
> whatever. And then export that resulting dataset to Pivot collection XML and
> load that CXML into Pivot. But that is very different to using Pivot as a
> frontend to a huge data set. 
> 
> 
>> BTW -- how do you think Peter Haase got his variant working? I am sure he
>> will shed identical light on the matter for you.
> 
> Yes, Peter, please do. From what I saw in the Fluidops demo, it works
> exactly as I wrote above: A sparql-query constructs a small dataset from the
> sparql endpoint, converts that via a proxy to CXML and loads it into Pivot. 
> 
> I don't say Pivot doesn't make a nice demo, or a useful tool to explore a
> small dataset via faceted filtering. But it's not a frontend that can be put
> on top of a faceted browsing engine like
> http://developer.nytimes.com/docs/article_search_api
> 

The last thing I want is an argument about this; but surely virtually
every service in the world, faceted browsing included, works by querying
a large dataset to get a smaller set of results, transforming it into
the needed format and then displaying it? Sounds like every system I've
ever seen, from the simple HTML view of an SQL query right up to the
mighty Google itself.

Maybe I'm being naive here; what am I missing?

Many Regards,

Nathan



RE: Contd: Nice Data Cleansing Tool Demo

2010-03-29 Thread Georgi Kobilarov
Kingsley,

> >>> So by the time you can
> >>> use Pivot on SW/linked data, you will already have solved all the
> >>> interesting and challenging problems.
> >>>
> >> This part is what I call an innovation slot since we have hooked it
> >> into
> >>
> > our
> >
> >> DBMS hosted faceted engine and successfully used it over very large
> >> data sets.
> >
> > Kingsley, I'm wondering: How did you do that? I tried it myself, and
> > it doesn't work.
> 
> Did I indicate that my demo instance was public? How did you come to
> overlook that?

I wasn't referring to a demo of yours, but to the general task of using
Pivot as a frontend to a faceted browsing backend engine. 


> > Pivot can't make use of server-side faceted browsing engines.
> >
> 
> Why do you speculate? You are incorrect and Virtuoso *doing* what you
> claim is impossible will be emphatic proof, nice and simple.
> 
> Pivot consumes data from HTTP accessible collections (which may be static or
> dynamic [1]). A dynamic collection is comprised of CXML resources (basically
> XML).

I don't speculate. Which parts of my "does not work" and "can't use" did
sound like a speculation?  

 
> > You need to send *all* the data to the Pivot client, and it computes
> > the facets and performs any filtering operation client-side.
> 
> You make a collection from a huge corpus of data (what I demonstrate) then
> you "Save As" (which I demonstrate as the generation point re. CXML
> resource) and then Pivot consumes. All the data is Virtuoso hosted.
> 
> There are two things you are overlooking:
> 
> 1. The dynamic collection is produced at the conclusion of Virtuoso based
> faceted navigation (the interactions basically describes the Facet
> membership to Virtuoso) 2. Pivot works with static and dynamic collections.
> 
> *I specifically state, this is about using both products together to solve a
> major problem. #1 Faceted Browsing UX #2 Faceting over a huge data
> corpus.*
> 
> Virtuoso is an HTTP server, it can serve a myriad of representations of data
> to user agents (it has its own DBMS hosted XSLT Processor and XML Schema
> Validator with XQuery/XPath to boot, all very old stuff).

Yes, you make a collection and "save as" that to CXML, exactly! That is not
"using Pivot as a frontend to Virtuoso". Sure, you can construct a small
dataset from a huge dataset using SPARQL, or your Virtuoso facet engine or
whatever. And then export that resulting dataset to Pivot collection XML and
load that CXML into Pivot. But that is very different to using Pivot as a
frontend to a huge data set. 


> BTW -- how do you think Peter Haase got his variant working? I am sure he
> will shed identical light on the matter for you.

Yes, Peter, please do. From what I saw in the Fluidops demo, it works
exactly as I wrote above: A sparql-query constructs a small dataset from the
sparql endpoint, converts that via a proxy to CXML and loads it into Pivot. 

I don't say Pivot doesn't make a nice demo, or a useful tool to explore a
small dataset via faceted filtering. But it's not a frontend that can be put
on top of a faceted browsing engine like
http://developer.nytimes.com/docs/article_search_api

Georgi

--
Georgi Kobilarov
Uberblic Labs Berlin
http://blog.georgikobilarov.com





Small extension of the RDF Next Steps workshop submission deadline

2010-03-29 Thread Ivan Herman
We have received comments that the 4th of April (the current submission 
deadline for the RDF Next Steps Workshop [1]), being Easter Sunday, creates 
problems for submitters. To relieve this problem, the deadline has been 
extended by a week, to the 11th of April.

Ivan


[1] http://www.w3.org/2009/12/rdf-ws/

Ivan Herman, W3C Semantic Web Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
PGP Key: http://www.ivan-herman.net/pgpkey.html
FOAF: http://www.ivan-herman.net/foaf.rdf









Contd: Nice Data Cleansing Tool Demo

2010-03-29 Thread Kingsley Idehen

Georgi Kobilarov wrote:

Hello,

 

Now here is the obvious question, re. broader realm of faceted data
navigation, have you guys digested the underlying concepts
demonstrated by Microsoft Pivot?



I've seen the TED talk on Pivot. It's a very well polished
implementation of faceted browsing. The Seadragon technology
integration and animations are well executed. As far as "underlying
concepts" in faceted browsing go, I haven't noticed anything novel
there.

I agree with David here, nothing novel about the underlying concept. 
One thing I found quite nice and haven't seen before is grouping results
along one facet dimension (the bar-graph representation of results). I 
think that is a neat idea.
The integration of Seadragon and deep-zooming looks nice, but little 
more than that. Not all objects render into nice pictures, and the 
interaction of zooming in and out isn't a helpful one in my opinion. 
The zooming gives the impression at first that the position of objects 
in that 2D space is meaningful, but it is not. It's an eye-catcher, not 
more.


 

One thing to note: in each Pivot demo example, there is data of
exactly one type only--say, type people. So it seems, using Microsoft
Pivot, you can't pivot from one type to another, say, from people to
their companies. You can't do that example I used for Parallax: US
presidents -> children -> schools. Or skyscrapers -> architects ->
other buildings. So from what I've seen, as it currently is, Microsoft
Pivot cannot be used for browsing graphs because it cannot pivot (over
graph links).
  

Yes, this is a limitation re. general faceted browsing concepts.



No, it's a limitation of the current implementations of faceted browsing.
Not a general problem with faceted browsing.


 

The most interesting part to me is the use of an alternative symbol
mechanism for the human interaction aspect i.e., deep zoom images where
you would typically see a long human unfriendly URI.



"Where you would typically see URIs"? Really? 


**clean up post re. some critical typos **

Where would you see URIs? What do you see when you use: 
http://lod.openlinksw.com ?


And when you don't see URIs (human or machine, the typical case re. 
Faceted Browsing over RDF) what do you have re. HTTP based Linked Data? 
Zilch!


 

Furthermore, I believe that to get Pivot to perform well, you need a
cleaned up, *homogeneous* data set, presumably of small size (see
their Wikipedia example in which they picked only the top 500 most
visited articles). SW/linked data in their natural habitat, however,
is rarely that cleaned up and homogeneous ...   


Is that really a problem of the Linked Data Web as such? I don't think so.
There is a lot of badly structured, not well cleaned up data on the current
Linked Data Web, because there was so much excitement about publishing
anything in the early days, and so little attention to the actual data
that's getting published. That is going to change.
 

So by the time you can
use Pivot on SW/linked data, you will already have solved all the
interesting and challenging problems.
  

This part is what I call an innovation slot since we have hooked it into
our DBMS hosted faceted engine and successfully used it over very large
data sets.


Kingsley, I'm wondering: How did you do that? I tried it myself, and it
doesn't work.


Did I indicate that my demo instance was public? How did you come to 
overlook that?



Pivot can't make use of server-side faceted browsing engines.
 


Why do you speculate? You are incorrect and Virtuoso *doing* what you 
claim is impossible will be emphatic proof, nice and simple.


Pivot consumes data from HTTP accessible collections (which may be 
static or dynamic [1]). A dynamic collection is comprised of CXML 
resources (basically XML).



You need to send *all* the data to the Pivot client, and it computes the
facets and performs any filtering operation client-side. 


You make a collection from a huge corpus of data (what I demonstrate) 
then you "Save As" (which I demonstrate as the generation point re. CXML 
resource) and then Pivot consumes. All the data is Virtuoso hosted.


There are two things you are overlooking:

1. The dynamic collection is produced at the conclusion of Virtuoso 
based faceted navigation (the interactions basically describe the Facet 
membership to Virtuoso)

2. Pivot works with static and dynamic collections.

*I specifically state, this is about using both products together to 
solve a major problem. #1 Faceted Browsing UX #2 Faceting over a huge 
data corpus.*


Virtuoso is an HTTP server; it can serve a myriad of representations of 
data to user agents (it has its own DBMS hosted XSLT Processor and XML 
Schema Validator with XQuery/XPath to boot, all very old stuff).



BTW -- how do you think Peter Haase got his variant working? I am sure 
he will shed identical light on the matter for you.


Links:

1. http://www.getpivot.com/developer-info/ 

Re: Preventing SPARQL injection

2010-03-29 Thread Kingsley Idehen

Angelo Veltens wrote:

Hi all,

my name is Angelo Veltens, i'm studying computer science in germany. I
am using the jena framework with sdb for a student research project.

I'm just wondering how to prevent sparql injections. It seems to me,
that i have to build my queries from plain strings and do the sanitizing
on my own. Isn't there something like prepared statements as in
SQL/JDBC? This would be less risky.

Kind regards,
Angelo Veltens



  

The server should have the ability to control who can do what with SPARQL.

If you put SPARQL endpoints behind FOAF+SSL (for instance) and also use 
ACLs at the Graph IRI level, the vulnerability is blocked (bar stealing 
your machine and locating your private key).


--

Regards,

Kingsley Idehen	  
President & CEO 
OpenLink Software 
Web: http://www.openlinksw.com

Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen 









Re: Nice Data Cleansing Tool Demo

2010-03-29 Thread Kingsley Idehen

Georgi Kobilarov wrote:

Hello,

  

Now here is the obvious question, re. broader realm of faceted data
navigation, have you guys digested the underlying concepts
demonstrated by Microsoft Pivot?



I've seen the TED talk on Pivot. It's a very well polished
implementation of faceted browsing. The Seadragon technology
integration and animations are well executed. As far as "underlying
concepts" in faceted browsing go, I haven't noticed anything novel
  

there.

I agree with David here, nothing novel about the underlying concept.
One thing I found quite nice and haven't seen before is grouping results
along one facet dimension (the bar-graph representation of results). I think
that is a neat idea.

The integration of Seadragon and deep-zooming looks nice, but little more
than that. Not all objects render into nice pictures, and the interaction
of zooming in and out isn't a helpful one in my opinion. The zooming gives
the impression at first that the position of objects in that 2D space is
meaningful, but it is not. It's an eye-catcher, not more.



  

One thing to note: in each Pivot demo example, there is data of
exactly one type only--say, type people. So it seems, using Microsoft
Pivot, you can't pivot from one type to another, say, from people to
their companies. You can't do that example I used for Parallax: US
presidents -> children -> schools. Or skyscrapers -> architects ->
other buildings. So from what I've seen, as it currently is, Microsoft
Pivot cannot be used for browsing graphs because it cannot pivot (over
graph links).
  

Yes, this is a limitation re. general faceted browsing concepts.



No, it's a limitation of the current implementations of faceted browsing.
Not a general problem with faceted browsing.


  

The most interesting part to me is the use of an alternative symbol
mechanism for the human interaction aspect i.e., deep zoom images where
you would typically see a long human unfriendly URI.



"Where you would typically see URIs"? Really? 
  
Where would you see URIs? What do you see when you use: 
http://lod.openlinksw.com ?


And when you don't see URIs (human or machine, the typical case re. 
Faceted Browsing over RDF) what do you have re. HTTP based Linked Data? 
Zilch!


  

Furthermore, I believe that to get Pivot to perform well, you need a
cleaned up, *homogeneous* data set, presumably of small size (see
their Wikipedia example in which they picked only the top 500 most
visited articles). SW/linked data in their natural habitat, however,
is rarely that cleaned up and homogeneous ... 
  


Is that really a problem of the Linked Data Web as such? I don't think so.
There is a lot of badly structured, not well cleaned up data on the current
Linked Data Web, because there was so much excitement about publishing
anything in the early days, and so little attention to the actual data
that's getting published. That is going to change.

  

So by the time you can
use Pivot on SW/linked data, you will already have solved all the
interesting and challenging problems.
  

This part is what I call an innovation slot since we have hooked it into
our DBMS hosted faceted engine and successfully used it over very large
data sets.



Kingsley, I'm wondering: How did you do that? I tried it myself, and it
doesn't work.

Did I indicate that my demo instance was public? How did you come to
overlook that?

Pivot can't make use of server-side faceted browsing engines.

Why do you speculate? You are incorrect, and Virtuoso doing what you claim
is impossible will be emphatic proof, nice and simple.


Pivot consumes data from HTTP accessible collections (which may be 
static or dynamic [1]). A dynamic collection is comprised of CXML 
resources (basically XML) .

You need to send *all* the data to the Pivot client, and it computes the
facets and performs any filtering operation client-side. 


You make a collection from a huge corpus of data (what I demonstrate) 
then you "Save As" (which I demonstrate as the generation point re. CXML 
resource) and then Pivot consumes. All the data is Virtuoso hosted.


There are two things you are overlooking:

1. The dynamic collection is produced at the conclusion of Virtuoso 
based faceted navigation (the interactions basically describe the Facet 
membership to Virtuoso)

2. Pivot works with static and dynamic collections.

Virtuoso is an HTTP server; it can serve a myriad of representations of 
data to user agents (it has its own DBMS hosted XSLT Processor and XML 
Schema Validator with XQuery/XPath to boot, all very old stuff).



BTW -- how do you think Peter Haase got his variant working? I am sure 
he will shed identical light on the matter for you.


Links:

1. http://www.getpivot.com/developer-info/ --- Please note Unbounded 
Dynamic Collections
2. http://www.getpivot.com/developer-info/hosting.aspx#Dynamic -- Look 
at the diagram, then revisit the architecture of Virtuoso (it's a Hybrid 
Data Server that

Re: Nice Data Cleansing Tool Demo

2010-03-29 Thread Aldo Bucchi
Hi David,

I love it and I NEED it ;)
Awesome work, really.

I heard it will be opensource so I will probably be able to extend it
myself, but here are some ideas for (missing?) features:
* Importing custom Lookups/Dictionaries ( to go from text to IDs or
the other way around ). Maybe this is possible using a different hook
for the reconciliation mechanism.
* Related: Plug in other reconciliation services ( not sure how this
stands up to freebase biz alignment )
* Command line engine. To add a GW project as a step in a traditional
transformation job and execute steps sequentially.
* Expose Gazetteers ( dictionaries ) generated within the tool ( when
equating facets )

I have other ideas, but I need to try it first; it looks like you've
covered a lot of ground here.

Amazing, Amazing. Thanks!
A


On Sun, Mar 28, 2010 at 8:06 PM, David Huynh  wrote:
> On Mar/29/10 12:31 am, Kingsley Idehen wrote:
>
> All,
>
> A very nice data cleansing tool from David and Co. at Freebase.
>
> CSVs are clearly the dominant data format in the structured open data realm.
> This tool deals with ETL very well. Of course, for those who appreciate OWL,
> a lot of what's demonstrated in this demo is also achievable via "context
> rules". Bottom line (imho), nice tool that will only aid improving Web of
> Linked Data quality at the data set production stage.
>
> Links:
>
> 1. http://vimeo.com/10081183 -- Freebase Gridworks
>
> Thanks, Kingsley. The second screencast, by Stefano Mazzocchi, also
> demonstrates a few other interesting features:
>
>     http://www.vimeo.com/10287824
>
> David
>



-- 
Aldo Bucchi
skype:aldo.bucchi
http://www.univrz.com/
http://aldobucchi.com/

PRIVILEGED AND CONFIDENTIAL INFORMATION
This message is only for the use of the individual or entity to which it is
addressed and may contain information that is privileged and confidential. If
you are not the intended recipient, please do not distribute or copy this
communication, by e-mail or otherwise. Instead, please notify us immediately by
return e-mail.



Re: Preventing SPARQL injection

2010-03-29 Thread Damian Steer

On 29/03/10 15:53, Rob Vesse wrote:

Forgot to cc to list and to jena-dev


Missed the original post completely. Thanks for ccing to jena-dev.


Hi all,

my name is Angelo Veltens, i'm studying computer science in germany. I
am using the jena framework with sdb for a student research project.

I'm just wondering how to prevent sparql injections. It seems to me,
that i have to build my queries from plain strings and do the sanitizing
on my own. Isn't there something like prepared statements as in
SQL/JDBC? This would be less risky.

Kind regards,
Angelo Veltens


Use the QueryExecutionFactory methods that accept an initial binding: [1]

import com.hp.hpl.jena.query.*;

Query q = QueryFactory.create("SELECT * { ?s ?p ?o }");

QuerySolutionMap qs = new QuerySolutionMap();
qs.add("s", resource); // bind resource to the variable ?s

QueryExecution qe = QueryExecutionFactory.create(q, dataset, qs);

That's much safer and easier than messing with query strings.

(Unfortunately it doesn't work for remote queries via queryService)

Damian

[1] 





Fwd: Preventing SPARQL injection

2010-03-29 Thread Davide Palmisano
Apologies, forgot to cc public-lod

-- Forwarded message --
From: Davide Palmisano 
Date: Mon, Mar 29, 2010 at 4:51 PM
Subject: Re: Preventing SPARQL injection
To: Angelo Veltens 


Hi Angelo,

I'm not sure I fully understood your problem. Anyway, it may be worth giving
a look to this: http://clarkparsia.com/weblog/2010/02/03/empire-0-6/

cheers,

Davide

On Sat, Mar 27, 2010 at 1:10 PM, Angelo Veltens
 wrote:
> Hi all,
>
> my name is Angelo Veltens, i'm studying computer science in germany. I
> am using the jena framework with sdb for a student research project.
>
> I'm just wondering how to prevent sparql injections. It seems to me,
> that i have to build my queries from plain strings and do the sanitizing
> on my own. Isn't there something like prepared statements as in
> SQL/JDBC? This would be less risky.
>
> Kind regards,
> Angelo Veltens
>
>
>



--
Davide Palmisano
Technologist at Fondazione Bruno Kessler
http://davidepalmisano.wordpress.com
http://twitter.com/dpalmisano



-- 
Davide Palmisano

http://davidepalmisano.wordpress.com
http://twitter.com/dpalmisano



Re: Preventing SPARQL injection

2010-03-29 Thread Rob Vesse
Forgot to cc to list and to jena-dev

-Original Message-
From: Rob Vesse [mailto:rav...@ecs.soton.ac.uk] 
Sent: 29 March 2010 15:53
To: 'Angelo Veltens'
Subject: RE: Preventing SPARQL injection

The following may be of interest to you:

http://www.slideshare.net/Morelab/sparqlrdqlsparul-injection

They proposed a patch to Jena but I don't know whether it ever got
incorporated into the codebase.

Rob

-Original Message-
From: public-lod-requ...@w3.org [mailto:public-lod-requ...@w3.org] On Behalf
Of Angelo Veltens
Sent: 27 March 2010 12:11
To: public-lod@w3.org
Subject: Preventing SPARQL injection

Hi all,

my name is Angelo Veltens, i'm studying computer science in germany. I
am using the jena framework with sdb for a student research project.

I'm just wondering how to prevent sparql injections. It seems to me,
that i have to build my queries from plain strings and do the sanitizing
on my own. Isn't there something like prepared statements as in
SQL/JDBC? This would be less risky.

Kind regards,
Angelo Veltens






Preventing SPARQL injection

2010-03-29 Thread Angelo Veltens
Hi all,

my name is Angelo Veltens; I'm studying computer science in Germany. I
am using the Jena framework with SDB for a student research project.

I'm just wondering how to prevent SPARQL injections. It seems to me
that I have to build my queries from plain strings and do the sanitizing
on my own. Isn't there something like prepared statements, as in
SQL/JDBC? That would be less risky.

Kind regards,
Angelo Veltens
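[Editorial note: a minimal, hypothetical sketch of the risk Angelo describes.
The class name, the query template, and the FOAF predicate are illustrative
only, not from the thread; the point is that a crafted input can close the
string literal and inject arbitrary graph patterns.]

```java
public class SparqlInjectionDemo {

    // Unsafe: user input is concatenated directly into the query string.
    static String naiveQuery(String userInput) {
        return "SELECT ?s WHERE { ?s <http://xmlns.com/foaf/0.1/name> \""
                + userInput + "\" }";
    }

    public static void main(String[] args) {
        // A benign value produces the intended query...
        System.out.println(naiveQuery("Alice"));

        // ...but this value escapes the string literal, adds a UNION that
        // matches every triple in the store, and comments out the remainder.
        String malicious = "x\" } UNION { ?s ?p ?o } #";
        System.out.println(naiveQuery(malicious));
    }
}
```

Binding user input as an RDF term (as Damian shows later in the thread) avoids
this entirely, because the value never passes through the query parser.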




RE: Nice Data Cleansing Tool Demo

2010-03-29 Thread Georgi Kobilarov
Hello,

> >> Now here is the obvious question, re. broader realm of faceted data
> >> navigation, have you guys digested the underlying concepts
> >> demonstrated by Microsoft Pivot?
> >>
> >
> > I've seen the TED talk on Pivot. It's a very well polished
> > implementation of faceted browsing. The Seadragon technology
> > integration and animations are well executed. As far as "underlying
> > concepts" in faceted browsing go, I haven't noticed anything novel
there.

I agree with David here, nothing novel about the underlying concept. 
One thing I found quite nice and haven't seen before is grouping results
along one facet dimension (the bar-graph representation of results). I think
that is a neat idea. 

The integration of Seadragon and deep-zooming looks nice, but little more
than that. 
Not all objects render into nice pictures, and the interaction of zooming in
and out isn't a helpful one in my opinion. The zooming gives the impression
at first that the position of objects in that 2D space is meaningful, but it
is not.  
It's an eye-catcher, not more.


> > One thing to note: in each Pivot demo example, there is data of
> > exactly one type only--say, type people. So it seems, using Microsoft
> > Pivot, you can't pivot from one type to another, say, from people to
> > their companies. You can't do that example I used for Parallax: US
> > presidents -> children -> schools. Or skyscrapers -> architects ->
> > other buildings. So from what I've seen, as it currently is, Microsoft
> > Pivot cannot be used for browsing graphs because it cannot pivot (over
> > graph links).
> Yes, this is a limitation re. general faceted browsing concepts.

No, it's a limitation of the current implementations of faceted browsing.
Not a general problem with faceted browsing.


> The most interesting part to me is the use of an alternative symbol
> mechanism for the human interaction aspect i.e., deep zoom images where
> you would typically see a long human unfriendly URI.

"Where you would typically see URIs"? Really? 


> > Furthermore, I believe that to get Pivot to perform well, you need a
> > cleaned up, *homogeneous* data set, presumably of small size (see
> > their Wikipedia example in which they picked only the top 500 most
> > visited articles). SW/linked data in their natural habitat, however,
> > is rarely that cleaned up and homogeneous ... 

Is that really a problem of the Linked Data Web as such? I don't think so.
There is a lot of badly structured, not well cleaned up data on the current
Linked Data Web, because there was so much excitement about publishing
anything in the early days, and so little attention to the actual data
that's getting published. That is going to change.

> > So by the time you can
> > use Pivot on SW/linked data, you will already have solved all the
> > interesting and challenging problems.
> This part is what I call an innovation slot since we have hooked it into
our
> DBMS hosted faceted engine and successfully used it over very large data
> sets. 

Kingsley, I'm wondering: How did you do that? I tried it myself, and it
doesn't work. Pivot can't make use of server-side faceted browsing engines.
You need to send *all* the data to the Pivot client, and it computes the
facets and performs any filtering operation client-side. Works well for up
to around 1k objects, but that's it. Pivot's architecture is in that sense
very much like Exhibit in Silverlight.


Best,
Georgi

--
Georgi Kobilarov
Uberblic Labs Berlin
http://blog.georgikobilarov.com





Re: Nice Data Cleansing Tool Demo

2010-03-29 Thread François Scharffe

Hi David,

Great work!

When will the tool be released? I can't wait to try it.

Cheers,
François

David Huynh wrote:

On Mar/29/10 12:31 am, Kingsley Idehen wrote:

All,

A very nice data cleansing tool from David and Co. at Freebase.

CSVs are clearly the dominant data format in the structured open data 
realm. This tool deals with ETL very well. Of course, for those who 
appreciate OWL, a lot of what's demonstrated in this demo is also 
achievable via "context rules". Bottom line (imho), nice tool that 
will only aid improving Web of Linked Data quality at the data set 
production stage.


Links:

1. http://vimeo.com/10081183 -- Freebase Gridworks

Thanks, Kingsley. The second screencast, by Stefano Mazzocchi, also 
demonstrates a few other interesting features:


http://www.vimeo.com/10287824

David





Re: Nice Data Cleansing Tool Demo

2010-03-29 Thread Kingsley Idehen

David Huynh wrote:

On Mar/29/10 10:01 am, Kingsley Idehen wrote:

David Huynh wrote:

On Mar/29/10 12:31 am, Kingsley Idehen wrote:

All,

A very nice data cleansing tool from David and Co. at Freebase.

CSVs are clearly the dominant data format in the structured open 
data realm. This tool deals with ETL very well. Of course, for 
those who appreciate OWL, a lot of what's demonstrated in this demo 
is also achievable via "context rules". Bottom line (imho), nice 
tool that will only aid improving Web of Linked Data quality at the 
data set production stage.


Links:

1. http://vimeo.com/10081183 -- Freebase Gridworks

Thanks, Kingsley. The second screencast, by Stefano Mazzocchi, also 
demonstrates a few other interesting features:


http://www.vimeo.com/10287824

David

David,

Yes, very nice!

Now here is the obvious question, re. broader realm of faceted data 
navigation, have you guys digested the underlying concepts 
demonstrated by Microsoft Pivot?




I've seen the TED talk on Pivot. It's a very well polished 
implementation of faceted browsing. The Seadragon technology 
integration and animations are well executed. As far as "underlying 
concepts" in faceted browsing go, I haven't noticed anything novel there.


One thing to note: in each Pivot demo example, there is data of 
exactly one type only--say, type people. So it seems, using Microsoft 
Pivot, you can't pivot from one type to another, say, from people to 
their companies. You can't do that example I used for Parallax: US 
presidents -> children -> schools. Or skyscrapers -> architects -> 
other buildings. So from what I've seen, as it currently is, Microsoft 
Pivot cannot be used for browsing graphs because it cannot pivot (over 
graph links).

Yes, this is a limitation re. general faceted browsing concepts.


The most interesting part to me is the use of an alternative symbol 
mechanism for the human interaction aspect i.e., deep zoom images where 
you would typically see a long human unfriendly URI.


Furthermore, I believe that to get Pivot to perform well, you need a 
cleaned up, *homogeneous* data set, presumably of small size (see 
their Wikipedia example in which they picked only the top 500 most 
visited articles). SW/linked data in their natural habitat, however, 
is rarely that cleaned up and homogeneous ... So by the time you can 
use Pivot on SW/linked data, you will already have solved all the 
interesting and challenging problems.
This part is what I call an innovation slot since we have hooked it into 
our DBMS hosted faceted engine and successfully used it over very large 
data sets. Of course it means that we've implemented some internal tweaks 
re. the alternative identifier symbols, but once that was done, it was 
back to letting our engine do its thing re. huge data set navigation and 
the ability to expose Entity-Attribute-Value graph model based 
hypermedia resources in a variety of data representations (functionality 
that lies at the very core of Virtuoso)  etc..


I do applaud their recent offering of the Pivot widget for embedding 
into any arbitrary site. That should make faceted browsing more 
accessible to web authors, as Exhibit has done. Pivot is way more 
polished and hopefully scales better than Exhibit, although Exhibit is 
more malleable as a piece of software.

Nice assessment :-)

We will soon unveil versions of our live instances (LOD Cloud Cache, 
DBpedia etc..) that work with Pivot as the client via dynamic 
collections. There is a fundamental feature in Virtuoso (what we call 
Anytime Query) that is essential to delivering this functionality. It is 
my hope that via Pivot (for which dynamic collections are extremely 
challenging) we can make comprehension a little clearer. What I describe 
is a general DBMS engine tweak (it goes beyond RDF data management).


Links:

1. http://www.youtube.com/watch?v=G29DBIEcIuQ -- a quick and dirty 
screencast I published post confirmation that our goals had been 
achieved re. huge RDF data sets navigation via Pivot


2. http://bit.ly/9mj7Fw -- old presentation covering our DBMS hosted 
faceted browser engine + Anytime Query feature for handling huge data 
sets at Web scale.



Kingsley



David





--

Regards,

Kingsley Idehen	  
President & CEO 
OpenLink Software 
Web: http://www.openlinksw.com

Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen 









OPEN POSITION: Move to Berlin, work on DBpedia (1 year full-time contract)

2010-03-29 Thread Chris Bizer
Hi all,

DBpedia [1] is a community effort to extract structured information from
Wikipedia and to make this information available on the Web. DBpedia allows
you to ask sophisticated queries against Wikipedia knowledge [2]. DBpedia
also plays a central role as an interlinking hub in the emerging Web of Data
[3].

The DBpedia Team at Freie Universität Berlin [4] is looking for a
developer/researcher who wants to contribute to the further development of
the DBpedia information extraction framework, investigate approaches to
annotate free-text with DBpedia URIs and participate in the various Linked
Data efforts currently advanced by our team.

Candidates should have:
+ good programming skills in Java; in addition, Scala and PHP are helpful.
+ a university degree, preferably in computer science or information systems.


Previous knowledge of Semantic Web Technologies (RDF, SPARQL, Linked Data)
and experience with information extraction and/or named entity recognition
techniques are a plus.

Contract start date: 15 May 2010
Duration: 1 year
Salary: around 40.000 Euro/year (German BAT IIa)

You will be part of an innovative and cordial team and enjoy flexible work
hours. After the year, chances are high that you will be able to choose
between longer-term positions at Freie Universität Berlin and at neofonie
GmbH.

Please contact Chris Bizer via email (ch...@bizer.de) by 15 April 2010
for additional details, and include information about your skills and
experience.

The whole DBpedia team is very thankful to neofonie GmbH [5] for
contributing to the development of DBpedia by financing this position.
neofonie is a Berlin-based company offering leading technologies in the area
of Web search, social media and mobile applications.

Cheers,

Chris


[1] http://dbpedia.org/
[2] http://wiki.dbpedia.org/FacetedSearch
[3] http://esw.w3.org/SweoIG/TaskForces/CommunityProjects/LinkingOpenData
[4] http://www.wiwiss.fu-berlin.de/en/institute/pwo/bizer/
[5] http://www.neofonie.de/

--
Prof. Dr. Christian Bizer
Web-based Systems Group
Freie Universität Berlin
+49 30 838 55509
http://www.bizer.de
ch...@bizer.de




Merging of Concepts

2010-03-29 Thread Neubert Joachim
In knowledge organization systems, concepts are sometimes merged (for
example, because the differences between concepts are no longer considered
significant). Normally, a "weaker" concept, say A, is bound to perish and
gets a note attached: "Out of use, please use B".

How do we put this onto the Linked Data Web? (*) Here are our ideas:

<A> a skos:Concept ;
    rdfs:label "Real Estate Loan"@en ;
    owl:deprecated "true"^^xsd:boolean ;
    dcterms:isReplacedBy <B> ;
    skos:historyNote "deprecated in version 8.06"@en ;
    skos:inScheme <Scheme> .

<B> a skos:Concept ;
    skos:prefLabel "Mortgage"@en ;
    skos:altLabel "Real Estate Loan"@en ;
    ...
    skos:inScheme <Scheme> .

<Scheme> a skos:ConceptScheme ;
    ...
    owl:versionInfo "8.06" .
 

The main goal was that people/applications who use the concepts of a KOS
(e.g. with dcterms:subject) should be able to automatically discover
deprecated concepts and replace them with currently valid ones. The
approach does not aim at a machine-readable record of historical states
and state transitions of the KOS and its concepts.
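A sketch of such a discovery query under these conventions (the prefixes and
the boolean shorthand are standard SPARQL; the data layout follows the example
above):

```sparql
PREFIX skos:    <http://www.w3.org/2004/02/skos/core#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX owl:     <http://www.w3.org/2002/07/owl#>

# Find deprecated concepts together with their current replacements.
SELECT ?deprecated ?replacement
WHERE {
  ?deprecated a skos:Concept ;
              owl:deprecated true ;
              dcterms:isReplacedBy ?replacement .
}
```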

Some design decisions that could be questioned:

1) The former prefLabel of <A> has been moved to <B> as an altLabel. As
far as I can see, the spec doesn't prohibit using it also as a prefLabel
for <B>, but it feels more appropriate to avoid this by using a plain
rdfs:label. (As a nice side effect, concept <A> doesn't show up when the
KOS is queried for prefLabel/altLabel/hiddenLabel.)

2) All skos:semanticRelations (broader, narrower, related etc.), and
also all skos:altLabel/hiddenLabels, were removed from the deprecated
concept A and, where appropriate, attached to concept B. 

3) A deprecated skos:Concept still is a skos:Concept (it may be in use as
such; especially, it may still have mappings from other vocabularies,
outside the scope of the actual KOS)

4) Versioning is done in a very informal way, by attaching
owl:versionInfo to the skos:ConceptScheme. The skos:Concepts are
deliberately not versioned, because this would put a heavy burden on
their use. Also, the skos:ConceptScheme currently does not have a
"versioned" URI (which could be done, but I'm not sure it would be
worth the increased complexity). The history is only represented
informally in a skos:historyNote.

5) Prior versions of the skos:ConceptScheme are on the web for
reference; however, the URIs for concepts and the scheme itself resolve
to the current version. (We will try to prohibit indexing of prior
versions by semantic search engines, in order to avoid the aggregation
of no longer valid attributes and relations for a concept.)
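Applications that annotate documents with these concepts could then migrate
their links in one step. A hedged sketch in SPARQL 1.1 Update syntax (the
dcterms:subject property is the example's own choice, not prescribed by the
scheme):

```sparql
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX owl:     <http://www.w3.org/2002/07/owl#>

# Rewrite subject annotations that point at deprecated concepts.
DELETE { ?doc dcterms:subject ?old }
INSERT { ?doc dcterms:subject ?new }
WHERE {
  ?doc dcterms:subject ?old .
  ?old owl:deprecated true ;
       dcterms:isReplacedBy ?new .
}
```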

I wonder what you think about this approach.

Cheers, Joachim

* Some work to build on has been collected at
http://www.w3.org/2001/sw/wiki/SKOS/Issues/ConceptEvolution