Re: Are there any datasets about companies? ( DBpedia Open Data Initiative)

2015-11-05 Thread Sebastian Hellmann

Hi Chris,

However, making sense of this data is very, very time consuming, not 
to mention that writing and maintaining bots (we now have hundreds and 
hundreds of them) to scrape jurisdictions that aren't open data (the 
vast majority) takes significant resources, and we don't see any way 
of sustaining this on a CC-BY licence. 

a) is the code for these bots somewhere?
b) we hope to find a way to maintain it this time. DBpedia has received 
funding via http://smartdataweb.de/ and also http://aligned-project.eu/
We are also currently working on a charter for a non-profit 
association that is committed to keeping all data open under CC-BY (we are 
accepting donations and membership fees, among other things).


I could also write a book about corporate identifiers, and the issues 
with those on the list (but don't have time).

We are writing such a book* in parallel, do you want to help?
Sebastian

*= well it's just a paper


On 05.11.2015 19:18, Chris Taggart wrote:

Rolf etc

Thanks for cc'ing me. We'd had contact from Sebastian and given him an 
API key. The main issues here are sustainability and domain knowledge. 
We'd love more people to be downloading the open datasets from the UK 
and others, and using them in all sorts of innovative ways, and the 
main reason we do the Open Company Data Index is to motivate company 
registers to open up their data (I was speaking at the Open Govt 
Partnership Summit in Mexico City last week on the same subject). 
However, making sense of this data is very, very time consuming, not 
to mention that writing and maintaining bots (we now have hundreds and 
hundreds of them) to scrape jurisdictions that aren't open data (the 
vast majority) takes significant resources, and we don't see any way 
of sustaining this on a CC-BY licence.


Finally, there are very few registers that are licensed under CC-BY or 
less restrictive terms (Denmark, for example, places restrictions on use 
for marketing), even ignoring DPA issues (we are now spending a 
considerable amount on legal fees on this issue). I could also write a 
book about corporate identifiers, and the issues with those on the list 
(but don't have time).


So, we'd love to see more activity in the area, particularly in 
Germany – where the Handelsregister and Bundesanzeiger are very 
definitely not open data  ;-)


Chris

On 5 November 2015 at 12:49, Rolf Kleef wrote:


Hi Sebastian, Kay,

If you haven't done it yet, I suggest getting in touch with Chris
Taggart of Open Corporates (cc'd). He has years of experience doing
this, and is also involved in cross-standards work on "organisational
identifiers", crucial in the development of for instance the Open
Contracting Data Standard and the International Aid Transparency
Initiative:

http://www.open-contracting.org/
http://iatistandard.org/201/organisation-identifiers/

~~Rolf.

On 03/11/15 16:17, Sebastian Hellmann wrote:
> [Apologies for cross-posting]
>
> Dear all,
> this message is part announcement of an open data initiative and part
> call for feedback and support.
>
> We are considering creating a free, open and interoperable
> dataset on companies and organisations, which we are planning to
> integrate into DBpedia+ and offer as a dump download. As we are in a very
> early phase of the endeavour, we would like to know whether there is
> existing work in this area.
>
> We are looking for any available datasets which have information about
> companies and other organizations in any language and any country.
> Ideally, the datasets are:
> 1. downloadable as a dump
> 2. openly licensed, e.g. CC-BY following the http://opendefinition.org/
> 3. in an easily parseable format, e.g. RDF or CSV and not PDF
>
> But hey! Send around anything you know, and we will look at it and see
> whether we can make use of it. You can reach us either by replying to
> this email or by sending feedback directly to me and Kay Müller
> <mailto:kay.muel...@informatik.uni-leipzig.de>.
> If you have any private/closed data, please contact us as well. We might
> make use of it to cross-reference and validate public/open data with it.
> Or just learn from it to build a good scheme.
>
> We started a link collection here (and attached the current status at
> the end of this email)
>
> https://docs.google.com/document/d/1IaWSSt4_SZVhypvB1QzBlCtBuMQHv-q5Ti0n8xoZFIQ/edit
> Also we started to collect potential identifiers for linking here:
>
> https://docs.google.com/spreadsheets/d/1EMqemA1BlqvyOXGLzYbvY0IcBCAhaRd5XgYLMWIxGsA/edit#gid=0
>
> Regards and thank you for any support on this,
> Sebastian and Kay
>
> ##
>
>

https://docs.google.com/document/d/1IaWSSt4_SZVhypvB1QzBlCtBuMQHv-q

Re: CfP: WWW2016 workshop on Linked Data on the Web (LDOW2016)

2015-11-05 Thread Hugh Glaser
Many thanks Chris, very helpful information, and very quickly.
And good news too!

> On 3 Nov 2015, at 15:45, Christian Bizer  wrote:
> 
> Hi Hugh,
> 
>> Hi Chris et al,
>> Great stuff.
>> Can you tell me please if it will be possible to register for the workshop 
>> on its own, or will a registration for the full WWW be required to register 
>> for the workshop?
> 
> The WWW2016 workshop track chairs just confirmed that it will be possible to 
> register again for the workshop days (not a specific workshop) similar to the 
> arrangement last year.
> 
> The concrete prices seem not to be set yet, but last year the fees were 410 
> Euro just for the workshop days compared to 850 Euro for the full pass.
> 
> See http://www.www2015.it/registrations/
> 
> Cheers and hope to see you in Montreal,
> 
> Chris
> 
>> 
>>> On 2 Nov 2015, at 09:06, Christian Bizer  wrote:
>>> 
>>> Hi all,
>>> 
>>> Sören Auer, Tim Berners-Lee, Tom Heath, and I are organizing the 9th edition
>>> of the Linked Data on the Web workshop at WWW2016 in Montreal, Canada. The
>>> paper submission deadline for the workshop is  24 January, 2016. Please find
>>> the call for papers below.
>>> 
>>> We are looking forward to having another exciting workshop and to seeing
>>> many of you in Montreal.
>>> 
>>> Cheers,
>>> 
>>> Chris, Tim, Sören, and Tom
>>> 
>>> 
>>> 
>>> 
>>> Call for Papers: 9th Workshop on Linked Data on the Web (LDOW2016)
>>> 
>>> 
>>>Co-located with 25th International World Wide Web Conference
>>>   April 11 to 15, 2016 in Montreal, Canada
>>> 
>>> 
>>>  http://events.linkeddata.org/ldow2016/
>>> 
>>> 
>>> 
>>> The Web is developing from a medium for publishing textual documents into a
>>> medium for sharing structured data. This trend is fueled on the one hand by
>>> the adoption of the Linked Data principles by a growing number of data
>>> providers. On the other hand, large numbers of websites have started to
>>> semantically mark up the content of their HTML pages and thus also
>>> contribute to the wealth of structured data available on the Web.
>>> 
>>> The 9th Workshop on Linked Data on the Web (LDOW2016) aims to stimulate
>>> discussion and further research into the challenges of publishing,
>>> consuming, and integrating structured data from the Web as well as mining
>>> knowledge from the global Web of Data. The special focus of this year’s LDOW
>>> workshop will be Web Data Quality Assessment and Web Data Cleansing.
>>> 
>>> 
>>> *Important Dates*
>>> 
>>> * Submission deadline: 24 January, 2016 (23:59 Pacific Time)
>>> * Notification of acceptance: 10 February, 2016
>>> * Camera-ready versions of accepted papers: 1 March, 2016
>>> * Workshop date: 11-13 April, 2016
>>> 
>>> 
>>> *Topics of Interest*
>>> 
>>> Topics of interest for the workshop include, but are not limited to, the
>>> following:
>>> 
>>> Web Data Quality Assessment
>>> * methods for evaluating the quality and trustworthiness of web data
>>> * tracking the provenance of web data
>>> * profiling and change tracking of web data sources
>>> * cost and benefits of web data quality assessment
>>> * web data quality assessment benchmarks
>>> 
>>> Web Data Cleansing
>>> * methods for cleansing web data
>>> * data fusion and truth discovery
>>> * conflict resolution using semantic knowledge
>>> * human-in-the-loop and crowdsourcing for data cleansing
>>> * cost and benefits of web data cleansing
>>> * web data quality cleansing benchmarks
>>> 
>>> Integrating Web Data from Large Numbers of Data Sources
>>> * linking algorithms and heuristics, identity resolution
>>> * schema matching and clustering
>>> * evaluation of linking and schema matching methods
>>> 
>>> Mining the Web of Data
>>> * large-scale derivation of implicit knowledge from the Web of Data
>>> * using the Web of Data as background knowledge in data mining
>>> * techniques and methodologies for Linked Data mining and analytics
>>> 
>>> Linked Data Applications
>>> * application showcases including Web data browsers and search engines
>>> * marketplaces, aggregators and indexes for Web Data
>>> * security, access control, and licensing issues of Linked Data
>>> * role of Linked Data within enterprise applications (e.g. ERP, SCM, CRM)
>>> * Linked Data applications for life-sciences, digital humanities, social
>>> sciences etc.
>>> 
>>> 
>>> *Submissions*
>>> 
>>> We seek two kinds of submissions:
>>> 
>>> 1. Full scientific papers: up to 10 pages in ACM format
>>> 2. Short scientific and position papers: up to 5 pages in ACM format
>>> 
>>> Submissions must be formatted using the ACM SIG template available at
>>> http://www.acm.org/sigs/publications/proceedings-templates. Accepted papers
>>> will be presented at the workshop and included in the CEUR workshop
>>> proceedings. At least one author of each paper has to register for the
>>> workshop and to present the paper.
>>> 
>>> 
>>> *Organizing Committee*
>>> 
>>> Christian Bizer, University of Mannheim, Germany
>>> T

Re: New LOD dataset for media types

2015-11-05 Thread Graham Klyne


On 05/11/2015 16:35, Silvio Peroni wrote:

> Perhaps establish something like https://w3id.org/sparontologies/mediatype/
> and redirect for your server. I believe you could have stronger adoption
> and use of your ontology if you adopt a good IRI design and persistence
> strategy up front.

That’s a good point, and I’ve just finished updating everything, since it was 
possible in a relatively short time :-)

In particular, I’ve just made a request to w3id.org for a 
space, and I’ve updated the URLs of all the entities. Now they are accessible using the



Hi Silvio,

Responding to your earlier question, this kind of community-underwritten hosting 
is the sort of thing I was contemplating.


Thanks!

#g
--



Re: Are there any datasets about companies? ( DBpedia Open Data Initiative)

2015-11-05 Thread Kingsley Idehen
On 11/5/15 9:31 AM, brian.uli...@thomsonreuters.com wrote:
> I have uploaded my paper at the METHOD 2015 Workshop at this year's
> ISWC here:
>
> http://www.researchgate.net/publication/283500696_Constructing_Knowledge_Graphs_with_Trust#share
>
> It explains the rationale behind Thomson Reuters'  company open permid
> URIs and the advantages they have over rival identifier schemes like
> DUNS numbers, company websites, DBpedia URIs, etc.
>
> Brian Ulicny, PhD
> Director, Data Science
> Data Innovation Lab
> Thomson Reuters
> 22 Thomson Pl
> Boston, MA 02210

Brian,

When will there be any combination of the following, from this data space:

[1] SPARQL Endpoint
[2] Data Dump in one or more of the standard RDF document formats?

Working through an API is too constraining, hence the need for the
additional flexibility provided by the items above.
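
For reference, the per-entity access described in Brian's quoted message below (appending ?format=turtle to a PermID URI) can be sketched in Python roughly as follows. This is only an illustration: the function name is made up here, and it builds the URL without contacting permid.org.

```python
from urllib.parse import urlencode

# Base of the PermID URIs as quoted in the thread.
PERMID_BASE = "https://permid.org/"


def permid_turtle_url(permid: str) -> str:
    """Build the URL that asks for the Turtle view of one PermID entity.

    The helper name is hypothetical; only the '?format=turtle' convention
    comes from the quoted message.
    """
    return PERMID_BASE + permid + "?" + urlencode({"format": "turtle"})


if __name__ == "__main__":
    # Identifier taken from the example in the quoted message.
    print(permid_turtle_url("1-4295861160"))
```

Fetching one entity at a time this way is exactly the constraint raised above; a dump or a SPARQL endpoint would avoid the per-URI round trips.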


Kingsley
>
> 
> *From:* Ulicny, Brian (TR Technology)
> *Sent:* Wednesday, November 04, 2015 10:03 AM
> *To:* hellm...@informatik.uni-leipzig.de;
> kay.muel...@informatik.uni-leipzig.de; public-lod@w3.org
> *Subject:* RE: Are there any datasets about companies? ( DBpedia Open
> Data Initiative)
>
> Hi Sebastian,
>
> Thomson Reuters offers a bulk download and API for company identifiers
> at the level of legal entities here:
>
> https://permid.org
>
> The PermID Service lets you utilize the permanent ID of 3.5 million
> organizations, 240K equity instruments and 1.17 million equity quotes
> from the Thomson Reuters core entity data set. PermID Service provides
> access to Thomson Reuters permanent identifiers (permanent unique IDs
> formatted as Uniform Resource Identifiers)
> along with the associated descriptive fields that Thomson Reuters
> exposes to the public.
>
> (Accessible by appending ?format=turtle to the URI:
> e.g. https://permid.org/1-4295861160?format=turtle returns
> triples about the status and location of Thomson Reuters Corp.)
> Unfortunately, there is not currently a SPARQL endpoint for
> this information.
>
> The descriptive fields enable the user to verify that a consumed
> permID represents the entity of interest.
>
> The data is live; records are updated every 15 minutes.
>
> In the future, PermID Service will support additional entities such as
> People, Fixed Income Instruments, Fixed Income Quote, and more.
>
> The identifiers are also produced as output in the Open Calais Tagger,
> with an API and interface at the same site.  That is, if a string is
> identified in free text as denoting a company with an open permid, the
> open permid URI is returned as part of the RDF output.
>
> Bulk Files Content and Format
>
> The Open PermID database is licensed under the Creative Commons with
> Attribution license, version 4.0 (CC-BY).
>
> An extended set of fields is also available under the Creative Commons
> Non-Commercial license (CC-NC 3.0). 
>
> Plain language summaries of these licenses are available on the Creative
> Commons website.
>
> Supported Formats: Turtle and NTriples (See Appendix F for an example
> in ttl format.)
> Coverage: The same records provided by the Entity Search API
> (see terms of use).
>
> Frequency: New bulk files will be published once a week.
> Incremental Updates: Once the bulk files are consumed, the subsequent
> incremental updates can be consumed via our Atom Feed.
>
> Let me know if you have any questions.
>
> The files are available for download via the Open PermID website.
>
> Best regards,
>
> Brian Ulicny, PhD
> Director, Data Science
> Data Innovation Lab
> Thomson Reuters Corp.
> 22 Thomson Pl
> Boston, MA 02210
> 
> *From:* Sebastian Hellmann [hellm...@informatik.uni-leipzig.de]
> *Sent:* Tuesday, November 03, 2015 9:17 AM
>

Re: New LOD dataset for media types

2015-11-05 Thread Silvio Peroni
Hi Jason,

> Thanks for this work on ontologies for media types. Is there any overlap here 
> with https://schema.org/MediaObject  ? 

Well, it’s not an overlap; it’s more of a complement. The media types described in 
this dataset (class dcterms:MediaType) are possible formats that media objects 
may have. This can be specified by means of the dcterms:format property, for 
instance:

:my-video a schema:MediaObject ;
dcterms:format  .
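
As a rough illustration of the pattern above, the following Python sketch generates such a statement as Turtle text. The subject IRI, the choice of video/mp4, and the function name are assumptions made for the example; the media type base is the w3id.org pattern announced further down in this message.

```python
# Sketch only: subject IRI and media type are illustrative assumptions,
# not values taken from the dataset announcement.
MEDIATYPE_BASE = "https://w3id.org/spar/mediatype/"


def format_statement(subject_iri: str, media_type: str) -> str:
    """Return Turtle that types a resource and states its dcterms:format."""
    return (
        "@prefix dcterms: <http://purl.org/dc/terms/> .\n"
        "@prefix schema: <http://schema.org/> .\n"
        "\n"
        f"<{subject_iri}> a schema:MediaObject ;\n"
        f"    dcterms:format <{MEDIATYPE_BASE}{media_type}> .\n"
    )


if __name__ == "__main__":
    print(format_statement("http://example.org/my-video", "video/mp4"))
```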

> BTW, the W3C has a really nice process in place for IRI persistence.

I know it and I’ve already used it indeed for another project.

> Perhaps establish something like https://w3id.org/sparontologies/mediatype/ 
> and redirect for your server. I 
> believe you could have stronger adoption and use of your ontology if you 
> adopt a good IRI design and persistence strategy up front.

That’s a good point, and I’ve just finished updating everything, since it was 
possible in a relatively short time :-) 

In particular, I’ve just made a request to w3id.org for a 
space, and I’ve updated the URLs of all the entities. Now they are accessible 
using the following pattern:

https://w3id.org/spar/mediatype/[mediatype] 
 

e.g.

https://w3id.org/spar/mediatype/text/turtle 


Does it sound good?

Jason, Graham, thanks again for all your suggestions and support!
Have a nice day :-)

S.




Silvio Peroni, Ph.D.
Department of Computer Science and Engineering
University of Bologna, Bologna (Italy)
Tel: +39 051 2094871
E-mail: silvio.per...@unibo.it
Web: http://www.essepuntato.it
Twitter: essepuntato



Re: New LOD dataset for media types

2015-11-05 Thread Haag, Jason
FYI: The process for using the W3C's permanent identifier/PURL service is
here in the readme: https://github.com/perma-id/w3id.org

As Jeff mentioned, PURL.org still works in read-only mode; you just can't
log in to edit or add new PURLs right now.

---
Advanced Distributed Learning Initiative
+1.850.266.7100(office)
+1.850.471.1300 (mobile)
jhaag75 (skype)
http://linkedin.com/in/jasonhaag

On Thu, Nov 5, 2015 at 9:42 AM, Daniel Garijo  wrote:

>
>
> 2015-11-05 16:22 GMT+01:00 Michael Brunnbauer :
>
>>
>> Hello Jason,
>>
>> On Thu, Nov 05, 2015 at 08:32:22AM -0600, Haag, Jason wrote:
>> > PURL.org is currently disabled and not allowing even existing users
>> > to log in and manage their PURLs.
>>
>> Does anybody have an idea how long the login on purl.org has been
>> disabled?
>>
> Around a week (since 27 October).
> This is the response provided in the public mailing list for persistent
> urls :
> "Sorry for the inconvenience. The SOLR index on the purl.org site has
> stopped updating, which prevents effective maintenance. The login mechanism
> is turned off until a solution is discovered. The timeframe for that isn't
> clear yet."
> Best,
> Daniel
>
>
>>
>> Should users of purl.org URIs start to think about alternatives?
>>
>> Regards,
>>
>> Michael Brunnbauer
>>
>> --
>> ++  Michael Brunnbauer
>> ++  netEstate GmbH
>> ++  Geisenhausener Straße 11a
>> ++  81379 München
>> ++  Tel +49 89 32 19 77 80
>> ++  Fax +49 89 32 19 77 89
>> ++  E-Mail bru...@netestate.de
>> ++  http://www.netestate.de/
>> ++
>> ++  Sitz: München, HRB Nr.142452 (Handelsregister B München)
>> ++  USt-IdNr. DE221033342
>> ++  Geschäftsführer: Michael Brunnbauer, Franz Brunnbauer
>> ++  Prokurist: Dipl. Kfm. (Univ.) Markus Hendel
>>
>
>


Re: New LOD dataset for media types

2015-11-05 Thread Daniel Garijo
2015-11-05 16:22 GMT+01:00 Michael Brunnbauer :

>
> Hello Jason,
>
> On Thu, Nov 05, 2015 at 08:32:22AM -0600, Haag, Jason wrote:
> > PURL.org is currently disabled and not allowing even existing users
> > to log in and manage their PURLs.
>
> Does anybody have an idea how long the login on purl.org has been
> disabled?
>
Around a week (since 27 October).
This is the response provided in the public mailing list for persistent urls
:
"Sorry for the inconvenience. The SOLR index on the purl.org site has
stopped updating, which prevents effective maintenance. The login mechanism
is turned off until a solution is discovered. The timeframe for that isn't
clear yet."
Best,
Daniel


>
> Should users of purl.org URIs start to think about alternatives?
>
> Regards,
>
> Michael Brunnbauer
>
> --
> ++  Michael Brunnbauer
> ++  netEstate GmbH
> ++  Geisenhausener Straße 11a
> ++  81379 München
> ++  Tel +49 89 32 19 77 80
> ++  Fax +49 89 32 19 77 89
> ++  E-Mail bru...@netestate.de
> ++  http://www.netestate.de/
> ++
> ++  Sitz: München, HRB Nr.142452 (Handelsregister B München)
> ++  USt-IdNr. DE221033342
> ++  Geschäftsführer: Michael Brunnbauer, Franz Brunnbauer
> ++  Prokurist: Dipl. Kfm. (Univ.) Markus Hendel
>


RE: New LOD dataset for media types

2015-11-05 Thread Young,Jeff (OR)
The SOLR indexing on purl.org has been misbehaving for a couple of months. 
Login has been disabled for a few weeks to minimize the damage. The service is 
still running, but in read-only mode. There is no estimate for when this will 
be resolved.

One alternative is to use the http://w3id.org/ service, which is run by the W3C 
Permanent Identifier Community Group. Their mechanism is simpler and has a 
broader base of support.

Jeff

> -Original Message-
> From: Michael Brunnbauer [mailto:bru...@netestate.de]
> Sent: Thursday, November 05, 2015 10:22 AM
> To: Haag, Jason
> Cc: Linking Open Data Mailing List Data; Semantic Web Mailing List
> Subject: Re: New LOD dataset for media types
> 
> 
> Hello Jason,
> 
> On Thu, Nov 05, 2015 at 08:32:22AM -0600, Haag, Jason wrote:
> > PURL.org is currently disabled and not allowing even existing users to
> > log in and manage their PURLs.
> 
> Does anybody have an idea how long the login on purl.org has been disabled?
> 
> Should users of purl.org URIs start to think about alternatives?
> 
> Regards,
> 
> Michael Brunnbauer
> 
> --
> ++  Michael Brunnbauer
> ++  netEstate GmbH
> ++  Geisenhausener Straße 11a
> ++  81379 München
> ++  Tel +49 89 32 19 77 80
> ++  Fax +49 89 32 19 77 89
> ++  E-Mail bru...@netestate.de
> ++  http://www.netestate.de/
> ++
> ++  Sitz: München, HRB Nr.142452 (Handelsregister B München)
> ++  USt-IdNr. DE221033342
> ++  Geschäftsführer: Michael Brunnbauer, Franz Brunnbauer
> ++  Prokurist: Dipl. Kfm. (Univ.) Markus Hendel



Re: New LOD dataset for media types

2015-11-05 Thread Michael Brunnbauer

Hello Jason,

On Thu, Nov 05, 2015 at 08:32:22AM -0600, Haag, Jason wrote:
> PURL.org is currently disabled and not allowing even existing users
> to log in and manage their PURLs.

Does anybody have an idea how long the login on purl.org has been disabled?

Should users of purl.org URIs start to think about alternatives?

Regards,

Michael Brunnbauer

-- 
++  Michael Brunnbauer
++  netEstate GmbH
++  Geisenhausener Straße 11a
++  81379 München
++  Tel +49 89 32 19 77 80
++  Fax +49 89 32 19 77 89 
++  E-Mail bru...@netestate.de
++  http://www.netestate.de/
++
++  Sitz: München, HRB Nr.142452 (Handelsregister B München)
++  USt-IdNr. DE221033342
++  Geschäftsführer: Michael Brunnbauer, Franz Brunnbauer
++  Prokurist: Dipl. Kfm. (Univ.) Markus Hendel




Re: New LOD dataset for media types

2015-11-05 Thread Haag, Jason
Silvio,

Thanks for this work on ontologies for media types. Is there any overlap
here with https://schema.org/MediaObject ?

BTW, the W3C has a really nice process in place for IRI persistence.
Perhaps establish something like https://w3id.org/sparontologies/mediatype/
and redirect for your server. I believe you could have stronger adoption
and use of your ontology if you adopt a good IRI design and persistence
strategy up front.

There are several nice things about the W3C permanent identifier approach.
For example:

1) longevity of the IRI (if sparontologies.net ceased to exist for whatever
reason, the ontologies could be maintained elsewhere by updating the w3id
target URL/redirect).

2) You don't have to worry about configuring a server for content
negotiation. The w3id.org server is already set up to handle content neg
using apache.

3) It uses github to sync with the w3id.org server and has a strong
commitment from several folks from the W3C and other organizations heavily
invested in the web. You simply add your .htaccess files and submit a pull
request. PURL.org is currently disabled and not allowing even existing users
to log in and manage their PURLs. The interface and approach for PURL.org is
cumbersome.


Regards,

J Haag




---
Advanced Distributed Learning Initiative
+1.850.266.7100(office)
+1.850.471.1300 (mobile)
jhaag75 (skype)
http://linkedin.com/in/jasonhaag

On Thu, Nov 5, 2015 at 7:37 AM, Silvio Peroni 
wrote:

> Hi Graham,
>
> Nice work: this could be handy.
>
>
> Thanks :-)
>
> 1. I assume the machine formats are accessible by content negotiation?
>
>
> They are accessible by content negotiation indeed. In particular, one can
> use:
> - "application/rdf+xml" for having the data in RDF/XML;
> - "text/turtle" or "text/n3" for having the data in Turtle;
> - "application/ld+json" or "application/json" for having the data in
> JSON-LD;
> - any other kind of request will result in having the data in HTML.
>
> E.g.:
>
> > curl -L -H "Accept: application/rdf+xml"
> http://www.sparontologies.net/mediatype/text/plain
>
> will return the RDF/XML representation of such media type.
>
> It might be handy to also include direct links on the web page.  For
> example, I found I could change the .html suffix to .ttl or .json to get
> alternative representations, but that required guesswork on my part.
>
>
> That’s right, you can directly access the related formats by adding
> .rdf, .ttl, .json and .html respectively. I’ll add a note to the homepage
> of the dataset and to each HTML page of each mime type soon.
>
> 2. Are there any plans (and resources) in place to ensure longevity of the
> sparontologies.net domain?
>
>
> Well, actually now the sparontologies.net website is handled by me, with
> personal resources. I would like to keep it available without external
> funds, but in case it will be more expensive to maintain I think I can ask
> for support to the University of Bologna, that’s the plan at least. Do you
> have something different in mind?
>
> 3. I note that the HTML page indicates cc-by licensing of the content, but
> there is no such information in the machine readable formats.
>
>
> That’s right, I have to add these data into the RDF representation as well
> – in my todo list.
>
> Have a nice day :-)
>
> S.
>
>
>
>
>
> 
> Silvio Peroni, Ph.D.
> Department of Computer Science and Engineering
> University of Bologna, Bologna (Italy)
> Tel: +39 051 2094871
> E-mail: silvio.per...@unibo.it
> Web: http://www.essepuntato.it
> Twitter: essepuntato
>
>


Re: New LOD dataset for media types

2015-11-05 Thread Silvio Peroni
Hi Graham,

> Nice work: this could be handy.  

Thanks :-)

> 1. I assume the machine formats are accessible by content negotiation?  

They are accessible by content negotiation indeed. In particular, one can use:
- "application/rdf+xml" for having the data in RDF/XML;
- "text/turtle" or "text/n3" for having the data in Turtle;
- "application/ld+json" or "application/json" for having the data in JSON-LD;
- any other kind of request will result in having the data in HTML.

E.g.:

    curl -L -H "Accept: application/rdf+xml" http://www.sparontologies.net/mediatype/text/plain

will return the RDF/XML representation of such media type.
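
The same request can be prepared from Python. The sketch below only builds the request object and does not send it; the short format names (rdfxml, turtle, jsonld) and the function name are assumptions introduced for the example, while the Accept values are the ones listed above.

```python
from urllib.request import Request

# Accept values listed in this message, keyed by made-up short names.
ACCEPT = {
    "rdfxml": "application/rdf+xml",
    "turtle": "text/turtle",
    "jsonld": "application/ld+json",
}

BASE = "http://www.sparontologies.net/mediatype/"


def build_request(media_type: str, fmt: str) -> Request:
    """Prepare (but do not send) a GET request for one media type resource."""
    return Request(BASE + media_type, headers={"Accept": ACCEPT[fmt]})


if __name__ == "__main__":
    req = build_request("text/plain", "rdfxml")
    print(req.full_url, req.get_header("Accept"))
```

Passing the prepared request to urllib.request.urlopen would send it; urlopen follows redirects by default, which corresponds to the -L flag in the curl call.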

> It might be handy to also include direct links on the web page.  For example, 
> I found I could change the .html suffix to .ttl or .json to get alternative 
> representations, but that required guesswork on my part.

That’s right, you can directly access the related formats by adding .rdf, 
.ttl, .json and .html respectively. I’ll add a note to the homepage of the 
dataset and to each HTML page of each mime type soon.

> 2. Are there any plans (and resources) in place to ensure longevity of the 
> sparontologies.net domain?

Well, the sparontologies.net website is currently handled by me, with 
personal resources. I would like to keep it available without external funds, 
but if it becomes more expensive to maintain, I think I can ask the University 
of Bologna for support; that’s the plan at least. Do you have something 
different in mind?

> 3. I note that the HTML page indicates cc-by licensing of the content, but 
> there is no such information in the machine readable formats.

That’s right, I have to add these data to the RDF representation as well; it’s 
on my to-do list.

Have a nice day :-)

S.





Silvio Peroni, Ph.D.
Department of Computer Science and Engineering
University of Bologna, Bologna (Italy)
Tel: +39 051 2094871
E-mail: silvio.per...@unibo.it
Web: http://www.essepuntato.it
Twitter: essepuntato



Re: New LOD dataset for media types

2015-11-05 Thread Graham Klyne

Hi Silvio,

Nice work: this could be handy.  I have a couple of questions/comments:

1. I assume the machine formats are accessible by content negotiation?  It might 
be handy to also include direct links on the web page.  For example, I found I 
could change the .html suffix to .ttl or .json to get alternative 
representations, but that required guesswork on my part.


2. Are there any plans (and resources) in place to ensure longevity of the 
sparontologies.net domain?


3. I note that the HTML page indicates cc-by licensing of the content, but there 
is no such information in the machine readable formats.


#g
--


On 04/11/2015 23:21, Silvio Peroni wrote:

Dear friends,

We are pleased to announce a new LOD dataset

http://www.sparontologies.net/mediatype

that makes available media types defined as proper resources in RDF, according 
to the SPAR Ontologies [1] and DCTerms [2].

A media type is an identifier (for example "text/turtle") for file formats on 
the Internet, composed of two parts: a registry ("text" in the example) and a 
record ("turtle" in the example). Media types are handled by the Internet 
Assigned Numbers Authority (IANA), which is the official authority for the 
standardisation and publication of these classifications.

The aforementioned web space, part of the SPAR Ontologies website, has been 
reserved for providing the RDF representation of each media type defined by 
IANA [3]. In particular, a media type is accessible by concatenating the URL 
"http://www.sparontologies.net/mediatype/" with its related identifier. For 
instance, "http://www.sparontologies.net/mediatype/text/turtle" allows one to 
access the specific resource for the "text/turtle" media type.
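
The two-part structure and the concatenation rule can be sketched in Python as follows. The class and function names are made up for illustration, and the validation is deliberately minimal (it only checks for a non-empty registry and record separated by a slash).

```python
from typing import NamedTuple

BASE = "http://www.sparontologies.net/mediatype/"


class MediaType(NamedTuple):
    registry: str  # e.g. "text"
    record: str    # e.g. "turtle"


def parse_media_type(identifier: str) -> MediaType:
    """Split an identifier such as 'text/turtle' into registry and record."""
    registry, sep, record = identifier.partition("/")
    if not sep or not registry or not record:
        raise ValueError(f"not a registry/record identifier: {identifier!r}")
    return MediaType(registry, record)


def resource_url(identifier: str) -> str:
    """Concatenate the base URL with the media type identifier."""
    mt = parse_media_type(identifier)
    return f"{BASE}{mt.registry}/{mt.record}"


if __name__ == "__main__":
    print(parse_media_type("text/turtle"))
    print(resource_url("text/turtle"))
```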

Each media type can be accompanied by agents who acted as contributors, the 
related RFC documents documenting it, its current status (either official, 
deprecated or obsoleted), and direct links to Wikipedia pages and DBpedia 
resources related to the media type.

All these resources defining media types, thus, can be used for specifying 
particular formats (e.g., by means of the DCTerms property dcterms:format) that 
a certain entity, such as a book or a dataset, can have.

Please do not hesitate to contact us (sparontolog...@gmail.com) for questions 
and additional information about this LOD dataset.

Have a nice day :-)

S.



1. http://www.sparontologies.net
2. http://dublincore.org/documents/dcmi-terms/
3. http://www.iana.org/assignments/media-types/media-types.xml



Silvio Peroni, Ph.D.
Department of Computer Science and Engineering
University of Bologna, Bologna (Italy)
Tel: +39 051 2094871
E-mail: silvio.per...@unibo.it
Web: http://www.essepuntato.it
Twitter: essepuntato






Re: Are there any datasets about companies? ( DBpedia Open Data Initiative)

2015-11-05 Thread Jerven Bolleman
I just saw this http://api.opencorporates.com/ might be interesting
via
http://www.theguardian.com/odine-partner-zone/2015/nov/04/winners-second-call-odine-call-open-data-incubator-programme-europe

On Wed, Nov 4, 2015 at 5:03 PM,  wrote:

> Hi Sebastian,
>
> Thomson Reuters offers a bulk download and API for company identifiers at
> the level of legal entities here:
>
> https://permid.org
>
> The PermID Service lets you utilize the permanent ID of 3.5 million
> organizations, 240K equity instruments and 1.17 million equity quotes from
> the Thomson Reuters core entity data set. PermID Service provides access to
> Thomson Reuters permanent identifiers (permanent unique IDs formatted as
> Uniform Resource Identifiers) along with the associated descriptive fields
> that Thomson Reuters exposes to the public.
>
> (Accessible by appending ?format=turtle to the URI: e.g.
> https://permid.org/1-4295861160?format=turtle returns triples about the
> status and location of Thomson Reuters Corp.)  Unfortunately,
> there is not currently a SPARQL endpoint for this information.
>
> The descriptive fields enable the user to verify that a consumed permID
> represents the entity of interest.
>
> The data is live; records are updated every 15 minutes.
>
> In the future, PermID Service will support additional entities such as
> People, Fixed Income Instruments, Fixed Income Quote, and more.
>
> The identifiers are also produced as output in the Open Calais Tagger,
> with an API and interface at the same site.  That is, if a string is
> identified in free text as denoting a company with an open permid, the open
> permid URI is returned as part of the RDF output.
> Bulk Files Content and Format
>
> The Open PermID database is licensed under the Creative Commons with
> Attribution license, version 4.0 (CC-BY).
>
> An extended set of fields is also available under the Creative Commons
> Non-Commercial license (CC-NC 3.0).
>
> Plain language summaries of these licenses are available on the Creative
> Commons website.
> Supported Formats: Turtle and NTriples (See
> Appendix F for an example in ttl format.)
> Coverage: The same records provided by the Entity Search API (see terms
> of use).
>
> Frequency: New bulk files will be published once a week.
> Incremental Updates: Once the bulk files are consumed, the subsequent
> incremental updates can be consumed via our Atom Feed.
>
> Let me know if you have any questions.
>
> The files are available for download via the Open PermID website.
>
> Best regards,
>
> Brian Ulicny, PhD
> Director, Data Science
> Data Innovation Lab
> Thomson Reuters Corp.
> 22 Thomson Pl
> Boston, MA 02210
> --
> *From:* Sebastian Hellmann [hellm...@informatik.uni-leipzig.de]
> *Sent:* Tuesday, November 03, 2015 9:17 AM
> *To:* public-lod
> *Subject:* Are there any datasets about companies? ( DBpedia Open Data
> Initiative)
>
> [Apologies for cross-posting]
>
> Dear all,
> this message is part announcement of an open data initiative and part call
> for feedback and support.
>
> We are considering creating a free, open and interoperable
> dataset on companies and organisations, which we are planning to integrate
> into DBpedia+ and offer as a dump download. As we are in a very early phase
> of the endeavour, we would like to know whether there is existing work in
> this area.
>
> We are looking for any available datasets which have information about
> companies and other organizations in any language and any country. Ideally,
> the datasets are:
> 1. downloadable as a dump
> 2. openly licensed, e.g. CC-BY following the http://opendefinition.org/
> 3. in an easily parseable format, e.g. RDF or CSV and not PDF
>
> But hey! Send around anything you know, and we will look at it and see
> whether we can make use of it. You can reach us either by replying to this
> email or by sending feedback directly to me and Kay Müller.
> If you have any private/closed data, please contact us as well. We might
> make use of it to cross-reference and validate public/open data with it. Or
> just learn from it to build a good scheme.
>
> We started a link collection here (and attached the current status at the
> end of this email)
>
> https://docs.google.com/document/d/1IaWSSt4_SZVhypvB1QzBlCtBuMQHv-q5Ti0n8xoZFIQ/edit
>