Re: Discogs Linked Data

2010-06-03 Thread mats . gls
The main thing lacking now, I think, is links[1] to MusicBrainz, and I
don't think that can be done without a dump. Apart from that, the mappings
are also incomplete. The Ruby code can be found at dataincubator[2].

1. http://blog.dbtune.org/post/2007/06/11/Linking-open-data%3A-interlinking-the-Jamendo-and-the-Musicbrainz-datasets
2. http://code.google.com/p/dataincubator/source/browse/trunk/#trunk/discogs
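Why a dump matters: linking means comparing every Discogs artist against the full set of MusicBrainz artists, which is only practical with that data held locally. Below is a naive, hypothetical Python sketch of the idea (exact matching on normalized names, emitting owl:sameAs triples). It is not the method described in [1], and the function names and example.org URIs are made up for illustration.

```python
import unicodedata

OWL_SAME_AS = 'http://www.w3.org/2002/07/owl#sameAs'

def normalize(name):
    """Case-fold, strip accents and collapse whitespace for loose matching."""
    decomposed = unicodedata.normalize('NFKD', name)
    no_marks = ''.join(c for c in decomposed if not unicodedata.combining(c))
    return ' '.join(no_marks.casefold().split())

def link_artists(discogs_artists, musicbrainz_artists):
    """Yield (discogs_uri, musicbrainz_uri) pairs whose names match uniquely.

    Both arguments map URI -> artist name; musicbrainz_artists stands in for
    data loaded from a local dump (hypothetical here).
    """
    index = {}
    for mb_uri, name in musicbrainz_artists.items():
        index.setdefault(normalize(name), []).append(mb_uri)
    for dg_uri, name in discogs_artists.items():
        candidates = index.get(normalize(name), [])
        if len(candidates) == 1:  # ambiguous names are skipped
            yield dg_uri, candidates[0]

def to_ntriples(pairs):
    """Render each pair as an owl:sameAs N-Triples line."""
    return [f'<{a}> <{OWL_SAME_AS}> <{b}> .' for a, b in pairs]

# Hypothetical example data.
discogs = {'http://example.org/discogs/artist/1': 'Tchaïkovsky'}
mb_dump = {'http://example.org/musicbrainz/artist/1': 'Tchaikovsky'}
for line in to_ntriples(link_artists(discogs, mb_dump)):
    print(line)
```

Names shared by several MusicBrainz artists are skipped, since a name alone can't tell them apart; a usable linker would need more evidence than the name.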

Cheers,

Mats

On Thu, Jun 3, 2010 at 10:45 PM, Kurt J kur...@gmail.com wrote:

 Hello,

  Does anyone know the state of play wrt a linked dataset describing
  Discogs (the music/record site)?

 I've spent some time with the Discogs data; it needs some work. The links
 to DBpedia are broken because of some capitalization errors, and the
 artist URIs and foaf:names are a bit borked because the underlying data
 has some Unicode errors (two-byte vs. one-byte Unicode characters not
 handled properly).
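Assuming the problem is the usual one (UTF-8 text read as if every byte were a separate character), this Python sketch shows how it arises and how it can be undone. Latin-1 is assumed as the wrong encoding; the thread doesn't say which one was actually involved.

```python
# "ï" is one character but two bytes in UTF-8: 0xC3 0xAF (195 175).
original = 'Tchaïkovsky'
utf8_bytes = original.encode('utf-8')

# Decoding those bytes with a one-byte-per-character encoding (Latin-1 here)
# turns the single character into two: classic mojibake.
garbled = utf8_bytes.decode('latin-1')
print(garbled)  # TchaÃ¯kovsky

# Reversing the wrong step recovers the original text.
repaired = garbled.encode('latin-1').decode('utf-8')
print(repaired)  # Tchaïkovsky
```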

  There have always been Virtuoso Sponger [1] Cartridges (Basic and Meta)
  for Discogs.

  Examples:

  1. http://linkeddata.uriburner.com/about/id/entity/http/www.discogs.com/artist/Stevie+Wonder
     -- Stevie Wonder

  2. http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/VirtSponger
     -- Virtuoso Sponger Middleware

 You might see if you can just use these. Otherwise the Ruby code is
 around somewhere on the Talis Platform site. No time to find it now;
 I've got a newborn to look after :-)

 -kurt j
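The URIBurner example above appears to follow a simple pattern: a fixed prefix followed by the Discogs page URL without its scheme. This Python sketch builds such URIs for an artist name. The pattern is inferred from that single example and is not a documented API; the helper names are hypothetical, and quote_plus percent-encodes non-ASCII names as UTF-8, which may or may not match what Discogs expects.

```python
from urllib.parse import quote_plus

# Prefix copied from the URIBurner example URL (assumed, not documented).
URIBURNER_PREFIX = 'http://linkeddata.uriburner.com/about/id/entity/http/'

def discogs_artist_page(name):
    """Scheme-less Discogs artist URL; spaces become '+' as in the example."""
    return 'www.discogs.com/artist/' + quote_plus(name)

def uriburner_entity_uri(name):
    """Hypothetical helper: URIBurner entity URI for a Discogs artist page."""
    return URIBURNER_PREFIX + discogs_artist_page(name)

print(uriburner_entity_uri('Stevie Wonder'))
```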




Re: Discogs Linked Data

2010-06-03 Thread mats . gls

 This is a data set I really want too. Does anybody know a way around
 the Unicode problem?

Maybe find sequences like &#195;&#175; with a regexp and then replace
them with the correct Unicode characters. Each such pair holds the two
UTF-8 bytes of one character (195 175 = 0xC3 0xAF = ï).

In Python, something like this, applied to each line of the files, should
work, I think:

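import re

# Two adjacent 3-digit numeric entities (e.g. "&#195;&#175;") holding the two
# UTF-8 bytes of one character, not preceded or followed by a third entity.
regex = re.compile(r'(?<!&#\d{3};)&#(\d{3});&#(\d{3});(?!&#\d{3};)')

def fix_pair(m):
    try:
        # Treat the two entity values as raw bytes and decode them as UTF-8.
        return bytes([int(m.group(1)), int(m.group(2))]).decode('utf-8')
    except ValueError:  # value > 255 or invalid UTF-8: leave the text as-is
        return m.group(0)

teststr = 'Tcha&#195;&#175;kovsky'
print(regex.sub(fix_pair, teststr))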

Output: Tchaïkovsky

I've posted a more complete version here: http://pastebin.com/vuq72irC
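A separate Python sketch (not the pastebin code) of looping over each line of the files. It also generalizes the pair-only regex to runs of entities of any length, so three-byte UTF-8 characters (e.g. &#230;&#151;&#165; for 日) are fixed as well. It assumes a UTF-8 text file and only touches runs whose values are all in 128-255; anything that doesn't decode as valid UTF-8 (such as a lone &#233;) is left unchanged. The names fix_line and fix_file are hypothetical.

```python
import re

# A run of 3-digit numeric entities, e.g. "&#195;&#175;".
ENTITY_RUN = re.compile(r'(?:&#\d{3};)+')

def decode_run(m):
    """Decode a run of byte-valued entities as UTF-8, or leave it unchanged."""
    values = [int(v) for v in re.findall(r'\d{3}', m.group(0))]
    if not all(128 <= v <= 255 for v in values):
        return m.group(0)  # not a run of UTF-8 multi-byte sequence bytes
    try:
        return bytes(values).decode('utf-8')
    except UnicodeDecodeError:
        return m.group(0)

def fix_line(line):
    return ENTITY_RUN.sub(decode_run, line)

def fix_file(src_path, dst_path):
    # Stream line by line so a large dump never has to fit in memory.
    with open(src_path, encoding='utf-8') as src, \
         open(dst_path, 'w', encoding='utf-8') as dst:
        for line in src:
            dst.write(fix_line(line))
```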

Cheers,

Mats