On 8/31/13 8:48 AM, Ben Companjen wrote:
>
> Authors in OL may also be corporate entities (government bodies,
> corporations, NGOs), conferences or group pseudonyms (although those
> are more rare). There is no difference in the data yet, except for a
> few records.

As an FYI - in the incoming MARC records any x10's were mapped to
"collaborator". At least, that's what I recall. The code starts around
line 330 in:

https://github.com/openlibrary/openlibrary/blob/master/openlibrary/catalog/marc/parse.py

Unfortunately there were a lot of bad sources that put all of the
authors in 100 regardless of type.

Our thinking at the time was that only librarians really think of
corporate bodies as authors; most people assume that "author" means
"person". Putting the corporate authors in "collaborator" was kind of a
cop out, but we weren't up to inventing any entirely new bibliographic
category.

This is why I am trying to preview any new sources before we load them.
There are some that were loaded that were MARC in structure but not
really in content. They are the source of many of the duplicates in the
database (and because the data is bad they don't merge well), and I
don't want us to make that situation worse.

kc

>>
>> the RDF provided for authors could make better use of the information
>> currently available. For a start, it should include a list of "is author of
>> X" statements to link them to their works within OL.
> Agreed. In the Work RDF there are links to Editions, so it's possible.
> I already proposed some changes to the RDF output some time ago:
> https://github.com/internetarchive/openlibrary/pull/136 but this was
> not in it.
>
>> It should also include
>> Wikipedia identifiers where these are present in the data
>
> By 'identifiers', did you mean URLs? There are Wikipedia URLs
> ("Links") for some people. Some records include a special "wikipedia"
> field.
>> with a little gentle encouragement, we could make the author birth and death
>> date information usable in a machine-processing sense. Most dates are
>> already useful as entered, despite the lack of guidelines
>>
>> we could enable the (structured) recording of place of birth and death.
>> There are a handful of these in the data already, crammed in on the end of
>> the date field
>
> A bot should try to parse the dates and put these in the records in a
> separate field (e.g. "date_of_birth_parsed"). The contents of this
> field can then be transformed to an xsd:date value in RDF.
>>
>> Author names could be looked up on dbpedia, and if there is an existing
>> entry (a) the link can be included in the OL data and (b) details like
>> DoB/PoB can be copied from that source into the OL data.
>
> It is debatable whether that is allowed in accordance with the
> CC-BY-SA licence that Wikipedia and DBpedia use, although we're not
> too strict on the enforcement and don't use a less strict licence on
> the OL data. Looking up a name in DBpedia could be challenging, but
> experimenting is easy when you already have both datasets downloaded
> in dump files.
>>
>> Richard
>
> Ben
> _______________________________________________
> Ol-discuss mailing list - Ol-discuss@archive.org
> http://mail.archive.org/cgi-bin/mailman/listinfo/ol-discuss
> Archives: http://www.mail-archive.com/ol-discuss@archive.org/
> To unsubscribe from this mailing list, send email to
> ol-discuss-unsubscr...@archive.org
>

--
Karen Coyle
kco...@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet
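
For illustration only, the suggestion quoted above (a bot that parses the free-text birth and death dates into a separate field such as "date_of_birth_parsed", for later output as an xsd:date) might be sketched roughly as follows. Everything here is an assumption rather than Open Library code: the function name `parse_author_date`, the list of accepted formats, and the choice to emit xsd:gYear for a bare year are all hypothetical, and the month names assume an English locale. Anything the sketch does not recognise (for example a date with a place of birth crammed in on the end) is returned as None so it can be left for manual review.

```python
import re
from datetime import datetime
from typing import Optional, Tuple

# Hypothetical list of full-date layouts; a real bot would first survey
# what actually occurs in the author records.
_FULL_DATE_FORMATS = ("%Y-%m-%d", "%d %B %Y", "%B %d, %Y", "%d %b %Y")


def parse_author_date(raw: str) -> Optional[Tuple[str, str]]:
    """Return (lexical value, XSD datatype name), or None if not understood."""
    text = raw.strip()

    # A bare four-digit year cannot be a full xsd:date, so label it gYear.
    if re.fullmatch(r"\d{4}", text):
        return text, "xsd:gYear"

    # Try each full-date layout in turn; the first one that parses wins.
    for fmt in _FULL_DATE_FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue
        return parsed.date().isoformat(), "xsd:date"

    # Unrecognised input (e.g. "1920, London") is left for a human.
    return None


if __name__ == "__main__":
    for sample in ("1920", "12 March 1920", "March 12, 1920", "1920, London"):
        print(repr(sample), "->", parse_author_date(sample))
```

Keeping the original text untouched and writing the result to a separate field, as proposed in the thread, means a wrong guess by the bot never destroys what the contributor entered.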