On Fri, Jan 27, 2012 at 6:26 PM, Roy Tennant <roytenn...@gmail.com> wrote:

> Oh, I should have also mentioned that some of the worst problems occur
> when people treat their metadata like it will never leave their
> institution. When that happens you get all kinds of crazy cruft in a
> record. For example, just off the top of my head:
>
> * Embedded HTML markup (one of my favorites is an <img> tag)
> * URLs to remote resources that are hard-coded to go through a
>   particular institution's proxy
> * Notes that only have meaning for that institution
> * Text that is meant to display to the end-user but may only do so in
>   certain systems; e.g., "Click here" in a particular subfield.
>
> Sigh...
> Roy
>
> On Fri, Jan 27, 2012 at 4:17 PM, Roy Tennant <roytenn...@gmail.com> wrote:
> > Thanks a lot for the kind shout-out Leslie. I have been pondering what
> > I might propose to discuss at this event, since there is certainly
> > plenty of fodder. Recently we (OCLC Research) did an investigation of
> > 856 fields in WorldCat (some 40 million of them) and that might prove
> > interesting. By the time ALA rolls around there may be something else
> > entirely I could talk about.
> >
> > That's one of the wonderful things about having 250 million MARC
> > records sitting out on a 32-node cluster. There are any number of
> > potentially interesting investigations one could do.
> > Roy
> >
> > On Thu, Jan 26, 2012 at 2:10 PM, Johnston, Leslie <lesl...@loc.gov> wrote:
> >> Roy's fabulous "Bitter Harvest" paper:
> >> http://roytennant.com/bitter_harvest.html
> >>
> >> -----Original Message-----
> >> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Walter Lewis
> >> Sent: Wednesday, January 25, 2012 1:38 PM
> >> To: CODE4LIB@LISTSERV.ND.EDU
> >> Subject: Re: [CODE4LIB] Metadata war stories...
> >>
> >> On 2012-01-25, at 10:06 AM, Becky Yoose wrote:
> >>
> >>> - Dirty data issues when switching discovery layers or using
> >>> legacy/vendor metadata (ex. HathiTrust)
> >>
> >> I have a sharp recollection of a slide in a presentation Roy Tennant
> >> offered up at Access (at Halifax, maybe), where he offered up a range
> >> of dates extracted from an array of OAI harvested records. The good,
> >> the bad, the incomprehensible, the useless-without-context (01/02/03
> >> anyone?) and on and on. In my years of migrating data, I've seen most
> >> of those variants. (except ones *intended* to be BCE).
> >>
> >> Then there are the fielded data sets without authority control. My
> >> favourite example comes from staff who nominally worked for me, so
> >> I'm not telling tales out of school. The classic Dynix product had a
> >> Newspaper index module that we used before migrating it (PICK
> >> migrations; such a joy). One title had twenty variations on
> >> "Georgetown Independent" (I wish I was kidding) and the dates ranged
> >> from the early ninth century until nearly the 3rd millennium.
> >> (apparently there hasn't been much change in local council over the
> >> centuries).
> >>
> >> I've come to the point where I hand-walk the spatial metadata to
> >> links with geonames.org for the linked open data. Never had to do it
> >> for a set with more than 40,000 entries though. The good news is that
> >> it isn't hard to establish a valid additional entry when one is
> >> required.
> >>
> >> Walter
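
For anyone wanting to hunt for this sort of thing in their own records, the cruft in Roy's list and the "01/02/03" problem Walter mentions can be flagged with fairly simple checks. What follows is only a rough Python sketch, not anything used by the people in this thread. The function names (`flag_cruft`, `date_readings`), the regular expressions, and the guess that two-digit years belong to the 2000s are all assumptions made for illustration; real proxy URLs and date conventions vary by institution.

```python
import re
from datetime import date

# Heuristic patterns. All three are assumptions for illustration only.
# HTML_TAG is crude: it also matches things like <name@example.org>.
HTML_TAG = re.compile(r"<\s*/?\s*[a-zA-Z][^>]*>")
# Assumes proxied links carry "proxy" in the hostname or a login?url= prefix.
PROXY_HINT = re.compile(
    r"https?://[^\s/]*proxy[^\s/]*/|/login\?(?:q)?url=", re.IGNORECASE
)
CLICK_HERE = re.compile(r"\bclick here\b", re.IGNORECASE)


def flag_cruft(value: str) -> list[str]:
    """Return labels for institution-specific cruft found in a field value."""
    flags = []
    if HTML_TAG.search(value):
        flags.append("embedded-html")
    if PROXY_HINT.search(value):
        flags.append("proxy-url")
    if CLICK_HERE.search(value):
        flags.append("display-text")
    return flags


def date_readings(value: str) -> set[str]:
    """Return every valid ISO date that an NN/NN/NN string could mean.

    Tries month/day/year, day/month/year and year/month/day.
    Assumption: a two-digit year YY means 20YY.
    """
    match = re.fullmatch(r"(\d{2})/(\d{2})/(\d{2})", value.strip())
    if not match:
        return set()
    a, b, c = (int(part) for part in match.groups())
    readings = set()
    # Each tuple is (year, month, day) under one of the three orderings.
    for yy, mm, dd in ((c, a, b), (c, b, a), (a, b, c)):
        try:
            readings.add(date(2000 + yy, mm, dd).isoformat())
        except ValueError:
            pass  # not a real calendar date under this ordering
    return readings


if __name__ == "__main__":
    print(flag_cruft('<img src="cover.gif"> Click here for full text'))
    print(flag_cruft("http://ezproxy.example.edu/login?url=http://example.org/x"))
    print(sorted(date_readings("01/02/03")))
```

A value like "01/02/03" comes back with more than one reading, which is the point: without context from the source institution, the script can only report the ambiguity, not resolve it.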