On 7/1/10 9:44 AM, "Jonathan Rochkind" <[email protected]> wrote:
the technical issues of maintaining the regular flow of updates from
dozens of content providers, and normalizing all data to go in the same
index, are non-trivial, I think now.
>>
This is very much one of the hardest parts, Jonathan.
Also, thinking about the kinds of services that users want from this data, we've
found the biggest need is to focus on citation references, if you can get them
(e.g. ISI).
And if you think the bibliographic metadata is poor quality, try
matching on brief reference metadata (the kind that doesn't contain unique
identifiers, of course).
It takes complex fuzzy string matching, and even then the results are never
really great.
(This is part of the problem with cite counts being all over the map in the
apps out there!)
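
To give a feel for what matching identifier-less references involves, here is a
minimal sketch in Python using only the standard library's difflib. The function
names (normalize, similarity, best_match), the sample strings, and the 0.8
threshold are all hypothetical, chosen for illustration. A real matcher would
likely parse out authors, year, and pages and compare field by field.

```python
import re
from difflib import SequenceMatcher


def normalize(ref: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    ref = ref.lower()
    ref = re.sub(r"[^a-z0-9\s]", " ", ref)
    return " ".join(ref.split())


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized reference strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def best_match(ref, candidates, threshold=0.8):
    """Return (candidate, score) for the best-scoring candidate.

    If no candidate reaches the (assumed) threshold, the candidate is None.
    """
    scored = [(similarity(ref, c), c) for c in candidates]
    score, cand = max(scored, default=(0.0, None))
    return (cand, score) if score >= threshold else (None, score)


if __name__ == "__main__":
    # Made-up sample data: one cited reference and a small candidate index.
    cited = "Smith, J. (2001). Nature 410, 123."
    index = [
        "smith j 2001 nature 410 123",
        "Jones, A. (1999). Science 283, 45.",
    ]
    print(best_match(cited, index))
```

Even in this toy form the weak spot shows: the threshold is arbitrary, and
abbreviated journal titles or reordered author names push true matches below it
while near-duplicates from different papers can slip above it.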
My words to the wise are to NOT do local loading unless you have a lot of time
and money.
Vendors who are doing it have economies of scale. Individual institutions
typically do not. If the community were to make agreements to have centralized
management at a few institutions for this kind of "open" dataset, maybe. But, as
someone noted, the middle-men ("value add" A&I producers - Thomson, EBSCO, etc.)
are not going to love this idea.
Miriam Blake
Los Alamos National Laboratory Research Library