Re: [CODE4LIB] Describe sub-collections in DCAT - advice very much appreciated

2016-07-06 Thread Ethan Gruber
Sorry, to be a little more constructive:

If you can describe the difference between Europeana's functionality now
and your vision for your CKAN implementation, that would be helpful for
providing advice.

On Wed, Jul 6, 2016 at 10:36 AM, Ethan Gruber  wrote:

> Are these GLAMs also putting cultural heritage data into Europeana? You
> can already filter by country (that holds the work) in Europeana. There are
> 6 million objects from the Netherlands. Your energy might be better spent
> either harvesting Dutch material back out of Europeana into a separate
> Netherlands-only interface or by focusing on integrating smaller
> institutions into Europeana via OAI-PMH.
>
> In fact, your own material is in Europeana:
> http://www.europeana.eu/portal/search?f%5BCOUNTRY%5D%5B%5D=netherlands&f%5BTYPE%5D%5B%5D=SOUND&q=
>
> Ethan
>
> On Tue, Jul 5, 2016 at 12:19 PM, Johan Oomen 
> wrote:
>
>> Good afternoon,
>>
>> In the Netherlands, we’re working on overhauling our current (OAI-PMH)
>> aggregation infrastructure towards a more distributed model. The aim is to
>> create a comprehensive collection of digitised cultural heritage objects
>> held by GLAMs across the country. A major component of the new
>> infrastructure is a register of collections. We are using CKAN as the
>> data management system for these collections.
>>
>> We are currently installing and configuring CKAN, and are using DCAT for
>> describing datasets. We are interested in seeing other examples of
>> registries that describe digital heritage collections using the CKAN
>> software. One of the challenges we encounter is describing multi-level
>> datasets, such as collections and sub-collections, in the context of DCAT. An
>> example is a data provider in the Netherlands that provides an aggregated
>> oral history dataset for the target audience ‘oral history’. We registered this
>> aggregated dataset, but we also want to register individual collections for
>> participating organisations. Therefore, the aggregated dataset is divided
>> into parts using XPath, XSLT, etc. Now we want to explicitly mark the
>> dataset parts as sub-datasets of the aggregated dataset, and vice versa.
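>>
>> For illustration, a minimal sketch (with placeholder URIs, not our real
>> identifiers) of the kind of parent/child relation we would like to express,
>> using dcterms:hasPart and dcterms:isPartOf between dcat:Dataset descriptions:
>>
>> from rdflib import Graph, Namespace, URIRef
>> from rdflib.namespace import RDF, DCTERMS
>>
>> DCAT = Namespace("http://www.w3.org/ns/dcat#")
>>
>> g = Graph()
>> # placeholder URIs for the aggregated dataset and one organisation's part
>> parent = URIRef("http://example.org/dataset/oral-history")
>> child = URIRef("http://example.org/dataset/oral-history/organisation-x")
>>
>> g.add((parent, RDF.type, DCAT.Dataset))
>> g.add((child, RDF.type, DCAT.Dataset))
>> g.add((parent, DCTERMS.hasPart, child))
>> g.add((child, DCTERMS.isPartOf, parent))
>>
>> print(g.serialize(format="turtle"))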
>>
>> A question to this community: if you have implementations that use a
>> CKAN-based registry for digital heritage collections, have you also dealt with
>> this issue of describing sub-collections in DCAT? How did you manage this?
>>
>> Your help is much appreciated,
>>
>> Best wishes,
>>
>> Johan Oomen
>> Netherlands Institute for Sound and Vision
>> @johanoomen
>
>
>


Re: [CODE4LIB] Describe sub-collections in DCAT - advice very much appreciated

2016-07-06 Thread Ethan Gruber
Are these GLAMs also putting cultural heritage data into Europeana? You can
already filter by country (that holds the work) in Europeana. There are 6
million objects from the Netherlands. Your energy might be better spent
either harvesting Dutch material back out of Europeana into a separate
Netherlands-only interface or by focusing on integrating smaller
institutions into Europeana via OAI-PMH.

In fact, your own material is in Europeana:
http://www.europeana.eu/portal/search?f%5BCOUNTRY%5D%5B%5D=netherlands&f%5BTYPE%5D%5B%5D=SOUND&q=

Ethan

On Tue, Jul 5, 2016 at 12:19 PM, Johan Oomen 
wrote:

> Good afternoon,
>
> In the Netherlands, we’re working on overhauling our current (OAI-PMH)
> aggregation infrastructure towards a more distributed model. The aim is to
> create a comprehensive collection of digitised cultural heritage objects
> held by GLAMs across the country. A major component of the new
> infrastructure is a register of collections. We are using CKAN as the
> data management system for these collections.
>
> We are currently installing and configuring CKAN, and are using DCAT for
> describing datasets. We are interested in seeing other examples of
> registries that describe digital heritage collections using the CKAN
> software. One of the challenges we encounter is describing multi-level
> datasets, such as collections and sub-collections, in the context of DCAT. An
> example is a data provider in the Netherlands that provides an aggregated
> oral history dataset for the target audience ‘oral history’. We registered this
> aggregated dataset, but we also want to register individual collections for
> participating organisations. Therefore, the aggregated dataset is divided
> into parts using XPath, XSLT, etc. Now we want to explicitly mark the
> dataset parts as sub-datasets of the aggregated dataset, and vice versa.
>
> A question to this community: if you have implementations that use a
> CKAN-based registry for digital heritage collections, have you also dealt with
> this issue of describing sub-collections in DCAT? How did you manage this?
>
> Your help is much appreciated,
>
> Best wishes,
>
> Johan Oomen
> Netherlands Institute for Sound and Vision
> @johanoomen


Re: [CODE4LIB] Anything Interesting Going on in Archival Metadata?

2016-05-24 Thread Ethan Gruber
There's a fair amount of innovation taking place with respect to linked
data in archives, but I don't think it's as well advertised as what's been
taking place in libraries in North America. The highest profile project in
the archival realm is Social Networks and Archival Context (
http://socialarchive.iath.virginia.edu/), which is focused mainly on
archival authorities, but there's a tremendous potential in being able to
aggregate archival content related to these authorities. Authorities and
archival content can be, and are being, modelled into linked open data, but
there's no real standard for how to do this in the field. A group is
working on a conceptual reference model for archival collections, but the
modelling of people and their relationships is bold new territory. I've
done some work on this myself using a variety of existing ontologies and
software platforms to connect pieces from our archives, digital library,
authorities, and museum objects together into a cohesive framework (you can
read more at http://eaditor.blogspot.com/ and
http://numishare.blogspot.com/2016/03/updating-mantis-and-igch-incorporating.html
).

It is also possible to use CIDOC-CRM for the modelling of people and their
relationships and events (same for using the CRM to model archival
collections). CIDOC-CRM is rarely, if ever, discussed in code4lib despite
its 'importance' in the cultural heritage sector (predominantly in Europe).
I've had difficulty getting discussions about modelling authorities in RDF
off the ground, and some grant applications along those lines have fallen short.

Ethan

On Tue, May 24, 2016 at 9:57 AM, Matt Sherman 
wrote:

> Hi all,
>
> I was recently talking with some folks about some archives related
> things and realized that while I've heard a lot recently about
> different projects, advancements, and issues within library specific
> metadata, and its associated concerns, I have not heard as much
> recently about metadata in the archives realm.  Is there much going on
> there?  Is linked data even useful in a setting with extremely unique
> materials?  Is this a stupid question?  I don't know, but I am curious
> to hear if there are any interesting things people are doing in
> archival metadata or any challenges folks are working to overcome.
>
> Matt Sherman
>


Re: [CODE4LIB] question on harvesting RDF

2016-05-09 Thread Ethan Gruber
I don't recommend using different properties that have the same basic
semantic meaning for those different contexts (dc:subject vs.
dcterms:subject). In a linked data environment, I don't recommend using
Dublin Core Elements at all, but only dcterms. It is possible to harvest
subject terms regardless of whether the value is a literal or a URI, but the
harvester might have to take some additional action to generate a
human-readable result from an LCSH URI.

1. The harvester goes out and fetches the machine-readable data for
http://id.loc.gov/authorities/subjects/sh85002782 to get the label, or
2. You import the RDF for LCSH into your system so that an OPTIONAL line
can be inserted into SPARQL (assuming you are using SPARQL) to get the
skos:prefLabel for the URI directly from your own system (see the sketch below).
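
A rough sketch of option 2 (the endpoint URL and graph contents here are made
up, and this assumes the SPARQLWrapper library is available):

from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://localhost:3030/ds/query")
sparql.setQuery("""
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?object ?subject ?label WHERE {
  ?object dcterms:subject ?subject .
  OPTIONAL { ?subject skos:prefLabel ?label }
}
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()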

I'd suggest discussing these options with developers that may potentially
harvest your data, or at least provide a means for developers to give you
feedback so that you can deliver a web service that makes harvesting as
efficient as possible.

I hope this is useful. I think there are many possible solutions. But, in
sum, don't use dc:subject and dcterms:subject simultaneously.

Ethan

On Mon, May 9, 2016 at 1:58 PM, English, Eben  wrote:

> Hello all,
>
> A little context: the MODS and RDF Descriptive Metadata Subgroup
> (
> https://wiki.duraspace.org/display/hydra/MODS+and+RDF+Descriptive+Metadata+Subgroup
> )
> is a group of cultural institutions working together to model MODS XML
> as RDF.
>
> Our project diverges from previous efforts in this domain in that we're
> trying to come up with a model that takes more advantage of widely-used
> vocabularies and namespaces, avoiding blank nodes at all costs.
>
> As we work through the list of MODS elements, we've been stumbling on a
> few thorny issues, and with our goal of making our data as shareable as
> possible, we agreed that it would be helpful to try and get the input of
> folks who have more experience in harvesting and parsing RDF from the
> proliferation of data providers existing in the real world (see
> https://datahub.io/dataset for a great list).
>
> Specifically, when consuming RDF from a new data source, how big of a
> problem are the following issues:
>
>
> #1. Triples where the object may be a string literal or a URI
>
> For example, the predicate 'dc:subject' from the Dublin Core Elements
> vocabulary has no defined range, which means it can be used with both
> literal and non-literal values
> (
> http://wiki.dublincore.org/index.php/User_Guide/Publishing_Metadata#dc:subject
> ).
>
> So one could have both in a data store:
>
> ex:myObject1  dc:subject  "aircraft" .
> ex:myObject2  dc:subject
>  <http://id.loc.gov/authorities/subjects/sh85002782> .
>
>
> ... versus ...
>
>
> #2. Using multiple predicates with similar/overlapping definitions,
> depending on the value of the object
>
> For example, when expressing the subject of a work, using different
> predicates depending on whether there is an existing URI for a topic or
> not:
>
> ex:myObject1  dc:subject  "aircraft" .
> ex:myObject2  dcterms:subject
>  <http://id.loc.gov/authorities/subjects/sh85002782> .
>
>
> We're wondering which approach is less problematic from a Linked
> Data-harvesting standpoint. Issue #1 requires that the parser be
> prepared to handle different types of values from the same predicate,
> but issue #2 involves parsing an additional namespace and predicate, etc.
>
> Any thoughts, suggestions, or comments would be greatly appreciated.
>
> Thanks,
> Eben
>
> --
> Eben English | Boston Public Library
> Web Services Developer
> 617-859-2238 |eengl...@bpl.org
>


Re: [CODE4LIB] Good Database Software for a Digital Project?

2016-04-15 Thread Ethan Gruber
There are countless ways to approach the problem, but I suggest beginning
with tools that are within the area of expertise of your staff. Mapping
disparate structured formats into a single Solr instance for fast search
and retrieval is one possibility.
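
For instance, a bare-bones sketch (the core name and fields are invented for
illustration) of pushing records from any structured source into Solr's JSON
update handler:

import requests

docs = [
    {"id": "entry-1", "title": "Example citation", "annotation": "Short annotation text."},
]
requests.post(
    "http://localhost:8983/solr/bibliography/update?commit=true",
    json=docs)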

On Fri, Apr 15, 2016 at 2:18 PM, Matt Sherman 
wrote:

> Hi all,
>
> I am looking to pick the group brain as to what might be the most useful
> database software for a digital project I am collaborating on.  We are
> working on converting an annotated bibliography to a searchable database.
> While I have the data in a few structured formats, we need to figure out
> now what to actually put it in so that it can be queried.  My default line
> of thinking is to try MySQL since it is free and used ubiquitously
> online, but I wanted to see if there were any other database or software
> systems that we should also consider before investing a lot of time in one
> approach.  Any advice and suggestions would be appreciated.
>
> Matt Sherman
>


Re: [CODE4LIB] Structured Data Markup on library web sites

2016-03-23 Thread Ethan Gruber
We embed schema.org properties in RDFa within metadata for ETDs in our
Digital Library application, e.g.,
http://numismatics.org/digitallibrary/ark:/53695/money_and_power_in_the_viking_kingdom_of_york
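
One quick way to check what structured data a page like that actually exposes
(extruct is not part of our stack; this is only a sketch):

import requests, extruct

url = ("http://numismatics.org/digitallibrary/ark:/53695/"
       "money_and_power_in_the_viking_kingdom_of_york")
html = requests.get(url).text
data = extruct.extract(html, base_url=url, syntaxes=["rdfa"])
print(data["rdfa"])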

I don't know exactly how Google's algorithms establish "authority," but the
ETDs in our system usually show up in the first few results in
Google--usually above academia.edu. Part of the reason is probably our use
of schema.org, but part of the reason is also because of the authority
Google's algorithms have put into content on numismatics.org.

We use RDFa throughout our digital applications, though not always with
schema.org; elsewhere we use classes and properties more relevant to archives
or coins. I think
that once the archival extension to schema.org is more formalized (Richard
Wallis is the driving force behind that discussion), we'll probably
implement that in our archives with EADitor (
https://github.com/ewg118/eaditor).

On Wed, Mar 23, 2016 at 9:05 AM, Jason Ronallo  wrote:

> Charlie,
>
> Since you've been here we've also added schema.org data for events:
> http://www.lib.ncsu.edu/event/red-white-black-walking-tour-4
>
> And for a long time we've used this for our special collections:
> http://d.lib.ncsu.edu/collections/catalog/mc00240-001-ff0093-001-001_0010
> And for videos on a few sites:
>
> http://d.lib.ncsu.edu/computer-simulation/videos/donald-e-knuth-interviewed-by-richard-e-nance-knuth
>
> Looking at it again now it could use some cleanup to trigger better
> rich snippets, but in the past it had been improving what our search
> results looked like.
>
> Jason
>
> On Wed, Mar 23, 2016 at 7:48 AM, Charlie Morris 
> wrote:
> > I can remember putting schema.org markup around the location information
> > for lib.ncsu.edu, and it's still there; check out the footer. One small
> > example anyway. I'm not sure that it's actually had any effect, though -
> > I don't see it in search engine results, and it's been there for
> > probably 2+ years now.
> >
> > On Tue, Mar 22, 2016 at 8:44 PM, Jennifer DeJonghe <
> > jennifer.dejon...@metrostate.edu> wrote:
> >
> >> Hello,
> >>
> >> I'm looking for examples of library web sites or university web sites
> that
> >> are using Structured Data / schema.org to mark up books, locations,
> >> events, etc, on their public web sites or blogs. I'm NOT really looking
> for
> >> huge linked data projects where large record sets are marked up, but
> more
> >> simple SEO practices for displaying rich snippets in search engine
> results.
> >>
> >> If you have examples of library or university websites doing this,
> please
> >> send me a link!
> >>
> >> Thank you,
> >> Jennifer
> >>
> >> Jennifer DeJonghe
> >> Librarian and Professor
> >> Library and Information Services
> >> Metropolitan State University
> >> St. Paul, MN
> >>
>


Re: [CODE4LIB] Listserv communication

2016-02-26 Thread Ethan Gruber
Nearly all of my professional communication occurs on Twitter, for better
or worse. I think that is probably the case for many of us. Code4lib is
very much alive, but perhaps has evolved into disparate conversations
taking place on Twitter instead of the listserv.

On Fri, Feb 26, 2016 at 10:07 AM, Shaun D. Ellis 
wrote:

>
> On Feb 26, 2016, at 8:42 AM, Julie Swierczek  > wrote:
>
> We also agreed that listservs – both here and elsewhere – seem to have
> shrinking participation over time, and there does seem to be a drive to
> pull more conversations out of the public eye.  There is no question that
> some matters are best discussed in private channels, such as feedback about
> individual candidates for duty officers, or matters pertaining to physical
> and mental well-being.  But when it comes to discussing technology or other
> professional matters, there seems to be a larger trend of more responses
> going off listservs.  (I, for one, generally do not reply to questions on
> listservs and instead reply to the OP privately because I’ve been burned too
> many times publicly.  The main listserv for archivists in the US has such a
> bad reputation for flaming that it has its own hashtag: #thatdarnlist.)
>
> Maybe we can brainstorm about common reasons for people not using the
> list: impostor syndrome (I don’t belong here and/or I certainly don’t have
> the right ‘authority’ to respond to this); fear of being judged - we see
> others being judged on a list (about the technological finesse of their
> response, for instance) so we don’t want to put ourselves in a position
> where we will be judged; fear of talking in general because we  have seen
> other people harmed for bringing their ideas to public forums (cf. doxing
> and swatting);  fear of looking stupid in general.
>
> Thank you for bringing this up, Julie.  I have been curious about this
> myself. I think you are correct in that there is some “impostor syndrome”
> involved, but my hypothesis is that there has been a lot of splintering of
> the channels/lists over the past several years that has dried up some of
> the conversation.  For one, there’s StackOverflow.  StackOverflow is more
> effective than a listserv on general tech questions because it requires you
> to ask questions in a way that is clear (with simple examples) and keeps
> answers on topic.  There has also been a move towards specific project
> lists so that more general lists like Code4Lib are not bombarded with
> discussions about project-related minutia that are only relevant to a
> certain sub-community.
>
> I don’t see this as a bad thing, as it allows Code4Lib to be a gathering
> hub among many different sub-groups.  But it can make it difficult to know
> what is appropriate to post and ask here. Code4Lib has always been about
> inspiration and curiosity to me. This is a place to be a free thinker, to
> question, to dissent, to wonder.  We have a long tradition of “asking
> anything” and we shouldn’t discourage that, but I think Code4Lib is a
> particularly good space to discuss bigger-picture tech-in-library
> issues/challenges as well as general best practices at a “techy” level.
> It’s certainly the appropriate space to inspire others with amazing
> examples of library tech that delights users. :)
>
> I have to admit that I was disappointed that the recent question about
> full-text searching basics (behind OregonDigital’s in-page highlighting of
> keywords in the IA Bookreader) went basically unanswered.  This was a
> well-articulated legitimate question, and at least a few people on this
> list should be able to answer it. It’s actually on my list to try to do it
> so that I can report back, but maybe someone could save me the trouble and
> quench our curiosity?
>
> Cheers,
> Shaun
>
>
>
>
>
>


Re: [CODE4LIB] TEI->EPUB serialization testing

2016-01-14 Thread Ethan Gruber
Thanks, Eric. Is the original code online anywhere? I will eventually write
some XSL-FO to generate PDFs for people who want those, for some reason.

On Thu, Jan 14, 2016 at 10:05 AM, Eric Lease Morgan  wrote:

> On Jan 13, 2016, at 4:17 PM, Ethan Gruber  wrote:
>
> > Part of this grant stipulates that open access books be made available
> in EPUB 3.0.1, so I got to work on a pipeline for dynamically serializing
> TEI into EPUB. It works pretty well, but there are some minor issues. The
> issues might be related more to differences between individual ereader apps
> in supporting the 3.0.1 spec than anything I might have done wrong in the
> serialization process (the file validates according to a script I've been
> running)…
> >
> > If you are interested in more information about the framework, there's
> http://eaditor.blogspot.com/2015/12/the-ans-digital-library-look-under-hood.html
> and
> http://eaditor.blogspot.com/2016/01/first-ebook-published-to-ans-digital.html.
> It's highly LOD aware and is capable of posting to a SPARQL endpoint so
> that information can be accessed from other archival frameworks and
> integrated into projects like Pelagios.
>
>
> I wrote a similar thing a number of years ago, and it was implemented as
> Alex Lite. [1] I started out with TEI files, and then transformed them into
> a number of derivatives: simple HTML, “cooler” HTML, PDF, and ePub. I think
> my ePub version was somewhere around 2.0. The “framework” was written in
> Perl, of course.  ;-)  The whole of Alex Lite was designed to be given
> away on CD or as an instant website. (“Just add water.”) The hard part of
> the whole thing was the creation of the TEI files in the first place. After
> that, everything was relatively easy.
>
> [1] Alex Lite blog posting - http://bit.ly/eazpJY
> [2] Alex Lite - http://infomotions.com/sandbox/alex-lite/
>
> —
> Eric Lease Morgan
> Artist- And Librarian-At-Large
>
> (A man in a trench coat approaches, and says, “Psst. Hey buddy, wanna buy
> a registration to the Code4Lib conference!?”)
>


[CODE4LIB] TEI->EPUB serialization testing

2016-01-13 Thread Ethan Gruber
Hi all,

I've been working on and off for a few months on a system for publishing
ebooks, ETDs, and other digital library materials online to a more
consolidated "Digital Library" application (
http://numismatics.org/digitallibrary). The framework (
https://github.com/AmericanNumismaticSociety/etdpub) was initially designed
for quick and easy PDF indexing and publication of ETDs, but has evolved to
a TEI publication framework for the NEH-Mellon Humanities Open Book Program
grant we received recently.

Part of this grant stipulates that open access books be made available in
EPUB 3.0.1, so I got to work on a pipeline for dynamically serializing TEI
into EPUB. It works pretty well, but there are some minor issues. The
issues might be related more to differences between individual ereader apps
in supporting the 3.0.1 spec than anything I might have done wrong in the
serialization process (the file validates according to a script I've been
running).
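
For anyone who wants to run the same kind of check, a minimal sketch using
epubcheck (not the actual script; the jar path and filename are placeholders):

import subprocess

result = subprocess.run(
    ["java", "-jar", "epubcheck.jar", "book.epub"],
    capture_output=True, text=True)
print(result.stdout or result.stderr)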

We published our first open access ebook today:
http://numismatics.org/digitallibrary/id/Miller-ANS-Medals. There's a link
on the right to the EPUB file. I would greatly appreciate any feedback you
can provide. I created a survey that will help in usability testing:
https://docs.google.com/forms/d/10Prvpm5eDvjNZaeqgXZ7luLeSkVrOgZ3hJX5zjFBuSg/viewform
.

There is a dearth of decent information about EPUB usability testing on the
web.

If you are interested in more information about the framework, there's
http://eaditor.blogspot.com/2015/12/the-ans-digital-library-look-under-hood.html
and
http://eaditor.blogspot.com/2016/01/first-ebook-published-to-ans-digital.html.
It's highly LOD aware and is capable of posting to a SPARQL endpoint so
that information can be accessed from other archival frameworks and
integrated into projects like Pelagios.

Ethan


[CODE4LIB] Fwd: [LODLAM] seeking LODLAM Workshop Leaders

2015-08-31 Thread Ethan Gruber
-- Forwarded message --
From: 
Date: Mon, Aug 31, 2015 at 4:20 PM
Subject: [LODLAM] seeking LODLAM Workshop Leaders
To: lod-...@googlegroups.com


Hey folks,
We've got some limited funding available to help support a number of
workshops over the next 18 months and we're looking for volunteers willing
to lead 1.5 hour hands-on workshops. Please take a minute to fill out the
below form if you're interested.

Thanks!
LODLAM workshop coordinators: Jon, Ethan, Anne

If you have trouble viewing or submitting this form, you can fill it out in
Google Forms
<https://docs.google.com/forms/d/1Az8ylu76m-bcSDhgYVf-31s2-OLOkHmvSx5k_I9a-no/viewform?c=0&w=1&usp=mail_form_link>.


LODLAM Workshop Leaders Interest Form
We’re seeking volunteers to run workshops at a number of key conferences
throughout 2015-16 designed to teach the basics of Linked Open Data in
Libraries, Archives and Museums LODLAM. In doing so, we hope to strengthen
a growing community of practitioners willing and able to build upon shared
cultural heritage and scientific data. The expectation is that we can
assemble and share the workshop plans and content to enable more and more
people to host and teach them over time.

In particular, we’re looking for people to teach tools in 1.5 hr sessions
with real data that address the following 4 categories:
--Big picture view: Introduce the basic concepts of LODLAM integrating
examples of what people are already doing with it.
--Cleaning: Use Open Refine to clean and reconcile datasets to make them
more usable for the public.
--Publishing: Demonstrate ways that people can publish datasets in the
library/archive/museum space - from publishing CSV’s and posting datasets
in Github to rdf’izing in Open Refine or using triplestores.
--Reusing and Building: Teach SPARQL as well as open source tools used to
visualize single or multiple collections.

Let us know if you can help!

Please contact Jon Voss (jon.v...@shiftdesign.org.uk), Ethan Gruber (
ewg4x...@gmail.com), or Anne Gaynor (amgayn...@gmail.com) if you have any
questions.

* Required

   Personal Information
   First Name *
   Last Name *
   Email address *
   Country *
   Affiliation
   Twitter handle
   Phone number
   What would you like to teach?
   Which sections are you interested in teaching? *
   - Big picture view: Introduce the basic concepts of LODLAM integrating
  examples of what people are already doing with it.
  - Cleaning: Use Open Refine to clean and reconcile datasets to make
  them more usable for the public.
  - Publishing: Demonstrate ways that people can publish datasets in
  the library/archive/museum space - from publishing CSV’s and posting datasets
  in Github to rdf’izing in Open Refine or using triplestores.
  - Reusing and Building: Teach SPARQL as well as open source tools
  used to visualize single or multiple collections.
   Tell us more about what you'd like to teach *
   What specific concepts, tools, languages, etc. can you teach? Is the
   tool free? Is it open source?
   Where could you teach?
   We're colocating these sessions with a number of conferences throughout
   2015-16. At which conference(s) would you be able to teach a session? NOTE:
   There will be very limited travel stipends, conference discounts, or
   honorariums available, so please keep that in mind as you select. You may
   want to select conferences nearby or one that you are already planning to
   attend.
   Select one or more places you would be able to teach: *
   - Digital Library Federation, Vancouver, BC: October 26-28, 2015
  - Archaeological Institute of America/Society of Classical Studies,
  San Francisco, CA: January 6-9, 2016
  - code4lib, Philadelphia, PA: March 7-10, 2016
  - Electronic Resources & Libraries, Austin, TX: April 3-6, 2016
  - Museums and the Web 2016, Los Angeles, CA: April 6-9, 2016
  - DPLA Fest, Location TBD: mid-April 2016
  - Society of American Archivists, Atlanta, GA: July 31-August 6 2016
  - Smart Data, San Jose, CA: August 2016
  - Dublin Core Metadata Initiative / Society for Information Science
  and Technology, Copenhagen, Denmark: October 13-16, 2016
   Other participation
   Would you be willing to join a list of speakers available to institutions
   looking to bring specialists to run workshops? *
   - Yes!
  - No thanks
  - Let me think about it...
   Anything else?
   Any other thoughts? Ideas? Questions?

Re: [CODE4LIB] eebo

2015-06-05 Thread Ethan Gruber
Are these in TEI? Back when I worked for the University of Virginia
Library, I did a lot of clean up work and migration of Chadwyck-Healey
stuff into TEI-P4 compliant XML (thousands of files), but unfortunately all
of the Perl scripts to migrate old garbage SGML into XML are probably gone.


How many of these things are really worth keeping, i.e., were not digitized
by any other organization that has freely published them online?

On Fri, Jun 5, 2015 at 8:10 AM, Eric Lease Morgan  wrote:

> Does anybody here have experience reading the SGML/XML files representing
> the content of EEBO?
>
> I’ve gotten my hands on approximately 24 GB of SGML/XML files representing
> the content of EEBO (Early English Books Online). This data does not
> include page images. Instead it includes metadata of various ilks as well
> as the transcribed full text. I desire to reverse engineer the SGML/XML in
> order to: 1) provide an alternative search/browse interface to the
> collection, and 2) support various types of text mining services.
>
> While I am making progress against the data, it would be nice to learn of
> other people’s experience so I do not re-invent the wheel (too many
> times). ‘Got ideas?
>
> —
> Eric Lease Morgan
> University Of Notre Dame
>


Re: [CODE4LIB] XSLT Advice

2015-06-02 Thread Ethan Gruber
You really just need to wrap the label (in an xsl:text) and the xsl:value-of
in an xsl:if that tests whether the value-of XPath returns a non-empty string:

<dc:identifier>
  <xsl:value-of
    select="doc:metadata/doc:element[@name='dc']/doc:element[@name='publication']/doc:element[@name='name']/doc:element/doc:field[@name='value']"/>
  <xsl:if test="doc:metadata/doc:element[@name='dc']/doc:element[@name='publication']/doc:element[@name='volume']/doc:element/doc:field[@name='value'] != ''">
    <xsl:text> Vol. </xsl:text>
    <xsl:value-of select="doc:metadata/doc:element[@name='dc']/doc:element[@name='publication']/doc:element[@name='volume']/doc:element/doc:field[@name='value']"/>
  </xsl:if>
  <xsl:if test="doc:metadata/doc:element[@name='dc']/doc:element[@name='publication']/doc:element[@name='issue']/doc:element/doc:field[@name='value'] != ''">
    <xsl:text> Issue </xsl:text>
    <xsl:value-of select="doc:metadata/doc:element[@name='dc']/doc:element[@name='publication']/doc:element[@name='issue']/doc:element/doc:field[@name='value']"/>
  </xsl:if>
</dc:identifier>

If there's no name at all, you'd want to wrap an xsl:if around the
dc:identifier so that you suppress an empty dc:identifier element.

On Tue, Jun 2, 2015 at 3:34 PM, Matt Sherman 
wrote:

> Cool.  I talked to Ron via phone so I am getting a better picture, but
> I am still happy to take more insights.
>
> So the larger context.  I inherited a DSpace instance with three
> custom metadata fields which actually have some useful publication
> information, though they improperly titled them by associating them
> with a dc prefix, but there were too many to fix quickly and they
> haven't broken DSpace yet, so we continue.  So I added to the XSL to
> pull the data within the custom fields to display "publication
> name" Vol. "publication volume" Issue "publication issue".  That
> worked really well until I realized that there was no conditional so
> even when the fields are empty I still get: Vol.
> Issue
>
> So here are the Custom Metadata fields:
>
> dc.publication.issue
> dc.publication.name
> dc.publication.volume
>
>
> Here is the customized XSLT, with dc.identifier added for context of
> what the rest of the sheet looks like.
>
> <xsl:for-each
>     select="doc:metadata/doc:element[@name='dc']/doc:element[@name='identifier']/doc:element/doc:field[@name='value']">
>   <dc:identifier><xsl:value-of select="." /></dc:identifier>
> </xsl:for-each>
>
> <xsl:for-each
>     select="doc:metadata/doc:element[@name='dc']/doc:element[@name='identifier']/doc:element/doc:element/doc:field[@name='value']">
>   <dc:identifier><xsl:value-of select="." /></dc:identifier>
> </xsl:for-each>
>
> <dc:identifier>
>   <xsl:value-of
>     select="doc:metadata/doc:element[@name='dc']/doc:element[@name='publication']/doc:element[@name='name']/doc:element/doc:field[@name='value']"/>
>   <xsl:text> Vol. </xsl:text>
>   <xsl:value-of
>     select="doc:metadata/doc:element[@name='dc']/doc:element[@name='publication']/doc:element[@name='volume']/doc:element/doc:field[@name='value']"/>
>   <xsl:text> Issue </xsl:text>
>   <xsl:value-of
>     select="doc:metadata/doc:element[@name='dc']/doc:element[@name='publication']/doc:element[@name='issue']/doc:element/doc:field[@name='value']"/>
> </dc:identifier>
>
>
> Ron suggested using choose and when, and that does seem to make
> the most sense.  The other trickiness is that I have found that some
> of these fields are filled when others are blank, such as there being a
> volume but not an issue.  So I need to figure out how to test multiple
> fields so that I can have it display differently depending on what has
> data, or not at all if none of the fields are filled, which is the case in
> items such as posters.
>
> So any thoughts would help.  Thanks.
>
> On Tue, Jun 2, 2015 at 2:50 PM, Wick, Ryan 
> wrote:
> > I agree with Stuart, post the example here.
> >
> > Or if you want more real-time chat there's always #code4lib IRC.
> >
> > For an XSLT resource, Dave Pawson's site is great:
> http://www.dpawson.co.uk/xsl/sect2/sect21.html
> >
> > Ryan Wick
> >
> > -Original Message-
> > From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
> Stuart A. Yeates
> > Sent: Tuesday, June 02, 2015 11:46 AM
> > To: CODE4LIB@LISTSERV.ND.EDU
> > Subject: Re: [CODE4LIB] XSLT Advice
> >
> > There are a number of experienced xslt'ers here. Post your example to
> the group so we can all learn.
> >
> > Cheers
> > Stuart
> >
> > On Wednesday, June 3, 2015, Matt Sherman 
> wrote:
> >
> >> Hi all,
> >>
> >> I am making a few corrections on an oai_dc.xslt file for our DSpace
> >> instance that I slightly botched while modifying it to integrate some custom
> >> metadata into a dc.identifier citation in the OAI-PMH harvest.  I need
> >> to get proper conditionals so it can display and harvest the metadata
> >> correctly and not run when there is no data in those fields.  I have a
> >> pretty good idea what I need to do, and if this were like JavaScript
> >> or Python I could probably muddle through.  The trouble is that I
> >> don't know the conditional syntax for XSLT quite well enough to know
> >> what I can do and thus need to do.  Plus the online resources for
> >> learning/referencing XSLT for this are a bit shallow for what I need
> >> hence asking the group.  So if there is anyone who knows XSLT really
> >> well that would be willing to talk with me for a bit to help me work
> >> through what I need to get the syntax to work like I want I would
> >> appreciate it.  Thanks.
> >>
> >> Matt Sherman
> >>
> >
> >
> > --
> > --
> > ...let us be heard from red core to black sky
>


Re: [CODE4LIB] Library Hours

2015-05-06 Thread Ethan Gruber
+1 on the RDFa and schema.org. For those that don't know the library URL
off-hand, it is much easier to find a library website by Googling than it
is to go through the central university portal, and the hours will show up
at the top of the page after having been harvested by search engines.

On Tue, May 5, 2015 at 6:54 PM, Karen Coyle  wrote:

> Note that library hours is one of the possible bits of information that
> could be encoded as RDFa in the library web site, thus making it possible
> to derive library hours directly from the listing of hours on the web site
> rather than keeping a separate list. Schema.org does have the elements such
> that hours can be encoded. This would mean that hours could show in the
> display of the library's catalog entry on Google, Yahoo and Bing. Being
> available directly through the search engines might be sufficient, not
> necessitating creating yet-another-database for that data.
>
> Schema.org uses a restaurant as its opening hours example, but much of the
> data would be the same for a library:
>
> <div vocab="http://schema.org/" typeof="Restaurant">
>   <span property="name">GreatFood</span>
>   <div property="aggregateRating" typeof="AggregateRating">
>     <span property="ratingValue">4</span> stars -
>     based on <span property="reviewCount">250</span> reviews
>   </div>
>   <div property="address" typeof="PostalAddress">
>     <span property="streetAddress">1901 Lemur Ave</span>
>     <span property="addressLocality">Sunnyvale</span>,
>     <span property="addressRegion">CA</span> <span property="postalCode">94086</span>
>   </div>
>   <span property="telephone">(408) 714-1489</span>
>   <a property="url" href="http://www.dishdash.com">www.greatfood.com</a>
>   Hours:
>   <meta property="openingHours" content="Mo-Sa 11:00-14:30">Mon-Sat 11am -
> 2:30pm
>   <meta property="openingHours" content="Mo-Th 17:00-21:30">Mon-Thu 5pm -
> 9:30pm
>   <meta property="openingHours" content="Fr-Sa 17:00-22:00">Fri-Sat 5pm -
> 10:00pm
>   Categories:
>   <span property="servesCuisine">Middle Eastern</span>,
>   <span property="servesCuisine">Mediterranean</span>
>   Price Range: <span property="priceRange">$$</span>
>   Takes Reservations: Yes
> </div>
>
> It seems to me that using schema.org would get more bang for the buck --
> it would get into the search engines and could also be aggregated into
> whatever database is needed. As we've seen with OCLC, having a separate
> listing is likely to mean that the data will be out of date.
>
> kc
>
> On 5/5/15 2:19 PM, nitin arora wrote:
>
>> I can't see that they distinguished between public libraries and other types on
>> their campaign page.
>>
>> They say " all libraries" as far as I can see.
>> So I suppose then that this is true for "all libraries":
>> "Libraries offer a space anyone can enter, where money isn't exchanged,
>> and
>> documentation doesn't have to be shown."
>> Who knew fines and library/student-IDs were a thing of the past?
>>
>> The only data sets I can find where they could have gotten the 17,000 number
>> are for public libraries:
>> http://www.imls.gov/research/pls_data_files.aspx
>> Maybe I missed something.
>> There is an hours field on one of the CSVs I downloaded, etc for 2012 data
>> (the most recent I could find).
>>
>> Asking 10k for something targeted for completion in June and without a
>> grasp on what types of libraries there are and how volatile the hours
>> information is (especially in crisis) ...
>> Sounds naive at best, sketchy at worst.
>>
>> The "flexible funding" button says "this campaign will receive all funds
>> raised even if it does not reach its goals".
>>
>> "The value of these places for youth cannot be underestimated."
>> So is the value of a quick buck ...
>>
>> On Tue, May 5, 2015 at 4:53 PM, McCanna, Terran <
>> tmcca...@georgialibraries.org> wrote:
>>
>>  I'm not at all surprised that this doesn't already exist, and even if
>>> OCLC's was available, I'd be willing to bet it was out of date.
>>>
>>> Public library hours, especially in underfunded areas, may fluctuate
>>> depending on funding cycles, seasons (whether school is in or out), etc.,
>>> not to mention closing/reopening/moving because of old buildings that
>>> need
>>> to be updated. We have around 280 locations in our consortium and we have
>>> to rely on self-reporting to find out if their hours change. We certainly
>>> don't have staff time to check every one of their web sites on regular
>>> basis, I can't imagine keeping track of 17,000!
>>>
>>>
>>> Terran McCanna
>>> PINES Program Manager
>>> Georgia Public Library Service
>>> 1800 Century Place, Suite 150
>>> Atlanta, GA 30345
>>> 404-235-7138
>>> tmcca...@georgialibraries.org
>>>
>>>
>>> - Original Message -
>>> From: "Peter Murray" 
>>> To: CODE4LIB@LISTSERV.ND.EDU
>>> Sent: Tuesday, May 5, 2015 4:36:56 PM
>>> Subject: Re: [CODE4LIB] Library Hours
>>>
>>> OCLC has an institutional registry [1], which had (in part) library
>>> hours,
>>> addresses, and so forth.  It seems to be unavailable, though [2].  That
>>> is
>>> the only systematic collection of library hours data that I know about.
>>>
>>>
>>> Peter
>>>
>>> [1] https://www.oclc.org/worldcat-registry.en.html
>>> [2] https://www.worldcat.org/registry/institution/
>>>
>>>  On May 5, 2015, at 4:16 PM, Bigwood, David 

>>> wrote:
>>>
 This looks like a decent group, but I find this statement hard to

>>> believe.
>>>
 "Your tax-deductible donation supports adding the names, address and the

>>> hours of operation of all libraries to Range. The Institute of Museum and
>>> Library Services publishes an open data catalog which is the source we'll
>>> use for the names

Re: [CODE4LIB] seeking linked data-based user interface examples in libraries

2015-02-11 Thread Ethan Gruber
It depends on what you mean by interface. Are you just looking for social
network visualizations or virtually any interface built on LOD (which may
be quite varied and transparent to the point you don't even realize you are
interacting with linked data)?

Most of these social network graphs are generated from static files (like
the SNAC radial graph, which is built from a graph XML schema derived from EAC-CPF) or
from desktop tools. The holy grail for social network analysis is to build
these visualizations in HTML5/Javascript on top of dynamic web services
(e.g., from SPARQL). I'm going to start working on this, perhaps as soon as
this summer, in xEAC (https://github.com/ewg118/xEAC), once I finish the
EAC-CPF -> CIDOC-CRM crosswalk.

On Wed, Feb 11, 2015 at 10:12 AM, David Lowe <
david.b.lowe.librar...@gmail.com> wrote:

> I consider SNAC and its radial graph view one of the leaders in this space:
> http://socialarchive.iath.virginia.edu/xtf/search
> --DBL
>
> On 2/11/15, Sheila M. Morrissey  wrote:
> > Do you know if the relationship-viewer source code is open source and
> > available?
> > Thanks,
> > sheila
> >
> > -Original Message-
> > From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
> > Kevin Hawkins
> > Sent: Tuesday, February 10, 2015 11:27 PM
> > To: CODE4LIB@LISTSERV.ND.EDU
> > Subject: Re: [CODE4LIB] seeking linked data-based user interface
> examples in
> > libraries
> >
> > Here's one that I heard about at a presentation at ALA Midwinter:
> >
> > http://civilwaronthewesternborder.org/content/relationship-viewer
> >
> > People also like to cite this one, though it's not, strictly speaking,
> based
> > in a library:
> >
> > https://linkedjazz.org/
> >
> > --Kevin
> >
> > On 2/10/15 12:39 PM, Adam L. Chandler wrote:
> >> Hi,
> >>
> >>
> >> I am working on a presentation about linked data and I need some help.
> My
> >> talk is about examples of linked data-based user interfaces in
> libraries,
> >> wireframes, demos, or working systems. I am having difficulty finding
> >> them. Please send me your examples.
> >>
> >>
> >> Thanks,
> >>
> >> Adam Chandler
> >>
> >
>


Re: [CODE4LIB] Restrict solr index results based on client IP

2015-01-07 Thread Ethan Gruber
There are a few ways to do this, and yes, some version of #2 is desirable.
I think it may depend on how specific these IP addresses are. Do you
anticipate that one IP range may have access to X documents and a different
IP range may have access to Y documents, or will all IP ranges have access
to the same restricted documents (i.e., anyone on campus can access
everything)? The former scenario requires IPs to be stored in the Solr docs
and the second only requires a boolean field type, e.g. restricted =
yes/no. In fact, in the former scenario, you'd probably want to associate
the IP range with a key of some sort, e.g.:

In the schema, have a field named "group".

In your doc have the group field contain the value "medical_school". Then
somewhere in your application (not stored and indexed in Solr), you can say
that "medical_school" carries the ranges 192.168,1.*, 192.168.2.*, etc.
That way, if the medical school picks up a new IP range or the range
changes, you can make a minor update to your application without having to
reindex content in Solr.
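
A bare-bones sketch of that approach (field name, core name, and IP ranges are
all made up for illustration):

import requests

def allowed_groups(client_ip):
    # application-side mapping of IP ranges to group keys, kept out of the
    # Solr index so it can change without reindexing
    if client_ip.startswith("192.168.1.") or client_ip.startswith("192.168.2."):
        return ["public", "medical_school"]
    return ["public"]

def search(q, client_ip):
    fq = "group:(" + " OR ".join(allowed_groups(client_ip)) + ")"
    params = {"q": q, "fq": fq, "wt": "json"}
    return requests.get("http://localhost:8983/solr/core1/select", params=params).json()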

Ethan

On Wed, Jan 7, 2015 at 11:41 AM, Chad Mills  wrote:

> Hello,
>
> Basically I have a solr index where, at times, some of the results from a
> query will only be limited to a set of users based on their client's IP
> address.  I have been thinking about accomplishing this in one of two ways.
>
> 1) Post-processing the results for IP validity against an external data
> source and dropping out those results which are not valid.  That could
> leave me with a partial result list that would need another query to fill
> back in.  Say I want 10 results, I end up dropping 2 of them, I need to
> fill back in those 2 by performing another query.
>
> 2) Making the IP permission check part of the query.  Basically appending
> an AND in the query on a field that stores the permissible IP addresses.
> The index field would be set to allow all IPs to access the result by
> default, but at times can contain the allowable IP addresses or maybe even
> ranges somehow.
>
> Are there some other ways to accomplish this I haven't considered?  Right
> now #2 seems more desirable to me.
>
> Thanks in advance for your thoughts!
>
> --
> Chad Mills
> Digital Library Architect
> Ph: 848.932.5924
> Fax: 848.932.1386
> Cell: 732.309.8538
>
> Rutgers University Libraries
> Scholarly Communication Center
> Room 409D, Alexander Library
> 169 College Avenue, New Brunswick, NJ 08901
>
> https://rucore.libraries.rutgers.edu/
>


Re: [CODE4LIB] rdf triplestores

2014-12-19 Thread Ethan Gruber
I recently extended Fuseki to hook into a Solr index for geographic query
for one of our linked data projects, and I'm happy with the results so far.
It will open the door for us to build more sophisticated geographic
visualizations. I have not extended Fuseki for Lucene/Solr based full text
search, as we have a standalone Solr index for that, and a separate search
interface (for general users) from the SPARQL query interface (for advanced
ones).

It's definitely true that there are scaling limitations in SPARQL--just
look at how often dbpedia and the British Museum SPARQL endpoint go down.
Hardware is overcoming these limitations, but I still advocate a hybrid
approach: using Solr where it is advantageous to do so, and then building
focused user interfaces on top of SPARQL, leveraging the advantages of a
triplestore in contexts other than search. We open up our SPARQL endpoint
to the public, but far more users interact with SPARQL through HTML
interfaces in several different projects without having any idea that they
are doing so. We only have about a million triples in our triplestore (but
this is going to grow enormously in less than two years, I think, as the
floodgates are about to open in the world of ancient Greco-Roman coins),
but the system has only gone down for about 2 minutes in the last 2.5
years, on a virtual machine with only 4GB of memory.

Ethan

On Fri, Dec 19, 2014 at 10:20 AM, Mixter,Jeff  wrote:
>
> A triplestore is basically a database backend for RDF triples. The major
> benefit is that it allows for SPARQL querying. You could imagine a
> triplestore as being the same thing as a relational database that can be
> queried with SQL.
>
> The drawback that I have run into is that unless you have unlimited
> hardware, triplestores can run into scaling problems (when you are looking
> at hundreds of millions or billions of triples). This is a problem when you
> want to search for data. For searching I use a hybrid Elasticsearch (i.e.
> Lucene) index for the string literals and then go out to the triplestore to
> query for the data.
>
> If you are looking to use a triplestore it is important to distinguish
> between search and query.
>
> Triplestores are really good for query but not so good for search. The
> basic problem with search is that it is mostly string-based, and this
> requires a regular expression query in SPARQL, which is expensive from a
> hardware perspective.
>
> There are a few triple stores that use a hybrid model. In particular Jena
> Fuseki (http://jena.apache.org/documentation/query/text-query.html)
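>
> As a rough illustration of that difference (the property and data here are
> arbitrary): a plain regex FILTER scans string literals, while the jena-text
> extension hands the match to a Lucene index via text:query.
>
> regex_search = """
> PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
> SELECT ?s ?label WHERE {
>   ?s skos:prefLabel ?label .
>   FILTER regex(str(?label), "aircraft", "i")
> }
> """
>
> text_search = """
> PREFIX text: <http://jena.apache.org/text#>
> PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
> SELECT ?s ?label WHERE {
>   ?s text:query (skos:prefLabel "aircraft") ;
>      skos:prefLabel ?label .
> }
> """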
>
> Thanks,
>
> Jeff Mixter
> Research Support Specialist
> OCLC Research
> 614-761-5159
> mixt...@oclc.org
>
> 
> From: Code for Libraries  on behalf of Forrest,
> Stuart 
> Sent: Friday, December 19, 2014 10:00 AM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] rdf triplestores
>
> Hi All
>
> My question is what do you guys use triplestores for?
>
> Thanks
> Stuart
>
>
>
> 
> Stuart Forrest PhD
> Library Systems Specialist
> Beaufort County Library
> 843 255 6450
> sforr...@bcgov.net
>
> http://www.beaufortcountylibrary.org
>
> For Leisure, For Learning, For Life
>
>
>
> -Original Message-
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
> Stefano Bargioni
> Sent: Monday, November 11, 2013 8:53 AM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] rdf triplestores
>
> My +1 for Joseki.
> sb
>
> On 11/nov/2013, at 06.12, Eric Lease Morgan wrote:
>
> > What is your favorite RDF triplestore?
> >
> > I am able to convert numerous library-related metadata formats into
> RDF/XML. In a minimal way, I can then contribute to the Semantic Web by
> simply putting the resulting files on an HTTP file system. But if I were to
> import my RDF/XML into a triplestore, then I could do a lot more. Jena
> seems like a good option. So does Openlink Virtuoso.
> >
> > What experience do y'all have with these tools, and do you know how to
> import RDF/XML into them?
> >
> > --
> > Eric Lease Morgan
> >
>


Re: [CODE4LIB] Functional Archival Resource Keys

2014-12-09 Thread Ethan Gruber
I'm using a few applications in Tomcat, so inflections are much more
difficult to implement than content negotiation. I can probably tweak the
Apache settings to do a proxypass for inflections by modifying the examples
above.

I agree with Conal, though. Inflections are puzzling at best and bad
architecture at worst, and the sooner the community puts forward a more
standard solution, the better.

On Mon, Dec 8, 2014 at 7:21 PM, John Kunze  wrote:

> Just as a URL permits an ordinary user with a web browser to get to an
> object, inflections permit an ordinary user to see metadata (without curl
> or code).
>
> There's nothing to prevent a server from supporting both the HTTP Accept
> header (content negotiation) and inflections.  If you can do the one, the
> other should be pretty easy.
>
> On Mon, Dec 8, 2014 at 4:01 PM, Conal Tuohy  wrote:
>
> > I am really puzzled by the use of these non-standard "inflexions" as a
> > means of qualifying an HTTP request. Why not use the HTTP Accept header,
> > like everyone else?
> >
> >
> > On 9 December 2014 at 07:59, John A. Kunze  wrote:
> >
> > > Any Apache server (not Tomcat) can handle the '?' and '??' cases with a
> > > few rewrite rules to transform them into typical CGI-like query
> strings.
> > >
> > >   # Detect ? and ?? inflections and map to typical CGI-style
> parameters.
> > >   # One question mark case:  ?  -> ?show=brief&as=anvl/erc
> > >   RewriteCond %{THE_REQUEST}  \?
> > >   RewriteCond %{QUERY_STRING} ^$
> > >   RewriteRule ^(.*)$ "$1?show=brief&as=anvl/erc"
> > >
> > >   # Two question mark case:  ?? -> ?show=support&as=anvl/erc
> > >   RewriteCond %{QUERY_STRING} ^\?$
> > >   RewriteRule ^(.*)$ "$1?show=support&as=anvl/erc"
> > >
> > > So if your architecture supports query strings of the form
> > >
> > >   ?name1=value1&name2=value2&...
> > >
> > > it can support ARK inflections.
> > >
> > >  I don't believe that the ARK spec and HTTP URIs are fully compatible
> > >> ideas.
> > >>
> > >
> > > True.  A '?' by itself has no meaning in the URI spec, which means it's
> > > also an opportunity to do something intuitive and important with an
> > > unused portion of the "instruction space" (of strings that start out
> > > looking like URLs).  Any URLs (not just ARKs) could support this.
> > >
> > > The THUMP spec (where inflections really live) will be modified to
> > > require an extra HTTP response header to indicate that the server is
> > > responding to an inflection and not to a standard URI query string.
> > > This could help in the '??' case, which actually could be interpreted
> > > as a valid URI query string.
> > >
> > > -John
> > >
> > >
> > >
> > > --- On Mon, 8 Dec 2014, Ethan Gruber wrote:
> > >
> > >> Thanks for the info. I'm glad I'm not the only person struggling with
> > >> this.
> > >> I'm not entirely sure my architecture will allow me to append question
> > >> marks in this way (two question marks is probably feasible, but it
> > doesn't
> > >> appear that one is). I don't believe that the ARK spec and HTTP URIs
> are
> > >> fully compatible ideas. Hopefully some clearer request parameter or
> > >> content
> > >> negotiation standards emerge.
> > >>
> > >> Ethan
> > >>
> > >> On Sat, Dec 6, 2014 at 10:23 AM, Phillips, Mark <
> mark.phill...@unt.edu>
> > >> wrote:
> > >>
> > >>  Ethan,
> > >>>
> > >>> As Mark mentioned we have implemented the ARK inflections of ? and ??
> > >>> with
> > >>> our systems.
> > >>>
> > >>> I remember the single ? being a bit of a problem to implement in our
> > >>> system stack (Apache/mod_python/Django) and from what I can tell
> isn't
> > >>> possible with (Apache/mod_wsgi/Django) at all.
> > >>>
> > >>> The ?? inflection wasn't really a problem for us on either of the
> > >>> systems.
> > >>>
> > >>> From conversations I've had with implementors of ARK,  the issues
> > around
> > >>> supporting the ? and ?? inflections don't seem to be related to the
> > >>> framewor

Re: [CODE4LIB] Functional Archival Resource Keys

2014-12-08 Thread Ethan Gruber
Thanks for the info. I'm glad I'm not the only person struggling with this.
I'm not entirely sure my architecture will allow me to append question
marks in this way (two question marks is probably feasible, but it doesn't
appear that one is). I don't believe that the ARK spec and HTTP URIs are
fully compatible ideas. Hopefully some clearer request parameter or content
negotiation standards emerge.

Ethan

On Sat, Dec 6, 2014 at 10:23 AM, Phillips, Mark 
wrote:

> Ethan,
>
> As Mark mentioned we have implemented the ARK inflections of ? and ?? with
> our systems.
>
> I remember the single ? being a bit of a problem to implement in our
> system stack (Apache/mod_python/Django) and from what I can tell isn't
> possible with (Apache/mod_wsgi/Django) at all.
>
> The ?? inflection wasn't really a problem for us on either of the systems.
>
> From conversations I've had with implementors of ARK,  the issues around
> supporting the ? and ?? inflections don't seem to be related to
> framework issues as much as to other issues like commitment to identifiers, the fact
> that ARKs are being used in a redirection based system like Handles, or the
> challenges of accessing the item metadata for items elsewhere in their
> system.
>
> I think having a standard set of request parameters or other url
> conventions could be beneficial to the implementation of these features by
> others.
>
> Mark
> 
> From: Code for Libraries  on behalf of
> todd.d.robb...@gmail.com 
> Sent: Saturday, December 6, 2014 8:23 AM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] Functional Archival Resource Keys
>
> This brief exchange on Twitter seems relevant:
>
> https://twitter.com/abrennr/status/296948733147508737
>
> On Fri, Dec 5, 2014 at 12:50 PM, Mark A. Matienzo  >
> wrote:
>
> > Hi Ethan,
> >
> > I'm hoping Mark Phillips or one of his colleagues from UNT will respond,
> > but they have implemented ARK inflections. For example, compare:
> >
> > http://texashistory.unt.edu/ark:/67531/metapth5828/
> > http://texashistory.unt.edu/ark:/67531/metapth5828/?
> > http://texashistory.unt.edu/ark:/67531/metapth5828/??
> >
> > In particular, the challenges posed by inflections are described in this
> > DC2014 paper [0] by Sébastien Peyrard and Jean-Philippe Tramoni from the
> > BNF and John A. Kunze from CDL.
> >
> > [0] http://dcpapers.dublincore.org/pubs/article/view/3704/1927
> >
> > Cheers,
> > Mark
> >
> >
> > --
> > Mark A. Matienzo 
> > Director of Technology, Digital Public Library of America
> >
> > On Fri, Dec 5, 2014 at 2:36 PM, Ethan Gruber  wrote:
> >
> > > I was recently reading the wikipedia article for Archival Resource Keys
> > > (ARKs, http://en.wikipedia.org/wiki/Archival_Resource_Key), and there
> > was
> > > a
> > > bit of functionality that a resource is supposed to deliver that we
> don't
> > > in our system, nor do any other systems that I've seen that implement
> ARK
> > > URIs.
> > >
> > > From the article:
> > >
> > > "An ARK contains the label *ark:* after the URL's hostname, which sets
> > the
> > > expectation that, when submitted to a web browser, the URL terminated
> by
> > > '?' returns a brief metadata record, and the URL terminated by '??'
> > returns
> > > metadata that includes a commitment statement from the current service
> > > provider."
> > >
> > > Looking at the official documentation (
> > > https://confluence.ucop.edu/display/Curation/ARK), they provided an
> > > example
> > > of http://ark.cdlib.org/ark:/13030/tf5p30086k? which is supposed to
> > return
> > > something called an Electronic Resource Citation, but it doesn't work.
> > > Probably because, and correct me if I'm wrong, using question marks in
> a
> > > URL in this way doesn't really work in HTTP.
> > >
> > > So, has anyone successfully implemented this? Is it even worth it? I'm
> > not
> > > sure I can even implement this in my own architecture.
> > >
> > > Maybe it would be better to recommend a standard set of request
> > parameters
> > > that actually work in REST?
> > >
> > > Ethan
> > >
> >
>
>
>
> --
> Tod Robbins
> Digital Asset Manager, MLIS
> todrobbins.com | @todrobbins <http://www.twitter.com/#!/todrobbins>
>


[CODE4LIB] Functional Archival Resource Keys

2014-12-05 Thread Ethan Gruber
I was recently reading the wikipedia article for Archival Resource Keys
(ARKs, http://en.wikipedia.org/wiki/Archival_Resource_Key), and there was a
bit of functionality that a resource is supposed to deliver that we don't
in our system, nor do any other systems that I've seen that implement ARK
URIs.

From the article:

"An ARK contains the label *ark:* after the URL's hostname, which sets the
expectation that, when submitted to a web browser, the URL terminated by
'?' returns a brief metadata record, and the URL terminated by '??' returns
metadata that includes a commitment statement from the current service
provider."

Looking at the official documentation (
https://confluence.ucop.edu/display/Curation/ARK), they provided an example
of http://ark.cdlib.org/ark:/13030/tf5p30086k? which is supposed to return
something called an Electronic Resource Citation, but it doesn't work.
Probably because, and correct me if I'm wrong, using question marks in a
URL in this way doesn't really work in HTTP.

So, has anyone successfully implemented this? Is it even worth it? I'm not
sure I can even implement this in my own architecture.

Maybe it would be better to recommend a standard set of request parameters
that actually work in REST?
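
Something along these lines might work, though I haven't tested it. The
catch is that QUERY_STRING comes through empty whether or not a bare '?'
was sent, so you have to look at the raw request line; REQUEST_URI and
RAW_URI are non-standard keys that some servers (mod_wsgi, uwsgi,
gunicorn) happen to set, so treat that part as an assumption:

# Untested sketch: route ARK inflections at the WSGI layer.
def ark_inflections(app, brief_app, commitment_app):
    def middleware(environ, start_response):
        raw = environ.get('REQUEST_URI') or environ.get('RAW_URI') or ''
        if raw.endswith('??'):
            # '??' -> metadata plus commitment statement
            return commitment_app(environ, start_response)
        if raw.endswith('?'):
            # '?' -> brief metadata record
            return brief_app(environ, start_response)
        # plain ARK -> the object itself
        return app(environ, start_response)
    return middleware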

Ethan


Re: [CODE4LIB] Reconciling corporate names?

2014-09-26 Thread Ethan Gruber
I would check with the developers of SNAC (
http://socialarchive.iath.virginia.edu/), as they've spent a lot of time
developing named entity recognition scripts for personal and corporate
names. They might have something you can reuse.
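
For a rough first pass before anything fancier, id.loc.gov also has a
known-label retrieval service that returns a redirect on an exact match
against an authorized heading and a 404 otherwise; I believe the matched
URI comes back in an X-Uri header, but verify that against a few known
headings first. Untested sketch (assumes one name per line in a
corporate_names.txt file):

# Untested sketch: exact-string checks of corporate names against LCNAF
# via id.loc.gov's known-label retrieval service.
import urllib.parse
import requests

def check_heading(label):
    url = ('http://id.loc.gov/authorities/names/label/'
           + urllib.parse.quote(label))
    r = requests.get(url, allow_redirects=False, timeout=30)
    # 302/303 points at the matching authority; 404 means no exact match.
    return r.headers.get('X-Uri') if r.status_code in (302, 303) else None

with open('corporate_names.txt') as f:
    for name in (line.strip() for line in f if line.strip()):
        print(name, check_heading(name) or 'NO MATCH', sep='\t')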

Ethan

On Fri, Sep 26, 2014 at 3:47 PM, Galligan, Patrick 
wrote:

> I'm looking to reconcile about 40,000 corporate names against LCNAF to see
> whether they are authorized strings or not, but I'm drawing a blank about
> how to get it done.
>
> I've used http://freeyourmetadata.org/ for reconciling subject headings
> before, but I can't get it to work for LCNAF. Has anyone had any experience
> in a project like this? I'd love to hear some ideas for automatically
> dealing with a large data set like this that we did not create and do not
> know how the names were created.
>
> Thanks!
>
> -Patrick Galligan
>


[CODE4LIB] xEAC advanced beta / pre-production release ready for further testing

2014-08-29 Thread Ethan Gruber
Hi all,

xEAC (https://github.com/ewg118/xEAC), an open source, XForms-based
framework for the creation and publication of EAC-CPF records (for archival
authorities or scholarly prosopographies) is now ready for another round of
testing. While xEAC is still under development, it is essentially
production-ready for small-to-medium collections of authority records (less
than 100,000).

xEAC handles the majority of the elements in the EAC-CPF schema, with
particular focus on enhancing controlled vocabulary with external linked
open data systems and the semantic linking of relations between entities.
The following LOD lookup mechanisms are supported:

Geography: Geonames, LCNAF, Getty TGN, Pleiades Gazetteer of Ancient Places
Occupations/Functions: Getty AAT
Misc. linking and data import: VIAF, DBpedia, nomisma.org, and SNAC

xEAC supports transformation of EAC-CPF into a rudimentary form of three
different RDF models and posting data into an RDF triplestore by optionally
connecting the system to a SPARQL endpoint. Additionally, EADitor (
https://github.com/ewg118/eaditor), an open source framework for EAD
finding aid creation and publication can hook into a xEAC installation for
controlled vocabulary as well as posting to a triplestore, making it
possible to link archival authorities and content through LOD methodologies.

The recently released American Numismatic Society biographies (
http://numismatics.org/authorities/) and the new version of the archives (
http://numismatics.org/archives/) illustrate this architecture. For
example, the authority record for Edward T. Newell (
http://numismatics.org/authority/newell), contains a dynamically generated
list of archival resources (from a SPARQL query). This method is more
scalable and sustainable in the long run than using the EAC
resourceRelation element. Now that SPARQL has successfully been implemented
in xEAC, I will begin to integrate social network analysis interfaces into
the application.
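
To give a sense of the kind of query behind that list: something roughly
like the sketch below, though the endpoint URL and the predicates here
are only illustrative placeholders, not the actual ANS data model.

# Illustrative sketch only; endpoint and predicates are placeholders.
import requests

ENDPOINT = 'http://example.org/sparql'   # hypothetical endpoint
QUERY = """
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?resource ?title WHERE {
  ?resource dcterms:creator <http://numismatics.org/authority/newell> ;
            dcterms:title ?title .
}
"""

r = requests.get(ENDPOINT, params={'query': QUERY},
                 headers={'Accept': 'application/sparql-results+json'})
for b in r.json()['results']['bindings']:
    print(b['resource']['value'], b['title']['value'])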

More information:
Github repository: https://github.com/ewg118/xEAC
XForms for Archives, a blog detailing xEAC and EADitor development, as well
as linked data methodologies applied to archival collections:
http://eaditor.blogspot.com/
xEAC installation instructions: http://wiki.numismatics.org/xeac:xeac

Ethan Gruber
American Numismatic Society


Re: [CODE4LIB] Creating a Linked Data Service

2014-08-07 Thread Ethan Gruber
I agree with others saying linked data is overkill here. If you don't have
an audience in mind or a specific purpose for implementing linked data,
it's not worth it.
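
That said, if it helps to see how small "the simplest thing" can be, here
is an untested sketch of a tiny availability endpoint; the @context
vocabulary and the counts are made up, and get_counts() stands in for
whatever actually queries the ILS.

# Untested sketch: a minimal JSON(-LD-ish) availability service in Flask.
from flask import Flask, jsonify

app = Flask(__name__)

def get_counts():
    # placeholder for the real ILS lookup
    return {'available': 10, 'total': 12}

@app.route('/availability/laptops')
def laptops():
    counts = get_counts()
    return jsonify({
        '@context': {'avail': 'http://example.org/availability#'},
        '@id': 'http://library.example.edu/availability/laptops',
        'avail:available': counts['available'],
        'avail:total': counts['total'],
    })

if __name__ == '__main__':
    app.run()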


On Thu, Aug 7, 2014 at 9:07 AM, Jason Stirnaman  wrote:

> Mike,
> Check out
> http://json-ld.org/,
> http://json-ld.org/primer/latest/, and
> https://github.com/digitalbazaar/pyld
>
> But, if you haven't yet sketched out a model for *your* data, then the LD
> stuff will just be a distraction. The information on Linked Data seems
> overly complex because trying to represent data for the Semantic Web gets
> complex - and verbose.
>
> As others have suggested, it's never a bad idea to just "do the simplest
> thing that could possibly work."[1] Mark recommended writing a simple API.
> That would be a good start to understanding your data model and to
> eventually serving LD. And, you may find that it's enough for now.
>
> 1. http://www.xprogramming.com/Practices/PracSimplest.html
>
> Jason
>
> Jason Stirnaman
> Lead, Library Technology Services
> University of Kansas Medical Center
> jstirna...@kumc.edu
> 913-588-7319
>
> On Aug 6, 2014, at 1:45 PM, Michael Beccaria 
> wrote:
>
> > I have recently had the opportunity to create a new library web page and
> host it on my own servers. One of the elements of the new page that I want
> to improve upon is providing live or near live information on technology
> availability (10 of 12 laptops available, etc.). That data resides on my
> ILS server and I thought it might be a good time to upgrade the bubble gum
> and duct tape solution I now have to creating a real linked data service
> that would provide that availability information to the web server.
> >
> > The problem is there is a lot of overly complex and complicated
> information out there onlinked data and RDF and the semantic web etc. and
> I'm looking for a simple guide to creating a very simple linked data
> service with php or python or whatever. Does such a resource exist? Any
> advice on where to start?
> > Thanks,
> >
> > Mike Beccaria
> > Systems Librarian
> > Head of Digital Initiative
> > Paul Smith's College
> > 518.327.6376
> > mbecca...@paulsmiths.edu
> > Become a friend of Paul Smith's Library on Facebook today!
>


Re: [CODE4LIB] OAI Crosswalk XSLT

2014-07-11 Thread Ethan Gruber
The source model seems inordinately complex.
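
That said, if it helps to see the shape of the thing outside of XSLT,
here is an untested lxml sketch that evaluates the same path. The 'doc'
namespace URI below is my assumption about DSpace's XOAI format, so
check it against your own records, and record.xml is assumed to contain
just the xoai metadata document.

# Untested sketch: walking the DSpace XOAI structure with lxml.
from lxml import etree

NS = {'doc': 'http://www.lyncode.com/xoai'}  # assumed XOAI namespace
record = etree.parse('record.xml')

# Inside doc:metadata, find the element named 'dc', then its child
# element named 'type', descend through two unnamed element levels, and
# take the field named 'value' that holds the literal dc.type string.
# (Absolute path here because we start from the document root.)
values = record.xpath(
    "/doc:metadata/doc:element[@name='dc']/doc:element[@name='type']"
    "/doc:element/doc:element/doc:field[@name='value']/text()",
    namespaces=NS)
print(values)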


On Fri, Jul 11, 2014 at 10:53 AM, Matthew Sherman 
wrote:

> I guess it is the "doc:element/doc:element/doc:field" thing that is mostly
> what is throwing me.
>
>
> On Fri, Jul 11, 2014 at 10:52 AM, Dunn, Katie  wrote:
>
> > Hi Matt,
> >
> > The W3C Recommendation for XPath has some good explanation and examples
> > for abbreviated XPath syntax here: http://www.w3.org/TR/xpath-30/#abbrev
> >
> > Katie
> >
> > -Original Message-
> > From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
> > Matthew Sherman
> > Sent: Friday, July 11, 2014 10:39 AM
> > To: CODE4LIB@LISTSERV.ND.EDU
> > Subject: [CODE4LIB] OAI Crosswalk XSLT
> >
> > Hi Code4Lib folks,
> >
> > I have a question for those of you who have worked with OAI-PMH.  I am
> > currently editing our DSpace OAI crosswalk to include a few custom
> metadata
> > field that exist in our repository for publication information and port
> > them into a more standard format.  The problem I am running into is the
> > select statements they use are not the typical XPath statements I am used
> > to.  For example:
> >
> >  >
> >
> select="doc:metadata/doc:element[@name='dc']/doc:element[@name='type']/doc:element/doc:element/doc:field[@name='value']">
> >  
> >
> > I know what the "." does, but the other select statement is a bit foreign
> > to me.  So my question is, does anyone know of some reference material
> that
> > can help me make sense of this select?  I need to understand what it is
> > doing so I can make my own.  Thanks for any insight you can provide.
> >
> > Matt Sherman
> >
>


Re: [CODE4LIB] New linked data set from North Carolina State University Libraries

2014-07-09 Thread Ethan Gruber
This is great. I appreciate that you've included XSLT to transform MADS
into other RDF serializations. I think that people will find this really
useful in other projects.

A minor point of contention, and it may start a debate, but I have been
successfully convinced that skos:exactMatch and not owl:sameAs is the
appropriate property to use in matching concepts in a SKOS-based thesaurus.
I don't think it matters that much in the long run, but it could have an
effect on semantic reasoning.
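
To make the distinction concrete, the difference is one triple either
way; both URIs below are hypothetical placeholders, not actual ONLD or
VIAF identifiers.

# Illustrative sketch with rdflib; both URIs are hypothetical.
from rdflib import Graph, URIRef
from rdflib.namespace import SKOS

g = Graph()
org = URIRef('http://www.lib.ncsu.edu/ld/onld/00000001')  # hypothetical
viaf = URIRef('http://viaf.org/viaf/123456789')           # hypothetical

# skos:exactMatch says the two concepts can stand in for each other in
# retrieval without asserting full identity, whereas owl:sameAs tells a
# reasoner they are one and the same individual, so every property of
# one is inferred onto the other.
g.add((org, SKOS.exactMatch, viaf))

print(g.serialize(format='turtle'))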

Ethan


On Wed, Jul 9, 2014 at 11:29 AM, Eric Hanson  wrote:

> NCSU Libraries has published its first linked data set, NCSU Organization
> Name Linked Data, at http://www.lib.ncsu.edu/ld/onld/. This data set is
> based on the NCSU Organization Name Authority, a tool maintained by the
> Acquisitions and Discovery department since 2009 to manage the variant
> forms of name for serial and e-resource publishers, providers, and vendors
> in E-Matrix, our locally-developed electronic resource management system.
>
> The names chosen as the authorized form reflect an acquisitions, rather
> than bibliographic, orientation. For example, in the Library of Congress
> Name Authority File, the Institute of Electrical and Electronics Engineers
> is represented by its full name, whereas in the NCSU Organization Name
> Linked Data, it appears as "IEEE, " which is how it is generally known
> among acquisitions staff.  Also, there are many subsidiary units with valid
> headings in the LC Name Authority File but for the purpose of managing
> journals and electronic resources they are simply considered to be variant
> forms of name for the parent organization that manages acquisitions and
> licensing-related functions for the subsidiaries.
>
> The data in the NCSU Organization Name Linked Data is represented as RDF
> triples using properties from the SKOS, RDF Schema, FOAF and OWL
> vocabularies. Where possible, we included links to descriptions of the
> organizations in other linked data sources, including the Virtual
> International Authority File, the Library of Congress Name Authority File,
> Dbpedia, Freebase, and International Standard Name Identifier (ISNI). These
> types of links are encouraged in Tim Berners-Lee's description of 5 Star
> Open Data and will enable users of the data to easily incorporate
> properties from these other linked data sources in future applications.
>
> The data set is made freely available with the Creative Commons CC0
> License and can be downloaded as RDF-XML, N3/Turtle, N-Triples, JSON-LD or
> through RDFa embedded in the HTML page for each organization. We plan on
> periodically updating this data set with new organizations from our
> E-matrix system.
>
>  This data set will also be the seed data for organizations in the Global
> Open Knowledgebase (GOKb) (http://gokb.org/), a freely available data
> repository with key publication information about electronic resources that
> will have its public release in September.  As a part of NCSU’s lead role
> in the GOKb project, we are collaborating with the GOKb developers on
> future linked data initiatives involving title, package and platform data.
>
> For questions or reporting broken or incorrect links, please contact:
>
> Eric Hanson
> emhan...@ncsu.edu
> Electronic Resources Librarian
> Acquisitions & Discovery
> NCSU Libraries
>


[CODE4LIB] Fwd: [LAWDI] ISAW Papers 7 available

2014-07-09 Thread Ethan Gruber
This may interest some people: current state of linked open data within
classics/classical archaeology. These papers are from the NEH-funded Linked
Ancient World Data Institute, held at the Institute for the Study of the
Ancient World at NYU in 2012 and Drew University in 2013.

Ethan

-- Forwarded message --
From: Sebastian Heath 
Date: Tue, Jul 8, 2014 at 6:58 PM
Subject: [LAWDI] ISAW Papers 7 available
To: "la...@googlegroups.com" 


Greetings All,

 ISAW Papers 7 is available at

 http://dlib.nyu.edu/awdl/isaw/isaw-papers/7/ .

 An important note: There is an update pending. The NYU library will
get to that very shortly so please don't worry if the latest edits you
sent me aren't visible at this moment. I think I'm completely caught
up.

 That link had started to circulate - I bear some responsibility for
that - and we've received queries as to the work's status, along with
positive responses. So let's call it "available" and tweet, cite, use,
etc. the content. That seems the LAWDI way.

 Second point: VIAF are IDs are still making their way through "the
system". I'll update as they become live.

 Many thanks to you all, and to repeat, tweeting, citing, linking from
academia.edu or similar are all highly encouraged.

 Best,

 Sebastian.

--
You received this message because you are subscribed to the Google Groups
"LAWDI" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to lawdi+unsubscr...@googlegroups.com.
Visit this group at http://groups.google.com/group/lawdi.
For more options, visit https://groups.google.com/d/optout.


[CODE4LIB] Archival linked open data: a discussion

2014-05-16 Thread Ethan Gruber
I understand that there is undoubtedly some overlap between this list and
LODLAM (Linked Open Data for Libraries, Archives, and Museums), but I
wanted to pass along a link to a discussion I started in the LODLAM list
about the application of RDF and linked data ontologies to archival
materials and authorities.

There are certainly some very knowledgeable LOD people on this list, and
therefore I don't want the discussion on LODLAM to slip through the cracks.
The application of linked data methodologies is tremendously important to
the archival community.

Here's the permalink to the thread in Google Groups:
https://groups.google.com/d/topic/lod-lam/sIrCqZPaZ8c/discussion

Ethan Gruber
American Numismatic Society


Re: [CODE4LIB] outside of libraryland,

2014-03-19 Thread Ethan Gruber
LODLAM, LAWDI (linked ancient world data institute/initiative), CAA
conference (computer applications in archaeology).
 On Mar 19, 2014 8:20 PM, "Coral Sheldon-Hess" 
wrote:

> A co-founded and co-host a learn-to-code workshop "for women and friends,"
> locally. (Men are welcomed as long as they are guests of female-identified
> participants.) Like Girl Develop It, but free--and we avoided the color
> pink.
>
> I'm also nominally on the planning committee for the local hackathon
> (though I mostly just show up at the event itself), and I show up at Code
> for Anchorage (Code for America) meetings at least once a year. :)
>
> I'm not sure if it counts as "belonging," per se, but I'm a lurker on the
> OpenHatch mailing list, and I participate in the Geek Feminism community.
> Until the organizer moved away, I went to local Raspberry Pi hack nights,
> every few weeks.
>
> Anchorage is small (300k people), so there's no Python Users Group or
> RailsBridge or anything like that, here. There's a Drupal Users Group, and
> I'm on their Meetup; we'll see if I ever show up, though. ;) I dropped our
> local Linux Users Group, because they're mostly just a mailing list for
> flamewars, nowadays; I don't even think they have meetings anymore. ...
> Which gets more at "lack of overlap" than "overlap," doesn't it?
>
> --
> Coral Sheldon-Hess
> http://sheldon-hess.org/coral
> @web_kunoichi
>
>
> On Fri, Mar 14, 2014 at 4:35 PM, Nate Hill 
> wrote:
>
> > what coding and technology groups do people on this list belong to and
> find
> > valuable?
> > I'm curious about how code4lib overlaps (or doesn't) with other domains.
> > thanks,
> > Nate
> >
> > --
> > Nate Hill
> > nathanielh...@gmail.com
> > http://4thfloor.chattlibrary.org/
> > http://www.natehill.net
> >
>


Re: [CODE4LIB] ArchivesSpace v1.0.7 Released [linked data]

2014-03-06 Thread Ethan Gruber
I think that RDFa provides the lowest barrier to entry. Using dcterms for
publisher, creator, title, etc. is a good place to start, and if your
collection (archival, library, museum) links to terms defined in LOD
vocabulary systems (LCSH, Getty, LCNAF, whatever), output these URIs in the
HTML interface and tag them in RDFa in such a way that they are
semantically meaningful, e.g., <a href="http://vocab.getty.edu/aat/300028569"
rel="dcterms:format">manuscripts (document genre)</a>

It would be great if content management systems supported RDFa right out of
the box, and perhaps they are all moving in this direction. But you don't
need a content management system to do this. If you generate static HTML
files for your finding aids from EAD files using XSLT, you can tweak your
XSLT output to handle RDFa.

Ethan


On Thu, Mar 6, 2014 at 12:56 PM, Eric Lease Morgan  wrote:

> Let me ask a more direct question. If participating in linked data is a
> “good thing”, then how do you — anybody here — suggest archivists (or
> librarians or museum curators) do that starting today? —Eric Morgan
>


[CODE4LIB] xEAC, EAC-CPF publication framework, beta ready for testing

2014-03-06 Thread Ethan Gruber
xEAC is an open-source XForms-based application for creating and managing
EAC-CPF collections. The XForms backend allows editing of the XML documents
in a web form, and relationships between source and target entities are
maintained automatically. It is available at https://github.com/ewg118/xEAC.

I have finally gotten xEAC to a stage where I feel it is ready for wider
testing (and I have updated the installation documentation). This has been
a few months coming, since I had intended to release the beta shortly after
MARAC in November. The xEAC documentation can be found here:
http://wiki.numismatics.org/xeac:xeac

Features

-Create, edit, publish EAC-CPF documents. Most, but not all, EAC-CPF
elements are supported.
-Public user interface migrated to bootstrap 3 to support mobile devices.
-Maps and timelines for visualization of life events.
-Basic faceted search and Solr-based Atom feed in the UI.
-Export in EAC-CPF, KML, and rudimentary RDF/XML. HTML5+RDFa available in
entity record pages.
-Manage semantic relationships between identities (
http://eaditor.blogspot.com/2013/11/maintaining-relationships-in-eac-cpf.html).
Target records are automatically updated with symmetrical or inverse
relationships, where relevant, and relationships are expressed in the RDF
output. TODO: parse relationship ontologies defined in RDF (e.g.,
http://vocab.org/relationship/.rdf) for use in xEAC.

REST interactions

The XForms engine interacts with the following web services to import name
authorities, biographical, or geographic information:

-VIAF lookup
-DBPedia import
-Geonames for modern places (placeEntry element)
-Pleiades Gazetteer of Ancient Places (placeEntry)
-Getty AAT SPARQL (occupation element) (
http://eaditor.blogspot.com/2014/03/linking-eac-cpf-occupations-to-getty-aat.html
)
-SPARQL query mechanism of nomisma.org in the UI (and extensible,
generalizable lookup widgets)

When the OCLC linked data service supports queries by VIAF URI, I will
create a lookup widget to provide lists of related bibliographic resources.

TODO list

I aim to improve xEAC over the following months and incorporate the
following:

-Finish form: Represent all EAC-CPF elements and attributes
-Test for scalability
-Interface with more APIs in the editing interface
-Improve public interface, especially searching and browsing
-Employ SPARQL endpoint for more sophisticated querying and visualization,
automatically publish to SPARQL on EAC-CPF record save.
-Incorporate social network graph visualization (see SPARQL, above)
-Follow evolving best practices in RDF, support export in TEI for
prosopographies (http://wiki.tei-c.org/index.php/Prosopography) and
CIDOC-CRM.
-Interact with SNAC or international entity databases which evolve from it.

Resources:
Blog: http://eaditor.blogspot.com/
MARAC slideshow:
http://eaditor.blogspot.com/2013/11/marac-fall-2013-presentation.html
Prototype site: http://admin.numismatics.org/xeac/


Re: [CODE4LIB] ArchivesSpace v1.0.7 Released [linked data]

2014-03-06 Thread Ethan Gruber
The issue here that I see is that D2RQ will expose the MySQL database
structure as linked data in some sort of indecipherable ontology and the
end result is probably useless. What Mark alludes to here is that the
developers of ArchivesSpace could write scripts, inherent to the platform,
that could output linked data that conforms to existing or emerging
standards. This is much simpler than introducing D2RQ into the application
layer, and allows for greater control of the export models. As a developer
of different, potentially competing, software applications for EAD and
EAC-CPF publication, who is to say that ArchivesSpace database field names
should be "standards" or "best practices?" These are things that should be
determined by the archival community, not a software application.

CIDOC-CRM is capable of representing the structure and relationships
between components of an archival collection. I'm not a huge advocate of
the CRM because I think it has a tendency to be inordinately complex, but
*it* is a standard. Therefore, if the archival community decided that it
would adopt CRM as its RDF data model standard, ArchivesSpace, ICA-AtoM,
EADitor, and other archival management/description systems could adapt to
the needs of the community and offer content in these models.
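
For what it's worth, the kind of export script I have in mind is not much
code. An untested sketch, where the input field names are placeholders
standing in for whatever the ArchivesSpace API actually returns:

# Untested sketch: map an ArchivesSpace-style JSON record (field names
# hypothetical) onto Dublin Core terms with rdflib.
import json
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import DCTERMS, RDF

record = json.loads("""{
  "uri": "/repositories/2/resources/1",
  "title": "Sample collection records",
  "dates": "1910-1940"
}""")

g = Graph()
subject = URIRef('http://archives.example.org' + record['uri'])
g.add((subject, RDF.type, DCTERMS.BibliographicResource))
g.add((subject, DCTERMS.title, Literal(record['title'])))
g.add((subject, DCTERMS.date, Literal(record['dates'])))
print(g.serialize(format='turtle'))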

Ethan


On Thu, Mar 6, 2014 at 10:41 AM, Eric Lease Morgan  wrote:

> On Mar 6, 2014, at 9:47 AM, Mark A. Matienzo 
> wrote:
>
> > ArchivesSpace has a REST backend API, and requests yield a response in
> > JSON. As one option, I'd investigate to publish linked data as JSON-LD.
> > Some degree of mapping would be necessary, but I imagine it would be
> > significantly easier to that instead of using something like D2RQ.
>
>
> If I understand things correctly, using D2RQ to publish database contents
> as linked data is mostly a systems administration task:
>
>   1. download and install D2RQ
>   2. run D2RQ-specific script to read a (ArchiveSpace) database schema and
> create a configuration file
>   3. run D2RQ with the configuration file
>   4. provide access via standard linked data publishing methods
>   5. done
>
> If the field names in the initial database are meaningful, and if the
> database schema is normalized, then D2RQ ought to work pretty well. If many
> archives use ArchiveSpace, then the field names can become “standard” or at
> least “best practices”, and the resulting RDF will be well linked.
>
> I have downloaded and run ArchiveSpace on my desktop computer. It imported
> some of my EAD files pretty well. It created EAC-CPF files from my names.
> Fun. I didn’t see a way to export things as EAD. The whole interface is
> beautiful and functional. In my copious spare time I will see about
> configuring ArchiveSpace to use a MySQL backend (instead of the embedded
> database), and see about putting D2RQ on top. I think this will be easier
> than learning a new API and building an entire linked data publishing
> system. D2RQ may be an viable option with the understanding that no
> solution is perfect.
>
> —
> Eric Morgan
>


Re: [CODE4LIB] getty thesaurus, linked data, and sparql

2014-02-21 Thread Ethan Gruber
You have to have some idea of what ontologies are being used and how they
are used. The SPARQL endpoint gives you a list of the prefixes to prepend,
but you still have to know what they are. The best way to learn about the
structure of the data is to browse around. There aren't many example
queries, and unfortunately, SPARQL requires knowledge of the underlying RDF
structure in order to query effectively. This thesaurus data isn't terribly
complex, so it's not nearly as complicated as putting together useful
SPARQL queries on CIDOC-CRM in the British Museum endpoint. As a tip, what
I learned from one of the Ontotext developers is that there's a luc:term
predicate you can employ for Lucene-based text searches. i.e.:

PREFIX gvp: <http://vocab.getty.edu/ontology#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX luc: <http://www.ontotext.com/owlim/lucene#>
SELECT ?c ?label WHERE {
?c rdf:type gvp:Concept .
?c skos:prefLabel ?label .
?c skos:inScheme <http://vocab.getty.edu/aat/> .
?c luc:term "skyphos" .
FILTER langMatches(lang(?label), "en") .
}
ORDER BY ASC(?label)
LIMIT 25

Also, you might want to restrict with inScheme. The only terms in the endpoint
right now are from AAT, but you'll need to distinguish once they've loaded TGN, ULAN, etc.

Now that the Getty vocabularies are officially public, I am hoping that
other software developers will create widgets to enhance description of
materials. I've already integrating AAT SPARQL lookups into EADitor (
http://eaditor.blogspot.com/2014/02/integrating-eaditor-with-getty-linked.html),
so hopefully other LAM collection management systems will begin linking
more directly to AAT, which will, in turn, make it easier to aggregate and
organize data (in DPLA, for example). I'm working on linked data projects
in the fields of numismatics and Greek pottery, and we've begun creating
concordances between the Getty, British Museum, and other thesauri.
Hopefully this opens the door to more widespread collaboration in cultural
heritage.
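
As a final note, once a query works in the web form, scripting against
the endpoint is just the standard SPARQL protocol over HTTP; an untested
sketch:

# Untested sketch: run the luc:term query above against the Getty
# endpoint and print matching AAT concepts.
import requests

QUERY = """
PREFIX gvp: <http://vocab.getty.edu/ontology#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX luc: <http://www.ontotext.com/owlim/lucene#>
SELECT ?c ?label WHERE {
  ?c a gvp:Concept ;
     skos:inScheme <http://vocab.getty.edu/aat/> ;
     luc:term "skyphos" ;
     skos:prefLabel ?label .
  FILTER langMatches(lang(?label), "en")
}
LIMIT 25
"""

r = requests.get('http://vocab.getty.edu/sparql',
                 params={'query': QUERY},
                 headers={'Accept': 'application/sparql-results+json'})
for b in r.json()['results']['bindings']:
    print(b['c']['value'], b['label']['value'])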

Ethan


On Fri, Feb 21, 2014 at 11:51 AM, Eric Lease Morgan  wrote:

> > Today the Getty released the Art and Architecture Thesaurus as Linked
> Open Data [1].
>
> Releasing the Getty Thesaurus as linked data is very interesting, and
> after visiting the blog posting I discovered a SPARQL endpoint to the data.
> [2] Yet, I seem to always have problems exploring SPARQL endpoints without
> having an in-depth and thorough knowledge of the underlying ontologies. Is
> this just me, or am I missing something?
>
> For example, without knowing anything, I think I can submit a SPARQL query
> such as the following to just about any SPARQL endpoint to get an overview
> of the triple store’s ontologies:
>
>  SELECT DISTINCT ?class
>  WHERE { ?subject a ?class }
>  ORDER BY ?class
>
> This query uses the SPARQL short-hand notation of “a” to denote the RDF
> predicate equal to rdf:type, which I assume will be in just about every
> triple store. Correct? Applying this query to the Getty SPARQL endpoint
> returns a list of (hopefully) actionable URIs describing all the ontologies
> used in the triple store.
>
> I can submit the following SPARQL query to just about any triple store to
> get a list of all the predicates used in the triple store, but the query
> usually never returns; it probably creates a heck of a lot of work on the
> endpoint’s backend. Each one of these predicates ought to be described in
> greater detail in the actionable URIs from Query #1. Correct?
>
>  SELECT DISTINCT ?property
>  WHERE { ?subject ?property ?object }
>  ORDER BY ?property
>
> Given these ontologies (classes) and properties (relationships), I ought
> to be able to navigate around the triple store discovering cool
> information, but I find the process to be very difficult. Here are a few
> queries:
>
>  # list of concepts
>  SELECT *
>  WHERE { ?s a  }
>
>  # all about the English phrase founding tools
>  SELECT *
>  WHERE { ?s ?p "founding tools"@en }
>
>  # uri for founding tools
>  SELECT ?uri
>  WHERE { ?uri rdfs:label "founding tools"@en }
>
> I find this process to be painful. To what degree am I still to much a
> novice at SPARQL, and to what degree do I need to have an intimate
> knowledge of the ontologies before I can create meaningful queries? To what
> degree do more user-friendly front-ends need to be created? In order for
> URIs to replace literals in RDF, there will need to be much easier to use
> interfaces to triple stores. Correct? Like the need for a data dictionary
> and entity-relationship diagram in searching of relational databases vis
> SQL, to what degree do I really need to know and understand the supporting
> ontologies before I can make meaningful sense of a triple store?
>
> Put another way, is there some set of basic/rudimentary queries I can send
> to SPARQL endpoints, get results, and begin to drill down without really
> knowing the ontologies? I’m stymied in this regard.
>
>
> [1] announcement -
>

Re: [CODE4LIB] links from finding aid to digital object

2014-01-15 Thread Ethan Gruber
You could also try the EAD list if you need more examples.
On Jan 15, 2014 8:45 AM, "Edward Summers"  wrote:

> Thanks for all the responses about linking finding aids to digital objects
> yesterday — it was very helpful! I haven’t done much work (yet) looking to
> see what the patterns are. But a few people contacted me asking me to
> provide the results. so I have pulled out the examples into a document
> that’s up on Github:
>
> https://github.com/edsu/eadlinks
>
> If you don’t want your name/email listed let me know. I thought it might
> be helpful for anyone that wanted to follow up.
>
> //Ed
>


Re: [CODE4LIB] linked data recipe

2013-11-19 Thread Ethan Gruber
Hasn't the pendulum swung back toward RDFa Lite (
http://www.w3.org/TR/rdfa-lite/) recently?  They are fairly equivalent, but
I'm not sure about all the politics involved.


On Tue, Nov 19, 2013 at 11:09 AM, Karen Coyle  wrote:

> Eric, if you want to leap into the linked data world in the fastest,
> easiest way possible, then I suggest looking at microdata markup, e.g.
> schema.org.[1] Schema.org does not require you to transform your data at
> all: it only requires mark-up of your online displays. This makes sense
> because as long as your data is in local databases, it's not visible to the
> linked data universe anyway; so why not take the easy way out and just add
> linked data to your public online displays? This doesn't require a
> transformation of your entire record (some of which may not be suitable as
> linked data in any case), only those "things" that are likely to link
> usefully. This latter generally means "things for which you have an
> identifier." And you make no changes to your database, only to display.
>
> OCLC is already producing this markup in WorldCat records [2]-- not
> perfectly, of course, lots of warts, but it is a first step. However, it is
> a first step that makes more sense to me than *transforming* or
> *cross-walking* current metadata. It also, I believe, will help us
> understand what bits of our current metadata will make the transition to
> linked data, and what bits should remain as accessible documents that users
> can reach through linked data.
>
> kc
> [1] http://schema.org, and look at the work going on to add bibliographic
> properties at http://www.w3.org/community/schemabibex/wiki/Main_Page
> [2] look at the "linked data" section of any WorldCat page for a single
> item, such as http://www.worldcat.org/title/selection-of-early-
> statistical-papers-of-j-neyman/oclc/527725&referer=brief_results
>
>
>
>
> On 11/19/13 7:54 AM, Eric Lease Morgan wrote:
>
>> On Nov 19, 2013, at 9:41 AM, Karen Coyle  wrote:
>>
>>  Eric, I think this skips a step - which is the design step in which you
>>> create a domain model that uses linked data as its basis. RDF is not a
>>> serialization; it actually may require you to re-think the basic
>>> structure of your metadata. The reason for that is that it provides
>>> capabilities that record-based data models do not. Rather than starting
>>> with current metadata, you need to take a step back and ask: what does
>>> my information world look like as linked data?
>>>
>>
>> I respectfully disagree. I do not think it necessary to create a domain
>> model ahead of time; I do not think it is necessary for us to re-think our
>> metadata structures. There already exists tools enabling us — cultural
>> heritage institutions — to manifest our metadata as RDF. The manifestations
>> may not be perfect, but “we need to learn to walk before we run” and the
>> metadata structures we have right now will work for right now. As we mature
>> we can refine our processes. I do not advocate “stepping back and asking”.
>> I advocate looking forward and doing. —Eric Morgan
>>
>
> --
> Karen Coyle
> kco...@kcoyle.net http://kcoyle.net
> m: 1-510-435-8234
> skype: kcoylenet
>


Re: [CODE4LIB] linked data recipe

2013-11-19 Thread Ethan Gruber
yo, i get it


On Tue, Nov 19, 2013 at 10:54 AM, Ross Singer  wrote:

> I don't know what your definition of "serialization" is, but I don't know
> of any where "data model" and "formatted output of a data model" are
> synonymous.
>
> RDF is a data model *not* a serialization.
>
> -Ross.
>
>
> On Tue, Nov 19, 2013 at 10:45 AM, Ethan Gruber  wrote:
>
> > I see that serialization has a different definition in computer science
> > than I thought it did.
> >
> >
> > On Tue, Nov 19, 2013 at 10:36 AM, Ross Singer 
> > wrote:
> >
> > > That's still not a "serialization".  It's just a similar data model.
> > >  Pretty huge difference.
> > >
> > > -Ross.
> > >
> > >
> > > On Tue, Nov 19, 2013 at 10:31 AM, Ethan Gruber 
> > wrote:
> > >
> > > > I'm not sure that I agree that RDF is not a serialization.  It really
> > > > depends on the context of the system and intended use of the linked
> > data.
> > > > For example, TEI is designed with a specific purpose which cannot be
> > > > replicated in RDF (at least, not very easily at all), but deriving
> RDF
> > > from
> > > > highly-linked TEI to put into an endpoint can open doors to queries
> > which
> > > > are otherwise impossible to make on the data.  This certainly
> requires
> > > some
> > > > rethinking of the way texts interact.  But perhaps it may be best to
> > say
> > > > that RDF *can* (but not necessarily) be a derivation, rather than
> > > > serialization, of some larger, more complex canonical data model.
> > > >
> > > > Ethan
> > > >
> > > >
> > > > On Tue, Nov 19, 2013 at 9:54 AM, Aaron Rubinstein <
> > > > arubi...@library.umass.edu> wrote:
> > > >
> > > > > I think you’ve hit the nail on the head here, Karen. I would just
> > add,
> > > or
> > > > > maybe reassure, that this does not necessarily require rethinking
> > your
> > > > > existing metadata but how to translate that existing metadata into
> a
> > > > linked
> > > > > data environment. Though this might seem like a pain, in many cases
> > it
> > > > will
> > > > > actually inspire you to go back and improve/increase the value of
> > that
> > > > > existing metadata.
> > > > >
> > > > > This is definitely looking awesome, Eric!
> > > > >
> > > > > Aaron
> > > > >
> > > > > On Nov 19, 2013, at 9:41 AM, Karen Coyle  wrote:
> > > > >
> > > > > > Eric, I think this skips a step - which is the design step in
> which
> > > you
> > > > > create a domain model that uses linked data as its basis. RDF is
> not
> > a
> > > > > serialization; it actually may require you to re-think the basic
> > > > structure
> > > > > of your metadata. The reason for that is that it provides
> > capabilities
> > > > that
> > > > > record-based data models do not. Rather than starting with current
> > > > > metadata, you need to take a step back and ask: what does my
> > > information
> > > > > world look like as linked data?
> > > > > >
> > > > > > I repeat: RDF is NOT A SERIALIZATION.
> > > > > >
> > > > > > kc
> > > > > >
> > > > > > On 11/19/13 5:04 AM, Eric Lease Morgan wrote:
> > > > > >> I believe participating in the Semantic Web and providing
> content
> > > via
> > > > > the principles of linked data is not "rocket surgery", especially
> for
> > > > > cultural heritage institutions -- libraries, archives, and museums.
> > > Here
> > > > is
> > > > > a simple recipe for their participation:
> > > > > >>
> > > > > >>   1. use existing metadata standards (MARC, EAD, etc.) to
> describe
> > > > > >>  collections
> > > > > >>
> > > > > >>   2. use any number of existing tools to convert the metadata to
> > > > > >>  HTML, and save the HTML on a Web server
> > > > > >>
> > > > > >>   3. use any number of existing tools to convert the meta

Re: [CODE4LIB] linked data recipe

2013-11-19 Thread Ethan Gruber
I see that serialization has a different definition in computer science
than I thought it did.


On Tue, Nov 19, 2013 at 10:36 AM, Ross Singer  wrote:

> That's still not a "serialization".  It's just a similar data model.
>  Pretty huge difference.
>
> -Ross.
>
>
> On Tue, Nov 19, 2013 at 10:31 AM, Ethan Gruber  wrote:
>
> > I'm not sure that I agree that RDF is not a serialization.  It really
> > depends on the context of the system and intended use of the linked data.
> > For example, TEI is designed with a specific purpose which cannot be
> > replicated in RDF (at least, not very easily at all), but deriving RDF
> from
> > highly-linked TEI to put into an endpoint can open doors to queries which
> > are otherwise impossible to make on the data.  This certainly requires
> some
> > rethinking of the way texts interact.  But perhaps it may be best to say
> > that RDF *can* (but not necessarily) be a derivation, rather than
> > serialization, of some larger, more complex canonical data model.
> >
> > Ethan
> >
> >
> > On Tue, Nov 19, 2013 at 9:54 AM, Aaron Rubinstein <
> > arubi...@library.umass.edu> wrote:
> >
> > > I think you’ve hit the nail on the head here, Karen. I would just add,
> or
> > > maybe reassure, that this does not necessarily require rethinking your
> > > existing metadata but how to translate that existing metadata into a
> > linked
> > > data environment. Though this might seem like a pain, in many cases it
> > will
> > > actually inspire you to go back and improve/increase the value of that
> > > existing metadata.
> > >
> > > This is definitely looking awesome, Eric!
> > >
> > > Aaron
> > >
> > > On Nov 19, 2013, at 9:41 AM, Karen Coyle  wrote:
> > >
> > > > Eric, I think this skips a step - which is the design step in which
> you
> > > create a domain model that uses linked data as its basis. RDF is not a
> > > serialization; it actually may require you to re-think the basic
> > structure
> > > of your metadata. The reason for that is that it provides capabilities
> > that
> > > record-based data models do not. Rather than starting with current
> > > metadata, you need to take a step back and ask: what does my
> information
> > > world look like as linked data?
> > > >
> > > > I repeat: RDF is NOT A SERIALIZATION.
> > > >
> > > > kc
> > > >
> > > > On 11/19/13 5:04 AM, Eric Lease Morgan wrote:
> > > >> I believe participating in the Semantic Web and providing content
> via
> > > the principles of linked data is not "rocket surgery", especially for
> > > cultural heritage institutions -- libraries, archives, and museums.
> Here
> > is
> > > a simple recipe for their participation:
> > > >>
> > > >>   1. use existing metadata standards (MARC, EAD, etc.) to describe
> > > >>  collections
> > > >>
> > > >>   2. use any number of existing tools to convert the metadata to
> > > >>  HTML, and save the HTML on a Web server
> > > >>
> > > >>   3. use any number of existing tools to convert the metadata to
> > > >>  RDF/XML (or some other "serialization" of RDF), and save the
> > > >>  RDF/XML on a Web server
> > > >>
> > > >>   4. rest, congratulate yourself, and share your experience with
> > > >>  others in your domain
> > > >>
> > > >>   5. after the first time though, go back to Step #1, but this time
> > > >>  work with other people inside your domain making sure you use
> as
> > > >>  many of the same URIs as possible
> > > >>
> > > >>   6. after the second time through, go back to Step #1, but this
> > > >>  time supplement access to your linked data with a triple store,
> > > >>  thus supporting search
> > > >>
> > > >>   7. after the third time through, go back to Step #1, but this
> > > >>  time use any number of existing tools to expose the content in
> > > >>  your other information systems (relational databases, OAI-PMH
> > > >>  data repositories, etc.)
> > > >>
> > > >>   8. for dessert, cogitate ways to exploit the linked data in your
> > > >>  domain to discover new and additional relationships between
> URIs,
> > > >>  and thus make the Semantic Web more of a reality
> > > >>
> > > >> What do you think?
> > > >>
> > > >> I am in the process of writing a guidebook on the topic of linked
> data
> > > and archives. In the guidebook I will elaborate on this recipe and
> > provide
> > > instructions for its implementation. [1]
> > > >>
> > > >> [1] guidebook - http://sites.tufts.edu/liam/
> > > >>
> > > >> --
> > > >> Eric Lease Morgan
> > > >> University of Notre Dame
> > > >
> > > > --
> > > > Karen Coyle
> > > > kco...@kcoyle.net http://kcoyle.net
> > > > m: 1-510-435-8234
> > > > skype: kcoylenet
> > >
> >
>


Re: [CODE4LIB] linked data recipe

2013-11-19 Thread Ethan Gruber
I'm not sure that I agree that RDF is not a serialization.  It really
depends on the context of the system and intended use of the linked data.
For example, TEI is designed with a specific purpose which cannot be
replicated in RDF (at least, not very easily at all), but deriving RDF from
highly-linked TEI to put into an endpoint can open doors to queries which
are otherwise impossible to make on the data.  This certainly requires some
rethinking of the way texts interact.  But perhaps it may be best to say
that RDF *can* (but not necessarily) be a derivation, rather than
serialization, of some larger, more complex canonical data model.

Ethan


On Tue, Nov 19, 2013 at 9:54 AM, Aaron Rubinstein <
arubi...@library.umass.edu> wrote:

> I think you’ve hit the nail on the head here, Karen. I would just add, or
> maybe reassure, that this does not necessarily require rethinking your
> existing metadata but how to translate that existing metadata into a linked
> data environment. Though this might seem like a pain, in many cases it will
> actually inspire you to go back and improve/increase the value of that
> existing metadata.
>
> This is definitely looking awesome, Eric!
>
> Aaron
>
> On Nov 19, 2013, at 9:41 AM, Karen Coyle  wrote:
>
> > Eric, I think this skips a step - which is the design step in which you
> create a domain model that uses linked data as its basis. RDF is not a
> serialization; it actually may require you to re-think the basic structure
> of your metadata. The reason for that is that it provides capabilities that
> record-based data models do not. Rather than starting with current
> metadata, you need to take a step back and ask: what does my information
> world look like as linked data?
> >
> > I repeat: RDF is NOT A SERIALIZATION.
> >
> > kc
> >
> > On 11/19/13 5:04 AM, Eric Lease Morgan wrote:
> >> I believe participating in the Semantic Web and providing content via
> the principles of linked data is not "rocket surgery", especially for
> cultural heritage institutions -- libraries, archives, and museums. Here is
> a simple recipe for their participation:
> >>
> >>   1. use existing metadata standards (MARC, EAD, etc.) to describe
> >>  collections
> >>
> >>   2. use any number of existing tools to convert the metadata to
> >>  HTML, and save the HTML on a Web server
> >>
> >>   3. use any number of existing tools to convert the metadata to
> >>  RDF/XML (or some other "serialization" of RDF), and save the
> >>  RDF/XML on a Web server
> >>
> >>   4. rest, congratulate yourself, and share your experience with
> >>  others in your domain
> >>
> >>   5. after the first time though, go back to Step #1, but this time
> >>  work with other people inside your domain making sure you use as
> >>  many of the same URIs as possible
> >>
> >>   6. after the second time through, go back to Step #1, but this
> >>  time supplement access to your linked data with a triple store,
> >>  thus supporting search
> >>
> >>   7. after the third time through, go back to Step #1, but this
> >>  time use any number of existing tools to expose the content in
> >>  your other information systems (relational databases, OAI-PMH
> >>  data repositories, etc.)
> >>
> >>   8. for dessert, cogitate ways to exploit the linked data in your
> >>  domain to discover new and additional relationships between URIs,
> >>  and thus make the Semantic Web more of a reality
> >>
> >> What do you think?
> >>
> >> I am in the process of writing a guidebook on the topic of linked data
> and archives. In the guidebook I will elaborate on this recipe and provide
> instructions for its implementation. [1]
> >>
> >> [1] guidebook - http://sites.tufts.edu/liam/
> >>
> >> --
> >> Eric Lease Morgan
> >> University of Notre Dame
> >
> > --
> > Karen Coyle
> > kco...@kcoyle.net http://kcoyle.net
> > m: 1-510-435-8234
> > skype: kcoylenet
>


Re: [CODE4LIB] Charlotte, NC Code4Lib Meeting

2013-11-14 Thread Ethan Gruber
Asheville +1


On Thu, Nov 14, 2013 at 4:20 PM, Simon Spero  wrote:

> Anyone thought about doing a code4lib in Asheville?
> What about Raleigh?
> :-P
> On Nov 12, 2013 8:42 PM, "Kevin S. Clarke"  wrote:
>
> > I'd be interested. I'm in Boone... not too far a drive. :)
> >
> > Kevin
> > On Nov 12, 2013 6:35 PM, "Riley Childs"  wrote:
> >
> > > Is anyone in Charlotte, NC (and surrounding areas) interested in
> > starting a
> > > Code4Lib meeting?
> > > Just kind of asking :{D!
> > > *Riley Childs*
> > > *Library Technology Manager at Charlotte United Christian Academy
> > > *
> > > *Head Programmer/Manager at Open Library Management Projec
> > > t *
> > > *Cisco Certified Entry Network Technician *
> > > _
> > >
> > > *Phone: +1 (704) 497-2086*
> > > *email: ri...@tfsgeo.com *
> > > *Twitter: @RowdyChildren *
> > >
> >
>


Re: [CODE4LIB] Charlotte, NC Code4Lib Meeting

2013-11-12 Thread Ethan Gruber
I'm in Virginia and might attend said meeting, even if I can't help
organize.
On Nov 12, 2013 6:35 PM, "Riley Childs"  wrote:

> Is anyone in Charlotte, NC (and surrounding areas) interested in starting a
> Code4Lib meeting?
> Just kind of asking :{D!
> *Riley Childs*
> *Library Technology Manager at Charlotte United Christian Academy
> *
> *Head Programmer/Manager at Open Library Management Projec
> t *
> *Cisco Certified Entry Network Technician *
> _
>
> *Phone: +1 (704) 497-2086*
> *email: ri...@tfsgeo.com *
> *Twitter: @RowdyChildren *
>


Re: [CODE4LIB] rdf triplestores

2013-11-11 Thread Ethan Gruber
I've been using Apache Fuseki (
http://jena.apache.org/documentation/serving_data/) for almost a year, in
production since the spring.  It's a SPARQL server with a built-in TDB store.
It's easy to use, and takes about 5 minutes to get working on your desktop
or server.
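
Loading RDF/XML into it is a single HTTP request against the dataset's
data endpoint. Untested sketch, assuming a stock localhost install with a
dataset named /ds:

# Untested sketch: push an RDF/XML file into Fuseki's default graph via
# the SPARQL Graph Store endpoint (default port 3030, dataset 'ds').
import requests

with open('records.rdf', 'rb') as f:
    r = requests.post('http://localhost:3030/ds/data?default',
                      data=f,
                      headers={'Content-Type': 'application/rdf+xml'})
print(r.status_code, r.text)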

Ethan


On Mon, Nov 11, 2013 at 1:17 AM, Richard Wallis <
richard.wal...@dataliberate.com> wrote:

> I've had some success with 4Store: http://4store.org
>
> Used it on mac laptop to load the WorldCat most highly held resources:
> http://dataliberate.com/2012/08/putting-worldcat-data-into-a-triple-store/
>
> As to the point about loading RDF/XML, especially if you have a large
> amount of data.
>
>- Triplestores much prefer raw triples for large amounts of data
>- Chopping up files of triples into smaller chunks is also often
>beneficial as it reduces memory footprints and can take advantage of
>multithreading.  It is also far easier to recover from errors such as
> bad
>data etc.
>- A bit of unix command line wizardry (split followed by a simple for-loop)
>is fairly standard practice
>
> Also raw triples are often easier to produce - none of that mucking about
> producing correctly formatted XML - and you can chop, sort, and play about
> with them using powerful unix command line tools.
>
> ~Richard.
>
>
> On 11 November 2013 18:19, Scott Turnbull  >wrote:
>
> > I've primarily used Sesame myself.  The http based queries made it pretty
> > easy to script against.
> >
> > http://www.openrdf.org/
> >
> >
> > On Mon, Nov 11, 2013 at 12:12 AM, Eric Lease Morgan 
> > wrote:
> >
> > > What is your favorite RDF triplestore?
> > >
> > > I am able to convert numerous library-related metadata formats into
> > > RDF/XML. In a minimal way, I can then contribute to the Semantic Web by
> > > simply putting the resulting files on an HTTP file system. But if I
> were
> > to
> > > import my RDF/XML into a triplestore, then I could do a lot more. Jena
> > > seems like a good option. So does Openlink Virtuoso.
> > >
> > > What experience do y'all have with these tools, and do you know how to
> > > import RDF/XML into them?
> > >
> > > --
> > > Eric Lease Morgan
> > >
> >
> >
> >
> > --
> > *Scott Turnbull*
> > APTrust Technical Lead
> > scott.turnb...@aptrust.org
> > www.aptrust.org
> > 678-379-9488
> >
>
>
>
> --
> Richard Wallis
> Founder, Data Liberate
> http://dataliberate.com
> Tel: +44 (0)7767 886 005
>
> Linkedin: http://www.linkedin.com/in/richardwallis
> Skype: richard.wallis1
> Twitter: @rjw
>


Re: [CODE4LIB] mass convert jpeg to pdf

2013-11-10 Thread Ethan Gruber
Does anyone have experience with an image zooming engine in conjunction
with image annotation? I don't want end users to annotate things
themselves, but allow them to click on annotations added by an archivist.

Thanks,
Ethan
On Nov 8, 2013 4:39 PM, "Edward Summers"  wrote:

> I’m having trouble understanding who the user of this content you are
> putting into Omeka is, and what you are expecting them to do with it. But,
> ok …
>
> //Ed
>
> On Nov 8, 2013, at 4:22 PM, Kyle Banerjee  wrote:
>
> >> It is sad to me that converting to PDF for viewing off the Web seems
> like
> >> the answer. Isn’t there a tiling viewer (like Leaflet) that could be
> used
> >> to render jpeg derivatives of the original tif files in Omeka?
> >>
> >>
> > This should be pretty easy. But the issue with tiling is that the nav
> > process is miserable for all but the shortest books. Most of the people
> who
> > want to download want are looking for jpegs rather than source tiffs and
> > one pdf instead of a bunch of tiffs (which is good since each one is
> > typically over 100MB). Of course there are people who want the real deal,
> > but that's actually a much less common use case.
> >
> > As Karen observes, downloading and viewing serve different use cases so
> of
> > course we will provide both. IIP Image Server looks intriguing. But most
> of
> > our users who want the full res stuff really just want to download the
> > source tiffs which will be made available.
> >
> > kyle
>


Re: [CODE4LIB] mass convert jpeg to pdf

2013-11-08 Thread Ethan Gruber
On the same note, I've had good experiences using aDORe djatoka to
render JPEG2000 files. Maybe something better has since come along. I'm out
of touch with this type of technology.
On Nov 8, 2013 2:10 PM, "Edward Summers"  wrote:

> It is sad to me that converting to PDF for viewing off the Web seems like
> the answer. Isn’t there a tiling viewer (like Leaflet) that could be used
> to render jpeg derivatives of the original tif files in Omeka?
>
> For an example of using Leaflet (usually used for working with maps) in
> this way checkout NYTimes Machine Beta:
>
> http://apps.beta620.nytimes.com/timesmachine/1969/07/20/issue.html
>
> //Ed
>
> On Nov 8, 2013, at 2:00 PM, Kyle Banerjee  wrote:
>
> > We are in the process of migrating our digital collections from CONTENTdm
> > to Omeka and are trying to figure out what to do about the compound
> objects
> > -- the vast majority of which are digitized books.
> >
> > The source files are actually hi res tiffs but since ginormous objects
> > broken into hundreds of pieces (each of which can be well over 100MB in
> > size) aren't exactly friendly to use, we'd like to stitch them into
> > individual pdf's that can be viewed more conveniently
> >
> > My game plan is to simply have a script pull the files down as jpegs
> which
> > can be fed to imagemagick which can theoretically do everything I need.
> > However, I've never actually done anything like this before, so I wanted
> to
> > see if there's a method that people have used for combining lots of
> images
> > into pdfs that works particularly well. Thanks,
> >
> > kyle
>


Re: [CODE4LIB] mass convert jpeg to pdf

2013-11-08 Thread Ethan Gruber
I've done something like this in imagemagick, and it worked quite well, so
I can vouch for this workflow.  But just to clarify, I presume you will be
creating static PDF files to place in the filesystem--not generate a PDF
dynamically through Omeka when a user clicks to download a PDF (as in,
Omeka files off an imagemagick process).
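
In case it's useful, the core of what I did was little more than shelling
out to convert; an untested sketch with made-up paths:

# Untested sketch: stitch one object's page JPEGs into a single PDF with
# ImageMagick's convert.
import glob
import subprocess

pages = sorted(glob.glob('object123/page-*.jpg'))
subprocess.run(['convert', *pages, 'object123.pdf'], check=True)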

Ethan
On Nov 8, 2013 2:00 PM, "Kyle Banerjee"  wrote:

> We are in the process of migrating our digital collections from CONTENTdm
> to Omeka and are trying to figure out what to do about the compound objects
> -- the vast majority of which are digitized books.
>
> The source files are actually hi res tiffs but since ginormous objects
> broken into hundreds of pieces (each of which can be well over 100MB in
> size) aren't exactly friendly to use, we'd like to stitch them into
> individual pdf's that can be viewed more conveniently
>
> My game plan is to simply have a script pull the files down as jpegs which
> can be fed to imagemagick which can theoretically do everything I need.
> However, I've never actually done anything like this before, so I wanted to
> see if there's a method that people have used for combining lots of images
> into pdfs that works particularly well. Thanks,
>
> kyle
>


Re: [CODE4LIB] rdf serialization

2013-11-06 Thread Ethan Gruber
I think that the answer to #1 is that if you want or expect people to use
your endpoint that you should document how it works: the ontologies, the
models, and a variety of example SPARQL queries, ranging from simple to
complex.  The British Museum's SPARQL endpoint (
http://collection.britishmuseum.org/sparql) is highly touted, but how many
people actually use it?  I understand your point about SPARQL being too
complicated for an API interface, but the best examples of services built
on SPARQL are probably the ones you don't even realize are built on SPARQL
(e.g., http://numismatics.org/ocre/id/ric.1%282%29.aug.4A#mapTab).  So on
one hand, perhaps only the most dedicated and hardcore researchers will
venture to construct SPARQL queries for your endpoint, but on the other,
you can build some pretty visualizations based on SPARQL queries conducted
in the background from the user's interaction with a simple html/javascript
based interface.

Ethan


On Wed, Nov 6, 2013 at 11:54 AM, Ross Singer  wrote:

> Hey Karen,
>
> It's purely anecdotal (albeit anecdotes borne from working at a company
> that offered, and has since abandoned, a sparql-based triple store
> service), but I just don't see the interest in arbitrary SPARQL queries
> against remote datasets that I do against linking to (and grabbing) known
> items.  I think there are multiple reasons for this:
>
> 1) Unless you're already familiar with the dataset behind the SPARQL
> endpoint, where do you even start with constructing useful queries?
> 2) SPARQL as a query language is a combination of being too powerful and
> completely useless in practice: query timeouts are commonplace, endpoints
> don't support all of 1.1, etc.  And, going back to point #1, it's hard to
> know how to optimize your queries unless you are already pretty familiar
> with the data
> 3) SPARQL is a flawed "API interface" from the get-go (IMHO) for the same
> reason we don't offer a public SQL interface to our RDBMSes
>
> Which isn't to say it doesn't have its uses or applications.
>
> I just think that in most cases domain/service-specific APIs (be they
> RESTful, based on the Linked Data API [0], whatever) will likely be favored
> over generic SPARQL endpoints.  Are n+1 different APIs ideal?  I am pretty
> sure the answer is "no", but that's the future I foresee, personally.
>
> -Ross.
> 0. https://code.google.com/p/linked-data-api/wiki/Specification
>
>
> On Wed, Nov 6, 2013 at 11:28 AM, Karen Coyle  wrote:
>
> > Ross, I agree with your statement that data doesn't have to be "RDF all
> > the way down", etc. But I'd like to hear more about why you think SPARQL
> > availability has less value, and if you see an alternative to SPARQL for
> > querying.
> >
> > kc
> >
> >
> >
> > On 11/6/13 8:11 AM, Ross Singer wrote:
> >
> >> Hugh, I don't think you're in the weeds with your question (and, while I
> >> think that named graphs can provide a solution to your particular
> problem,
> >> that doesn't necessarily mean that it doesn't raise more questions or
> >> potentially more frustrations down the line - like any new power, it can
> >> be
> >> used for good or evil and the difference might not be obvious at first).
> >>
> >> My question for you, however, is why are you using a triple store for
> >> this?
> >>   That is, why bother with the broad and general model in what I assume
> >> is a
> >> closed world assumption in your application?
> >>
> >> We don't generally use XML databases (Marklogic being a notable
> >> exception),
> >> or MARC databases, or  choice>-specific
> >> databases because usually transmission formats are designed to account
> for
> >> lots and lots of variations and maximum flexibility, which generally is
> >> the
> >> opposite of the modeling that goes into a specific app.
> >>
> >> I think there's a world of difference between modeling your data so it
> can
> >> be represented in RDF (and, possibly, available via SPARQL, but I think
> >> there is *far* less value there) and committing to RDF all the way down.
> >>   RDF is a generalization so multiple parties can agree on what data
> >> means,
> >> but I would have a hard time swallowing the argument that
> domain-specific
> >> data must be RDF-native.
> >>
> >> -Ross.
> >>
> >>
> >> On Wed, Nov 6, 2013 at 10:52 AM, Hugh Cayless 
> >> wrote:
> >>
> >>  Does that work right down to the level of the individual triple though?
> >>> If
> >>> a large percentage of my triples are each in their own individual
> graphs,
> >>> won't that be chaos? I really don't know the answer, it's not a
> >>> rhetorical
> >>> question!
> >>>
> >>> Hugh
> >>>
> >>> On Nov 6, 2013, at 10:40 , Robert Sanderson 
> wrote:
> >>>
> >>>  Named Graphs are the way to solve the issue you bring up in that post,
>  in
>  my opinion.  You mint an identifier for the graph, and associate the
>  provenance and other information with that.  This then gets ingested
> as
> 
> >>> the
> >>>
>  4th URI into a quad store, so you don't lo

Re: [CODE4LIB] We should use HTTPS on code4lib.org

2013-11-04 Thread Ethan Gruber
NSA broke it already


On Mon, Nov 4, 2013 at 1:42 PM, William Denton  wrote:

> I think it's time we made everything on code4lib.org use HTTPS by default
> and redirect people to HTTPS from HTTP when needed.  (Right now there's an
> outdated self-signed SSL certificate on the site, so someone took a stab at
> this earlier, but it's time to do it right.)
>
> StartCom gives free SSL certs [0], and there are lots of places that sell
> them for prices that seem to run over $100 per year (which seems ridiculous
> to me, but maybe there's a good reason).
>
> I don't know which is the best way to get a cert for a site like this, but
> if people agree this is the right thing to do, perhaps someone with some
> expertise could work with the Oregon State hosts?
>
> More broadly, I think everyone should be using HTTPS everywhere (and HTTPS
> Everywhere, the browser extension).  Are any of you implementing HTTPS on
> your institution's sites, and moving to it as default?  It's one of those
> slightly finicky things that on the surface isn't necessary (why bother
> with a library's opening hours or address?) but deeper down is, because
> everyone should be able to browse the web without being monitored.
>
> Bill
>
> [0] https://cert.startcom.org/
>
> --
> William Denton
> Toronto, Canada
> http://www.miskatonic.org/
>


[CODE4LIB] Numismatic Data Standards and Ontologies Roundtable at CAA 2014

2013-10-22 Thread Ethan Gruber
Andrew Meadows, Karsten Tolle, and David Wigg-Wolf invite participants for
a roundtable on numismatic data standards and exchange, to be held at the
Computer Applications and Quantitative Methods in Archaeology (CAA)
conference (http://caa2014.sciencesconf.org/), Paris, 22-25 April 2014.

Coins survive in vast numbers from many historical periods and cultures,
providing important evidence for a wide variety of social, political and
economic aspects of those cultures. But currently these data are only
potentially available, as differing national traditions have yet to
integrate their substantial datasets on the basis of shared vocabularies,
syntax and structure.

Building on the experience with Linked Data of projects such as nomisma.org,
the European Coin Find Network (ECFN:
http://www.ecfn.fundmuenzen.eu/Home.html) and Online Coins of the Roman
Empire (OCRE: http://numismatics.org/ocre/), the roundtable will provide a
forum for the presentation and discussion of (meta)data standards and
ontologies for data repositories containing information on coins, with a
view to advancing the possibilities of data exchange and facilitating
access to data across a range of repositories. The round table follows on
from the two joint meetings of nomisma.org and ECFN, which concentrated on
ancient, primarily Roman coins, held in Frankfurt, Germany in May 2012; and
Carnuntum, Austria in April 2013, which was attended by 25 participants
from 10 European countries and the USA. The round table is intended to
encourage discussion among a wider community, beyond that of ancient
numismatics, drawing together lessons from a broader range of projects, and
embedding the results in the more general landscape of cultural heritage
data management. Too often in the past numismatists have allowed themselves
to operate in isolation from other related disciplines, including
archaeology, a deficit that this session also aims to address.

Although the core data required to identify and describe coins of almost
all periods are relatively simple (e.g. issuer, mint, date, denomination,
material, weight, size, description of obverse and reverse, etc.), and this
can result in a significant degree of correlation between the structure of
different repositories, linking disparate numismatics repositories presents
a number of problems. Nevertheless, coins provide an ideal test bed for the
implementation of concepts such as Linked Data and the creation of
standardised thesauri, the lessons of which can be profitably applied to
other, more complex fields.

Organizers:

Dr Andrew Meadows
Deputy Director
American Numismatic Society

Dr Karsten Tolle
DBIS
Goethe University

Dr David Wigg-Wolf
Römisch-Germanische Kommission des Deutschen Archäologischen Instituts


Re: [CODE4LIB] CODE4LIB Digest - 12 Sep 2013 to 13 Sep 2013 (#2013-237)

2013-09-16 Thread Ethan Gruber
Using SPARQL to validate seems like tremendous overhead.  From the Gerber
abstract: "A total of 55 rules have been defined representing the
constraints and requirements of the OA Specification and Ontology. For each
rule we have defined a SPARQL query to check compliance." I hope this isn't
55 SPARQL queries per RDF resource.
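
For a sense of what a single rule looks like, here is a hypothetical
compliance check (illustrative only, not one of the 55 rules from the paper):
a SPARQL query, run from Python against a placeholder endpoint, that flags
Open Annotation resources missing a required property.

import requests

# Hypothetical rule: find annotations that lack oa:hasTarget.
RULE = """
PREFIX oa: <http://www.w3.org/ns/oa#>
SELECT ?annotation
WHERE {
  ?annotation a oa:Annotation .
  FILTER NOT EXISTS { ?annotation oa:hasTarget ?target }
}
"""

def violations(endpoint="http://localhost:3030/ds/query"):  # placeholder endpoint
    resp = requests.get(endpoint, params={"query": RULE},
                        headers={"Accept": "application/sparql-results+json"})
    resp.raise_for_status()
    return [b["annotation"]["value"] for b in resp.json()["results"]["bindings"]]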

Europeana's review of schematron indicated what I pointed out earlier, that
it confines one to using RDF/XML, which is "sub-optimal" in their own
words.  One could accept RDF in any serialization and then run it through
an RDF processor, like rapper (http://librdf.org/raptor/rapper.html), into
XML and then validate.  Eventually, when XPath/XSLT 3 supports JSON and
other non-XML data models, theoretically, schematron might then be able to
validate other serializations of RDF.  Ditto for XForms, which we are using
to validate RDF/XML.  Obviously, this is sub-optimal because our workflow
doesn't yet account for non-XML data.  We will probably go with the rapper
intermediary process until XForms 2 is released.
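
A sketch of that intermediary step in Python with rdflib, as an alternative to
shelling out to rapper; the file names and formats below are placeholders.

from rdflib import Graph

# Accept RDF in any serialization and normalize it to RDF/XML before handing it
# to an XML-based validator (schematron, XForms bindings, etc.).
g = Graph()
g.parse("incoming.ttl", format="turtle")  # or "nt", "json-ld" (plugin), etc.
g.serialize(destination="normalized.rdf", format="pretty-xml")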

Ethan


On Mon, Sep 16, 2013 at 10:22 AM, Karen Coyle  wrote:

> On 9/16/13 6:29 AM, aj...@virginia.edu wrote:
>
>>
>> I'd suggest that perhaps the confusion arises because "This instance is
>> (not) 'valid' according to that ontology." might be inferred from an
>> instance and an ontology (under certain conditions), and that's the soul of
>> what we're asking when we define constraints on the data. Perhaps OWL can
>> be used to express conditions of validity, as long as we represent the
>> quality "valid" for use in inferences.
>>
>
> Based on the results of the RDF Validation workshop [1], validation is
> being expressed today as SPARQL rules. If you express the rules in OWL then
> unfortunately you affect downstream re-use of your ontology, and that can
> create a mess for inferencing and can add a burden onto any reasoners,
> which are supposed to apply the OWL declarations.
>
> One participant at the workshop demonstrated a system that used the OWL
> "constraints" as constraints, but only in a closed system. I think that the
> use of SPARQL is superior because it does not affect the semantics of the
> classes and properties, only the instance data, and that means that the
> same properties can be validated differently for different applications or
> under different contexts. As an example, one community may wish to say that
> their metadata can have one and only one dc:title, while others may allow
> more than one. You do not want to constrain dc:title throughout the Web,
> only your own use of it. (Tom Baker and I presented a solution to this on
> the second day as Application Profiles [2], as defined by the DC community).
>
> kc
> [1] https://www.w3.org/2012/12/rdf-val/agenda
> [2] http://www.w3.org/2001/sw/wiki/images/e/ef/Baker-dc-abstract-model-revised.pdf
>
>
>  - ---
>> A. Soroka
>> The University of Virginia Library
>>
>> On Sep 13, 2013, at 11:00 PM, CODE4LIB automatic digest system wrote:
>>
>>  Also, remember that OWL does NOT constrain your data, it constrains only
>>> the inferences that you can make about your data. OWL operates at the
>>> ontology level, not the data level. (The OWL 2 documentation makes this
>>> more clear, in my reading of it. I agree that the example you cite sure
>>> looks like a constraint on the data... it's very confusing.)
>>>
>>
>
> --
> Karen Coyle
> kco...@kcoyle.net http://kcoyle.net
> m: 1-510-435-8234
> skype: kcoylenet
>


Re: [CODE4LIB] Expressing negatives and similar in RDF

2013-09-13 Thread Ethan Gruber
+1


On Fri, Sep 13, 2013 at 8:51 AM, Esmé Cowles  wrote:

> Thomas-
>
> This isn't something I've run across yet.  But one thing you could do is
> create some URIs for different kinds of unknown/nonexistent titles:
>
> example:book1 dc:title example:unknownTitle
> example:book2 dc:title example:noTitle
> etc.
>
> You could then describe example:unknownTitle with a label or comment to
> fully describe the states you wanted to capture with the different
> categories.
>
> -Esme
> --
> Esme Cowles 
>
> "Necessity is the plea for every infringement of human freedom. It is the
>  argument of tyrants; it is the creed of slaves." -- William Pitt, 1783
>
> On 09/13/2013, at 7:32 AM, "Meehan, Thomas"  wrote:
>
> > Hello,
> >
> > I'm not sure how sensible a question this is (it's certainly
> theoretical), but it cropped up in relation to a rare books cataloguing
> discussion. Is there a standard or accepted way to express negatives in
> RDF? This is best explained by examples, expressed in mock-turtle:
> >
> > If I want  to say this book has the title "Cats in RDA" I would do
> something like:
> >
> > example:thisbook dc:title "Cats in RDA" .
> >
> > Normally, if a predicate like dc:title is not relevant to
> example:thisbook I believe I am right in thinking that it would simply be
> missing, i.e. it is not part of a record where a set number of fields need
> to be filled in, so no need to even make the statement. However, there are
> occasions where a positively negative statement might be useful. I
> understand OWL has a way of managing the statement This book does not have
> the title "Cats in RDA" [1]:
> >
> > []  rdf:type owl:NegativePropertyAssertion ;
> > owl:sourceIndividual   example:thisbook ;
> > owl:assertionProperty  dc:title ;
> > owl:targetIndividual   "Cats in RDA" .
> >
> > However, it would be more useful, and quite common at least in a
> bibliographic context, to say "This book does not have a title". Ideally
> (?!) there would be an ontology of concepts like "none", "unknown", or even
> "something, but unspecified":
> >
> > This book has no title:
> > example:thisbook dc:title hasobject:false .
> >
> > It is unknown if this book has a title (sounds undesirable but I can
> think of instances where it might be handy[2]):
> > example:thisbook dc:title hasobject:unknown .
> >
> > This book has a title but it has not been specified:
> > example:thisbook dc:title hasobject:true .
> >
> > In terms of cataloguing, the answer is perhaps to refer to the rules
> (which would normally mandate supplied titles in square brackets and so
> forth) rather than use RDF to express this kind of thing, although the
> rules differ depending on the part of description and, in the case of the
> kind of thing that prompted the question- the presence of clasps on rare
> books- there are no rules. I wonder if anyone has any more wisdom on this.
> >
> > Many thanks,
> >
> > Tom
> >
> > [1] Adapted from
> http://www.w3.org/2007/OWL/wiki/Primer#Object_Properties
> > [2] Not many tbh, but e.g. title in an unknown script or indecipherable
> hand.
> >
> > ---
> >
> > Thomas Meehan
> > Head of Current Cataloguing
> > Library Services
> > University College London
> > Gower Street
> > London WC1E 6BT
> >
> > t.mee...@ucl.ac.uk
>


Re: [CODE4LIB] W3C RDF Validation Workshop

2013-09-12 Thread Ethan Gruber
RDF is not the be-all and end-all for representing information, so I don't know
if there is a point to defining a validation schema that can itself be
represented in RDF, since requirements vary from model to model and project to
project.  If you were creating RDF/XML, you could enforce complex
validation through schematron.  XForms 2.0 will support JSON and other
non-XML data models, so you could enforce complex validation through XForms
bindings since XPath 3 will support parsing JSON, thus JSON-LD.

Our project consists of (at the moment) tens of thousands of concepts
defined at URIs and represented by XHTML+RDFa fragments.  These bits of
XHTML are edited in XForms, so the validation is pretty tight.  The
XHTML+RDFa is transformed into RDF proper upon file save and posted into
our endpoint with the SPARQL/Update mechanism.
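
For the curious, the posting step is just an HTTP POST of a SPARQL 1.1 INSERT
DATA request to the update endpoint.  A minimal sketch in Python (endpoint URL
and triples are placeholders; the actual workflow described above drives this
from the XForms side rather than from a script):

import requests

# Push a handful of triples to a SPARQL 1.1 Update endpoint.
UPDATE_ENDPOINT = "http://localhost:3030/ds/update"  # placeholder

update = """
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
INSERT DATA {
  <http://example.org/id/concept1> a skos:Concept ;
      skos:prefLabel "Example concept"@en .
}
"""

resp = requests.post(UPDATE_ENDPOINT, data={"update": update})
resp.raise_for_status()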

But my broader point is: RDF (typically) is a derivative resource of a more
detailed data model.  In the case where the RDF is derivative of a
canonical resource/document, validation can be applied more consistently
during the editing process of the canonical resource.

Ethan


On Thu, Sep 12, 2013 at 11:19 AM, Karen Coyle  wrote:

> I followed the W3C RDF Validation Workshop [1] over the last two days. The
> web page has both written papers and slides from each presentation.
>
> The short summary is that a number of users of RDF have found a need to do
> traditional style validation (required, one or more, must be numeric/from a
> list, etc.) on their RDF metadata. There is currently no RDF-based standard
> for defining validation rules, so each of these is an ad hoc solution which
> cannot be easily exchanged. [2]
>
> The actual technology of validation in all cases is SPARQL. Whether or not
> this really scales is one of the questions, but it seems pretty clear that
> SPARQL will continue to be the solution for the near future.
>
> I will try to write up a blog post that will give some more information.
>
> kc
>
>
> [1] https://www.w3.org/2012/12/rdf-val/agenda
> [2] nota bene: Although OWL appears to provide validation rules, the OWL
> rules only support inferencing. OWL cannot be used to constrain your data
> to valid values.
>
> --
> Karen Coyle
> kco...@kcoyle.net http://kcoyle.net
> ph: 1-510-540-7596
> m: 1-510-435-8234
> skype: kcoylenet
>


Re: [CODE4LIB] What do you want to learn about linked data?

2013-09-04 Thread Ethan Gruber
There's a lot of really great linked data stuff going on in classical
studies.  The Pelagios project (http://pelagios-project.blogspot.com/) is
one of the best examples because the bar for participation is set very
low.  The RDF model is very simple, linking objects (works of literature,
sculpture, papyri, coins, whatever) represented at URIs to URIs for places
defined in the Pleiades Gazetteer of Ancient Places (
http://pleiades.stoa.org/), enabling aggregation of content based on
geography.
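
The shape of the data is roughly the sketch below (rdflib, with a made-up
object URI, and dcterms:spatial standing in purely for illustration; Pelagios
itself defines its own annotation model).

from rdflib import Graph, URIRef
from rdflib.namespace import DCTERMS

# Illustration only: link an object's URI to a Pleiades place URI so content
# can be aggregated by ancient geography.
g = Graph()
obj = URIRef("http://example.org/objects/coin-123")        # hypothetical object URI
place = URIRef("http://pleiades.stoa.org/places/579885")   # a Pleiades place URI
g.add((obj, DCTERMS.spatial, place))

print(g.serialize(format="turtle"))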

Ethan


On Wed, Sep 4, 2013 at 10:01 AM, Eric Lease Morgan  wrote:

> On Sep 4, 2013, at 9:42 AM, Eric Lease Morgan  wrote:
>
> >> I get the basic concepts of linked data.  But what I don't understand is
> >> why the idea has been around so long, yet there seems to be a dearth of
> >> useful applications that live up to the hype.  So, what I want to learn
> >> about linked data is: who's using it effectively?  Maybe there's lots of
> >> stuff out there that I just don't know about?
> >
> > I've been doing some reading and evaluating in the regard to Linked Data
> [0], and I think the problem is multi-diminentional:
>
>
> And here is yet another perspective. Maybe Linked Data is really too hard
> to implement. Think OAI-PMH. It was supposed to be a low-barrier method for
> making metadata available to the world -- an idea not dissimilar to the
> ideas behind Linked Data and the Semantic Web. Heck, all you needed was
> Dublin Core and the creation of various XML streams distributed by servers
> who knew only a handful of commands.
>
> Unfortunately, few people went beyond Dublin Core and the weaknesses of
> the vocabulary became extremely apparent. Just look at the OAI available
> from things like ContentDM -- thin to say the least. In the end OAI was not
> seen as low barrier as once thought. Low barrier for computer types, but
> not necessarily so for others. From the concluding remarks in a 2006 paper
> by Carl Lagoze given at JCDL:
>
>   Metadata Aggregation and “Automated Digital Libraries”: A
>   Retrospective on the NSDL Experience
>
>   Over the last three years the NSDL CI team has learned that a
>   seemingly modest architecture based on metadata harvesting is
>   surprisingly difficult to manage in a large-scale implementation.
>   The administrative difficulties result from a combination of
>   provider difficulties with OAI-PMH and Dublin Core, the
>   complexities in consistent handling of multiple metadata feeds
>   over a large number of iterations, and the limitations of
>   metadata quality remediation.
>
>   http://arxiv.org/pdf/cs/0601125.pdf
>
> The issues with Linked Data and the Semantic Web may be similar, but does
> that mean we should give it a try?
>
> --
> Eric Lease Morgan
>


Re: [CODE4LIB] Subject Terms in Institutional Repositories

2013-08-30 Thread Ethan Gruber
I'd hold off on AAT until the release of the Getty vocabularies as linked
open data in the near future.  No sense in investing time to purchase or
otherwise harvest terms from the Getty's current framework when the
architecture is going to change very soon.

On a related note, the British Museum's art-related thesauri are already
linked open data, but not as transparent and accessible as one would prefer.

Ethan


On Fri, Aug 30, 2013 at 9:44 AM, Jacob Ratliff wrote:

> That does help, thanks.
>
> So, what you probably need to do then is take some time to strategically
> think about what you want the controlled vocabularies to accomplish, and
> what types of resources you have available to implement them.
>
> How granular do you want to be in each subject area? (e.g. Do you want to
> use MeSH  for all the medical information,
> or is that too detailed?)
> Are you just looking for cursory subject headings so that people can find a
> larger collection that they're looking for? (LoC could be good for this)
> Are you going to use a different controlled vocabulary for each collection?
> (e.g. MeSH for dentistry, LoC for general, etc.)
> Who is going to go back and re-tag all of the digital objects with new
> metadata?
>
> You can also look at www.taxonomywarehouse.com for some ideas of different
> controlled vocabularies that are available. I also recommend the Art and
> Architecture Thesaurus  for
> art
> assets.
>
> Is this kind of what you're looking for? I highly recommend sitting down
> and defining what your goals are for the controlled vocabulary you want to
> implement, because that will inform that type of vocabulary you use.
>
> Jacob Ratliff
> Archivist/Taxonomy Librarian
> National Fire Protection Association
>
>
> On Fri, Aug 30, 2013 at 9:36 AM, Matthew Sherman
> wrote:
>
> > Sorry, I probably should have provided a bit more depth.  It is a
> > University Institutional Repository so we have a rather varied collection
> > of materials from engineering to education to computer science to
> > chiropractic to dental to some student theses and posters.  So I guess I
> > need to find something at is extensible.  Does that provide a better idea
> > or should I provide more info?
> >
> >
> > On Fri, Aug 30, 2013 at 9:32 AM, Jacob Ratliff  > >wrote:
> >
> > > Hi Matt,
> > >
> > > It depends on the subject area of your repository. There are dozens of
> > > controlled vocabularies that exist (not including specific Enterprise
> > > Content Management controlled vocabularies). If you can describe your
> > > collection, people might be able to advise you better.
> > >
> > > Jacob Ratliff
> > > Archivist/Taxonomy Librarian
> > > National Fire Protection Association
> > >
> > >
> > > On Fri, Aug 30, 2013 at 9:26 AM, Matthew Sherman
> > > wrote:
> > >
> > > > Hello Code4Libbers,
> > > >
> > > > I am working on cleaning up our institutional repository, and one of
> > the
> > > > big areas of improvement needed is the list of terms from the subject
> > > > fields.  It is messy and I want to take the subject terms and place
> > them
> > > > into a much better order.  I was contemplating using Library of
> > Congress
> > > > Subject Headings, but I wanted to see what others have done in this
> > area
> > > to
> > > > see if there is another good controlled vocabulary that could work
> > > better.
> > > > Any insight is welcome.  Thanks for your time everyone.
> > > >
> > > > Matt Sherman
> > > > Digital Content Librarian
> > > > University of Bridgeport
> > > >
> > >
> >
>


Re: [CODE4LIB] linked archival metadata: a guidebook

2013-08-12 Thread Ethan Gruber
I'll implement your linked data specifications into EADitor as soon as
they're ready.  In fact, I began implementing Aaron Rubinstein's hybrid
arch/dc ontology (http://gslis.simmons.edu/archival/arch/index.html) a few
days ago.

Ethan


On Mon, Aug 12, 2013 at 9:23 AM, Stephen Marks wrote:

> Hi Eric--
>
> Good luck! I'll be very interested to see how this shapes up.
>
> Best,
>
> Steve
>
>
>
> On Aug-12-2013 9:10 AM, Eric Lease Morgan wrote:
>
>> This is the tiniest of introductions as a person who will be writing a
>> text called Linked Archival Metadata: A Guidebook. The Guidebook will be
>> the product of LiAM [0], and from the prospectus [1], the purpose of the
>> Guidebook is to:
>>
>>provide archivists with an overview of the current linked data
>>landscape, define basic concepts, identify practical strategies
>>for adoption, and emphasize the tangible payoffs for archives
>>implementing linked data. It will focus on clarifying why
>>archives and archival users can benefit from linked data and will
>>identify a graduated approach to applying linked data methods to
>>archival description.
>>
>> To these ends I plan to write towards three audiences: 1) the layman who
>> knows nothing about linked data, 2) the archivist who wants to make their
>> content available as linked data but does not know how, and 3) the computer
>> technologist who knows how to make linked data accessible but does not know
>> about archival practices.
>>
>> Personally, I have been dabbling on and off with linked data and the
>> Semantic Web for a number of years. I have also been deeply involved with a
>> project called the Catholic Research Resources Alliance [2] whose content
>> mostly comes from archives. I hope to marry these two sets of experiences
>> into something that will be useful to cultural heritage institutions,
>> especially archives.
>>
>> The Guidebook is intended to be manifested in both book (PDF) and wiki
>> forms. The work begins now and is expected to be completed by March 2014.
>> On my mark. Get set. Go. Wish me luck, and let’s see if we can build some
>> community.
>>
>> [0] LiAM - http://sites.tufts.edu/liam/
>> [1] prospectus - http://bit.ly/15TX0rs
>> [2] Catholic Research Resources Alliance - http://www.catholicresearch.net/
>>
>> --
>> Eric Lease Morgan
>>
>>
>>
> --
>
>
>
> Stephen Marks
> Digital Preservation Librarian
> Scholars Portal
> Ontario Council of University Libraries
>
> step...@scholarsportal.info
> 416.946.0300
>


Re: [CODE4LIB] Python and Ruby

2013-07-30 Thread Ethan Gruber
All languages other than assembly are boutique and must be eliminated like
the cancer that they are.


On Tue, Jul 30, 2013 at 11:14 AM, Ross Singer  wrote:

> What would you consider a "boutique" language?  What isn't?
>
> -Ross.
>
>
> On Tue, Jul 30, 2013 at 10:21 AM, Rich Wenger  wrote:
>
> > The proliferation of boutique "languages" is a cancer on our community.
> >  Each one is a YAP (Yet Another Priesthood), and little else.  The world
> > does not need five slightly varying syntaxes for a substring function.
> If I
> > had switched languages every time the web community "recommended" it, I
> > would have rewritten a mountain of apps at least twice in the past five
> > years.  What's next, a separate language to put periods at the end of
> > sentences? Just my $.02.  That is all.
> >
> > Rich Wenger
> > E-Resource Systems Manager, MIT Libraries
> > rwen...@mit.edu
> > 617-253-0035
> >
> >
> >
> > -Original Message-
> > From: Code for Libraries [mailto:CODE4LIB@listserv.nd.edu] On Behalf Of
> > Joshua Welker
> > Sent: Tuesday, July 30, 2013 9:56 AM
> > To: CODE4LIB@listserv.nd.edu
> > Subject: Re: [CODE4LIB] Python and Ruby
> >
> > I am already a big user of PHP for web apps, but PHP does not make a
> > fantastic scripting language in my experience.
> >
> > Josh Welker
> > Information Technology Librarian
> > James C. Kirkpatrick Library
> > University of Central Missouri
> > Warrensburg, MO 64093
> > JCKL 2260
> > 660.543.8022
> >
> >
> > -Original Message-
> > From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
> > Riley Childs
> > Sent: Tuesday, July 30, 2013 8:18 AM
> > To: CODE4LIB@LISTSERV.ND.EDU
> > Subject: Re: [CODE4LIB] Python and Ruby
> >
> > No mention of PHP?
> >
> > Sent from my iPhone
> >
> > On Jul 30, 2013, at 9:14 AM, Kurt Nordstrom 
> > wrote:
> >
> > > Whoohoo, late to the party!
> > >
> > > I like Python because I learned it first, and I haven't had a need to
> > > explore Ruby yet.
> > >
> > > I did briefly foray into learning Ruby in order to try to learn Rails,
> > > and I actually found that my background in Python sort of gave me
> > > brain-jam for learning Ruby, because the languages were so close
> > > together, but just different in some ways. So my mind would be 'oh, so
> > > it's just <the Python equivalent>', but then, it's not. If I tackle
> > > Ruby again, I will definitely try to 'empty my cup' first.
> > >
> > > -K
> > >
> > >
> > > On Tue, Jul 30, 2013 at 8:55 AM, Marc Chantreux  wrote:
> > >
> > >> hello,
> > >>
> > >> Sorry comming late with it but:
> > >>
> > >> On Mon, Jul 29, 2013 at 10:43:33AM -0500, Joshua Welker wrote:
> > >>> Not intending to start a language flame war/holy war here, but in
> > >>> the library coding community, is there a particular reason to use
> > >>> Ruby over Python or vice-versa?
> > >>
> > >> Are these the only choices you have? Because I'd personally advise none
> > >> of them.
> > >>
> > >> I tested both of them before sticking to Perl just because
> > >>
> > >> * it is very pleasant when it comes to exploring and modifying
> > >> data structures and strings (which library things are).
> > >> * the ecosystem is brilliant: Perl comes with a lot of libraries and
> > >> tools with a quality I haven't found in other languages.
> > >>
> > >> Of course, Perl is not perfect and I really would like to use a
> > >> modern emerging compiled language like Go, Rust, Haskell or even
> > >> something on the JVM (like Clojure or the emerging Perl 6), but all of
> > >> them lack libraries.
> > >>
> > >> HTH
> > >> regards
> > >> --
> > >> Marc Chantreux
> > >> Université de Strasbourg, Direction Informatique
> > >> 14 Rue René Descartes,
> > >> 67084  STRASBOURG CEDEX
> > >> ☎: 03.68.85.57.40
> > >> http://unistra.fr
> > >> "Don't believe everything you read on the Internet"
> > >>-- Abraham Lincoln
> > >
> > >
> > >
> > > --
> > > http://www.blar.net/kurt/blog/
> >
>


[CODE4LIB] Machine tags and flickr commons

2013-07-10 Thread Ethan Gruber
There is an enormous body of open photographs contributed by a myriad of
libraries and museums to flickr.  Is anyone aware of any efforts to
associate machine tags with these photos, for example to georeference with
geonames machine tags, tag people with VIAF ids, or categorize with LCSH
ids?  A quick Google search turns up nothing.  There's a little bit of this
going on with Pleiades ids for ancient geography (
http://www.flickr.com/photos/tags/pleiades%3A*/), but there's enormous
potential in library-produced images.

I think it would be incredibly powerful to aggregate images of manuscripts
created by Thomas Jefferson (VIAF id: 41866059) across institutions that
have digitized and uploaded them to flickr.
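
To make that concrete: a machine tag such as "viaf:id=41866059" (the namespace
is invented here for illustration; there is no established VIAF machine tag
convention that I know of) could be searched through the Flickr API.  A rough
Python sketch, assuming you have an API key:

import requests

# Search Flickr for photos carrying a hypothetical VIAF machine tag.
API_KEY = "YOUR_FLICKR_API_KEY"  # placeholder

params = {
    "method": "flickr.photos.search",
    "api_key": API_KEY,
    "machine_tags": "viaf:id=41866059",  # Thomas Jefferson's VIAF id as a machine tag
    "format": "json",
    "nojsoncallback": 1,
}
resp = requests.get("https://api.flickr.com/services/rest/", params=params)
resp.raise_for_status()
for photo in resp.json()["photos"]["photo"]:
    print(photo["id"], photo["title"])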

Ethan


Re: [CODE4LIB] LOC Subject Headings API

2013-06-05 Thread Ethan Gruber
I once put all of the LCSH headings into a local Solr index and used
TermsComponent to power autosuggest.  It was really fast.
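
In practice that is just a prefix query against Solr's terms handler.  A
sketch, assuming a core called "lcsh", the headings indexed in a field called
"heading", and the /terms request handler enabled (all of those names are
hypothetical):

import requests

SOLR_TERMS = "http://localhost:8983/solr/lcsh/terms"  # placeholder core name

def suggest(prefix, limit=10):
    params = {
        "terms.fl": "heading",       # placeholder field name
        "terms.prefix": prefix,
        "terms.limit": limit,
        "wt": "json",
    }
    resp = requests.get(SOLR_TERMS, params=params)
    resp.raise_for_status()
    # TermsComponent returns a flat [term, count, term, count, ...] list in JSON.
    terms = resp.json()["terms"]["heading"]
    return terms[::2]

print(suggest("World War"))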

Ethan


On Wed, Jun 5, 2013 at 12:47 PM, Joshua Welker  wrote:

> I realized since I made that comment that the API is designed to give the
> top 10 subject heading suggestions rather than all of them.
>
> So that part is fine. But I am once again unsure if the API will work for
> me. I am creating a mashup of several data sources for my auto-suggest
> feature, and I am having a hard time dynamically adding the results from
> the LOC Suggest API to the existing collection of data that is used to
> populate my jQuery UI Autocomplete field. Ideally, I'd like to be able to
> have all the LC Subject Heading data cached on my server so that I can
> build my autocomplete data source one time rather than having to deal with
> dynamically adding, sorting, etc. But then the problem I run into is that
> the LCSH master file is so big that it basically crashes the server.
>
> That's why I'm thinking I might have to give up on this project.
>
> Josh Welker
>
>
> -Original Message-
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
> Michael J. Giarlo
> Sent: Wednesday, June 05, 2013 9:59 AM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] LOC Subject Headings API
>
> Josh,
>
> Can you say more about how the API isn't behaving as you expected it to?
>
> -Mike
>
>
>
> On Wed, Jun 5, 2013 at 10:37 AM, Joshua Welker  wrote:
>
> > I went with this method and made some good progress, but the results
> > the API was returning were not what I expected. I might have to give
> > up on this project.
> >
> > Josh Welker
> >
> >
> > -Original Message-
> > From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf
> > Of Ethan Gruber
> > Sent: Wednesday, June 05, 2013 8:22 AM
> > To: CODE4LIB@LISTSERV.ND.EDU
> > Subject: Re: [CODE4LIB] LOC Subject Headings API
> >
> > You'd write some javascript to query the service with every keystroke,
> e.g.
> > http://id.loc.gov/authorities/suggest/?q=Hi replies with subjects
> > beginning with "hi*"  It looks like covo.js supports LCSH, so you
> > could look into that.
> >
> > Ethan
> >
> >
> > On Wed, Jun 5, 2013 at 9:13 AM, Joshua Welker 
> wrote:
> >
> > > This would work, except I would need a way to get all the subjects
> > > rather than just biology. Any idea how to do that? I tried removing
> > > the querystring from the URL and changing "Biology" in the URL to ""
> > > with no success.
> > >
> > > Josh Welker
> > >
> > >
> > > -Original Message-
> > > From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf
> > > Of Michael J. Giarlo
> > > Sent: Tuesday, June 04, 2013 7:05 PM
> > > To: CODE4LIB@LISTSERV.ND.EDU
> > > Subject: Re: [CODE4LIB] LOC Subject Headings API
> > >
> > > How about id.loc.gov's OpenSearch-powered autosuggest feature?
> > >
> > > mjg@moby:~$ curl http://id.loc.gov/authorities/suggest/?q=Biology
> > > ["Biology",["Biology","Biology Colloquium","Biology Curators'
> > > Group","Biology Databook Editorial Board (U.S.)","Biology and Earth
> > > Sciences Teaching Institute","Biology and Management of True Fir in
> > > the Pacific Northwest Symposium (1981 : Seattle, Wash.)","Biology
> > > and Resource Management Program (Alaska Cooperative Park Studies
> > > Unit)","Biology and behavior series","Biology and environment
> > > (Macmillan Press)","Biology and management of old-growth
> > > forests"],["1
> > > result","1 result","1 result","1
> > > result","1 result","1 result","1 result","1 result","1 result","1
> > > result"],["http://id.loc.gov/authorities/subjects/sh85014203",";
> > > http://id.loc.gov/authorities/names/n79006962",";
> > > http://id.loc.gov/authorities/names/n90639795",";
> > > http://id.loc.gov/authorities/names/n85100466",";
> > > http://id.loc.gov/authorities/names/nr97041787",";
> > > http://id.loc.gov/authorities/names/n85276541",";
> > > http://id.loc.gov/authorities/names/n82057525",";
> > > http://id.loc.gov/authorities/names/n90605

Re: [CODE4LIB] LOC Subject Headings API

2013-06-05 Thread Ethan Gruber
Are you referring to hierarchical sets of terms, like "United
States--History--War with Mexico, 1845-1848"?  This is an "earlier
established term" of http://id.loc.gov/authorities/subjects/sh85140201 (now
labeled "Mexican War, 1846-1848").  Ed Summers or Kevin Ford are in a
better position to discuss the change of terminology, but it looks like
LCSH is moving past this string-based hierarchy in favor of one expressed
in terms of linked data.

Ethan


On Wed, Jun 5, 2013 at 9:32 AM, Joshua Welker  wrote:

> I've seen those, but I can't figure out where on the id.loc.gov site
> there is actually a URL that provides a list of authority terms. All the
> links on the site seem to link to other pages within the site.
>
> Josh Welker
>
>
> -Original Message-
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
> Dana Pearson
> Sent: Tuesday, June 04, 2013 6:42 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] LOC Subject Headings API
>
> Joshua,
>
> There are different formats at LOC:
>
> http://id.loc.gov/authorities/subjects.html
>
> dana
>
>
> On Tue, Jun 4, 2013 at 6:31 PM, Joshua Welker  wrote:
>
> > I am building an auto-suggest feature into our library's search box,
> > and I am wanting to include LOC subject headings in my suggestions
> > list. Does anyone know of any web service that allows for automated
> > harvesting of LOC Subject Headings? I am also looking for name
> authorities, for that matter.
> > Any format will be acceptable to me: RDF, XML, JSON, HTML, CSV... I
> > have spent a while Googling with no luck, but this seems like the sort
> > of general-purpose thing that a lot of people would be interested in.
> > I feel like I must be missing something. Any help is appreciated.
> >
> > Josh Welker
> > Electronic/Media Services Librarian
> > College Liaison
> > University Libraries
> > Southwest Baptist University
> > 417.328.1624
> >
>
>
>
> --
> Dana Pearson
> dbpearsonmlis.com
>


Re: [CODE4LIB] LOC Subject Headings API

2013-06-05 Thread Ethan Gruber
You'd write some javascript to query the service with every keystroke, e.g.
http://id.loc.gov/authorities/suggest/?q=Hi replies with subjects beginning
with "hi*"  It looks like covo.js supports LCSH, so you could look into
that.
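
The suggest service returns a compact OpenSearch-style JSON array (query,
labels, counts, URIs), so unpacking it is trivial.  A small Python sketch of
reading that response (the per-keystroke wiring itself would live in
browser-side javascript):

import requests

# Pair each suggested label with its id.loc.gov URI.
resp = requests.get("http://id.loc.gov/authorities/suggest/", params={"q": "hi"})
resp.raise_for_status()
query, labels, counts, uris = resp.json()
for label, uri in zip(labels, uris):
    print(label, uri)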

Ethan


On Wed, Jun 5, 2013 at 9:13 AM, Joshua Welker  wrote:

> This would work, except I would need a way to get all the subjects rather
> than just biology. Any idea how to do that? I tried removing the
> querystring from the URL and changing "Biology" in the URL to "" with no
> success.
>
> Josh Welker
>
>
> -Original Message-
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
> Michael J. Giarlo
> Sent: Tuesday, June 04, 2013 7:05 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] LOC Subject Headings API
>
> How about id.loc.gov's OpenSearch-powered autosuggest feature?
>
> mjg@moby:~$ curl http://id.loc.gov/authorities/suggest/?q=Biology
> ["Biology",["Biology","Biology Colloquium","Biology Curators'
> Group","Biology Databook Editorial Board (U.S.)","Biology and Earth
> Sciences Teaching Institute","Biology and Management of True Fir in the
> Pacific Northwest Symposium (1981 : Seattle, Wash.)","Biology and Resource
> Management Program (Alaska Cooperative Park Studies Unit)","Biology and
> behavior series","Biology and environment (Macmillan Press)","Biology and
> management of old-growth forests"],["1 result","1 result","1 result","1
> result","1 result","1 result","1 result","1 result","1 result","1
> result"],["http://id.loc.gov/authorities/subjects/sh85014203",";
> http://id.loc.gov/authorities/names/n79006962",";
> http://id.loc.gov/authorities/names/n90639795",";
> http://id.loc.gov/authorities/names/n85100466",";
> http://id.loc.gov/authorities/names/nr97041787",";
> http://id.loc.gov/authorities/names/n85276541",";
> http://id.loc.gov/authorities/names/n82057525",";
> http://id.loc.gov/authorities/names/n90605518",";
> http://id.loc.gov/authorities/names/nr2001011448",";
> http://id.loc.gov/authorities/names/no94028058";]]
>
> -Mike
>
>
>
> On Tue, Jun 4, 2013 at 7:51 PM, Joshua Welker  wrote:
>
> > I did see that, and it will work in a pinch. But the authority file is
> > pretty massive--almost 1GB-- and would be difficult to handle in an
> > automated way and without completely killing my web app due to memory
> > constraints while searching the file. Thanks, though.
> >
> > Josh Welker
> >
> >
> > -Original Message-
> > From: Bryan Baldus [mailto:bryan.bal...@quality-books.com]
> > Sent: Tuesday, June 04, 2013 6:39 PM
> > To: Code for Libraries; Joshua Welker
> > Subject: RE: LOC Subject Headings API
> >
> > On Tuesday, June 04, 2013 6:31 PM, Joshua Welker [jwel...@sbuniv.edu]
> > wrote:
> > >I am building an auto-suggest feature into our library's search box,
> > >and
> > I am wanting to include LOC subject headings in my suggestions list.
> > Does anyone know of any web service that allows for automated
> > harvesting of LOC Subject Headings? I am also looking for name
> authorities, for that matter.
> > Any format will be acceptable to me: RDF, XML, JSON, HTML, CSV... I
> > have spent a while Googling with no luck, but this seems like the sort
> > of general-purpose thing that a lot of people would be interested in.
> > I feel like I must be missing something. Any help is appreciated.
> >
> > Have you seen http://id.loc.gov/ with bulk downloads in various
> > formats at http://id.loc.gov/download/
> >
> > I hope this helps,
> >
> > Bryan Baldus
> > Senior Cataloger
> > Quality Books Inc.
> > The Best of America's Independent Presses
> > 1-800-323-4241x402
> > bryan.bal...@quality-books.com
> > eij...@cpan.org
> > http://home.comcast.net/~eijabb/
> >
>


Re: [CODE4LIB] WorldCat Implements Content-Negotiation for Linked Data

2013-06-03 Thread Ethan Gruber
+1


On Mon, Jun 3, 2013 at 3:00 PM, Richard Wallis <
richard.wal...@dataliberate.com> wrote:

> The Linked Data for the millions of resources in WorldCat.org is now
> available as RDF/XML, JSON-LD, Turtle, and Triples via content-negotiation.
>
> Details:
>
> http://dataliberate.com/2013/06/content-negotiation-for-worldcat/
>
> ~Richard.
>


Re: [CODE4LIB] Visualizing RDF graphs

2013-05-02 Thread Ethan Gruber
Wow, that's pretty cool.  I tried one of the dbpedia examples.  I look
forward to playing around with it with our data.

Ethan


On Thu, May 2, 2013 at 5:40 AM, raffaele messuti  wrote:

> Ethan Gruber wrote:
> > This looks like it does what I want to do, but it requires Virtuoso and a
> > Scala environment.  I'm hesitant to dramatically modify my architecture
> > just to accommodate a feature.  I think I favor something a little
> simpler.
>
> take a look at LodLive, it's a simple jquery plugin
> http://en.lodlive.it/
> https://github.com/dvcama/LodLive
>
>
> --
> raffaele
>


Re: [CODE4LIB] Visualizing RDF graphs

2013-05-01 Thread Ethan Gruber
Hey Mark,

This looks like it does what I want to do, but it requires Virtuoso and a
Scala environment.  I'm hesitant to dramatically modify my architecture
just to accommodate a feature.  I think I favor something a little simpler.

Thanks,
Ethan


On Wed, May 1, 2013 at 10:33 AM, Mark A. Matienzo
wrote:

> Hi Ethan,
>
> Have you looked at Payola? <https://github.com/payola/Payola>
>
> Mark
>
> --
> Mark A. Matienzo 
> Digital Archivist, Manuscripts and Archives, Yale University Library
> Technical Architect, ArchivesSpace
>
>
> On Wed, May 1, 2013 at 9:24 AM, Ethan Gruber  wrote:
> > Hi all,
> >
> > I have a fair amount of data in a triplestore, and I'd like to experiment
> > with different forms of visualization.  I have found a few libraries for
> > visualizing RDF graphs through Google, but they still seem relatively
> > rudimentary.  Does anyone on the list have recommendations?  I'm looking
> > for something that can use SPARQL.  I'd like to avoid creating duplicates
> > or derivatives of data, like GraphML, unless it is possible to render
> > GraphML which has been serialized from SPARQL results on the fly.
> >
> > Thanks,
> > Ethan
>


[CODE4LIB] Visualizing RDF graphs

2013-05-01 Thread Ethan Gruber
Hi all,

I have a fair amount of data in a triplestore, and I'd like to experiment
with different forms of visualization.  I have found a few libraries for
visualizing RDF graphs through Google, but they still seem relatively
rudimentary.  Does anyone on the list have recommendations?  I'm looking
for something that can use SPARQL.  I'd like to avoid creating duplicates
or derivatives of data, like GraphML, unless it is possible to render
GraphML which has been serialized from SPARQL results on the fly.

Thanks,
Ethan


Re: [CODE4LIB] tiff2pdf, then back to pdf?

2013-04-26 Thread Ethan Gruber
What's your use case in this scenario? Do you want to provide access to the
PDFs over the web or are you using them as your archival format?  You
probably don't want to use PDF to achieve both objectives.

Ethan
On Apr 26, 2013 5:11 PM, "Edward M. Corrado"  wrote:

> This works sometimes. Well, it does give me a new tiff file from the pdf
> all of the time, but it is not always anywhere near the same size as the
> original tiff. My guess is that maybe there is a flag or something that
> would help. Here is what I get with one file:
>
>
> ecorrado@ecorrado:~/Desktop/test$ convert -compress none A001a.tif
> A001a.pdf
> ecorrado@ecorrado:~/Desktop/test$ convert -compress none A001a.pdf
> A001b.tif
> ecorrado@ecorrado:~/Desktop/test$ ls -al
> total 361056
> drwxrwxr-x 2 ecorrado ecorrado 4096 Apr 26 17:07 .
> drwxr-xr-x 7 ecorrado ecorrado20480 Apr 26 16:54 ..
> -rw-rw-r-- 1 ecorrado ecorrado 38497046 Apr 26 17:07 A001a.pdf
> -rw-r--r-- 1 ecorrado ecorrado 38178650 Apr 26 17:07 A001a.tif
> -rw-rw-r-- 1 ecorrado ecorrado  5871196 Apr 26 17:07 A001b.tif
>
>
> In this case, the two tif files should be the same size. They are not even
> close. Maybe there is a flag to convert (besides compress) that I can use.
> FWIW: I tried three files; 2 are like this. For the other one, the resulting
> tiff is the same size as the original.
>
> Edward
>
>
>
>
>
> On Fri, Apr 26, 2013 at 4:25 PM, Aaron Addison  >wrote:
>
> > Imagemagick's convert will do it both ways.
> >
> > convert a.tiff b.pdf
> > convert b.pdf a.tiff
> >
> > If the pdf is more than one page, the tiff will be a multipage tiff.
> >
> > Aaron
> >
> > --
> > Aaron Addison
> > Unix Administrator
> > W. E. B. Du Bois Library UMass Amherst
> > 413 577 2104
> >
> >
> >
> > On Fri, 2013-04-26 at 16:08 -0400, Edward M. Corrado wrote:
> > > Hi All,
> > >
> > > I have a need to batch convert many TIFF images to PDF. I'd then like
> to
> > be
> > > able to discard the TIFF images, but I can only do that if I can create
> > the
> > > original TIFF again from the PDF. Is this possible? If so, using what
> > tools
> > > and how?
> > >
> > > tiff2pdf seems like a possible solution, but I can't find a
> corresponding
> > > "pdf2tif" program that reverses the process.
> > >
> > > Any ideas?
> > >
> > > Edward
> >
>


Re: [CODE4LIB] Fuseki and other SPARQL servers

2013-02-22 Thread Ethan Gruber
I have a follow-up:

By default, Jetty starts Fuseki with -Xmx1200M for heap.  Have you altered
this for production?  How many triples do you have and how often does your
endpoint process queries?  Our dataset won't be large at first (low
millions of triples), but we can reasonably expect 10,000+ SPARQL queries
per day.  That's not a lot by dbpedia standards, but I have no idea how
that compares to average LAM systems.

Thanks,
Ethan


On Thu, Feb 21, 2013 at 9:42 AM, Ethan Gruber  wrote:

> Thanks everyone for the info. This soothed my apprehensions of running
> Fuseki in a production environment.
>
> Ethan
>
>
> On Wed, Feb 20, 2013 at 4:05 PM, Ross Singer wrote:
>
>> I'll add that the LARQ plugin for Fuseki (which adds Lucene indexes) is
>> pretty awesome, as well.
>>
>> -Ross.
>>
>> On Feb 20, 2013, at 3:57 PM, John Fereira  wrote:
>>
>> > I forgot about that.  That issue was created quite a while ago and I
>> hadn't check on it in a long time.  I've found that Jetty has worked fine
>> in our production environment so far.  As I wrote earlier, I have it
>> connecting to a jena SDB that is used for a semantic web application (VIVO)
>> that was developed here.  Although we have the semantic web application
>> running on a different server than the SDB database I found the performance
>> was fairly significantly improved by having the Fuseki server running on
>> the same machine as the SDB.
>> >
>> > -Original Message-
>> > From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf
>> Of Ethan Gruber
>> > Sent: Wednesday, February 20, 2013 2:52 PM
>> > To: CODE4LIB@LISTSERV.ND.EDU
>> > Subject: Re: [CODE4LIB] Fuseki and other SPARQL servers
>> >
>> > Hi Hugh,
>> >
>> > I have investigated the possibility of deploying Fuseki as a war in
>> Tomcat (
>> > https://issues.apache.org/jira/browse/JENA-201) because I wasn't sure
>> how the default Jetty container would respond in production, but since you
>> aren't having any problems with that deployment, I may go ahead and do that.
>> >
>> > Ethan
>> >
>> >
>> > On Wed, Feb 20, 2013 at 2:39 PM, Hugh Cayless 
>> wrote:
>> >
>> >> Hi Ethan!
>> >>
>> >> We've been using Jena/Fuseki in papyri.info for about a year now,
>> iirc.
>> >> We started with Mulgara, but switched. It's running in its own Jetty
>> >> container in our system, but I've had no performance issues with it
>> >> whatever.
>> >>
>> >> Best,
>> >> Hugh
>> >>
>> >> On Feb 20, 2013, at 14:31 , Ethan Gruber  wrote:
>> >>
>> >>> Hi all,
>> >>>
>> >>> I have been playing around with Fuseki (
>> >>> http://jena.apache.org/documentation/serving_data/index.html) for a
>> >>> few months to get my feet wet with accessing and querying RDF.  I
>> >>> quite like it. I find it well documented and easy to set up.  We
>> >>> will soon deploy a SPARQL server in a production environment, and I
>> >>> would like to know if others on the list have experience with Fuseki
>> >>> in production, or have
>> >> other
>> >>> recommendations.  Mulgara is off the table as it inexplicably
>> >>> conflicts with other apps installed in Tomcat.
>> >>>
>> >>> Thanks,
>> >>> Ethan
>> >>
>>
>
>


Re: [CODE4LIB] You are a *pedantic* coder. So what am I?

2013-02-21 Thread Ethan Gruber
Look, I'm sure we can list the many ways different languages fail to meet
our expectations, but is this really a constructive line of conversation?

-1


On Thu, Feb 21, 2013 at 12:40 PM, Justin Coyne
wrote:

> I did misspeak a bit.  You can override static methods in Java.  My major
> issue is that there is no "getClass()" within a static method, so when the
> static method is being run in the context of the inheriting class it is
> unaware of its own run context.
>
> For example: I want the output to be "Hi from bar", but it's "Hi from foo":
>
> class Foo {
>   public static void sayHello() {
> hi();
>   }
>   public static void hi() {
> System.out.println("Hi from foo");
>   }
> }
>
> class Bar extends Foo {
>
>   public static void hi() {
> System.out.println("Hi from bar");
>   }
> }
>
> class Test {
>   public static void main(String [ ] args) {
> Bar.sayHello();
>   }
> }
>
>
> -Justin
>
>
>
> On Thu, Feb 21, 2013 at 11:18 AM, Eric Hellman  wrote:
>
> > OK, pedant, tell us why you think methods that can be over-ridden are
> > static.
> > Also, tell us why you think classes in Java are not instances of
> > java.lang.Class
> >
> >
> > On Feb 18, 2013, at 1:39 PM, Justin Coyne 
> > wrote:
> >
> > > To be pedantic, Ruby and JavaScript are more Object Oriented than Java
> > > because they don't have primitives and (in Ruby's case) because classes
> > are
> > > themselves objects.   Unlike Java, both Python and Ruby can properly
> > > override of static methods on sub-classes. The Java language made many
> > > compromises as it was designed as a bridge to Object Oriented
> programming
> > > for programmers who were used to writing C and C++.
> > >
> > > -Justin
> > >
> >
>


Re: [CODE4LIB] Fuseki and other SPARQL servers

2013-02-21 Thread Ethan Gruber
Thanks everyone for the info. This soothed my apprehensions of running
Fuseki in a production environment.

Ethan


On Wed, Feb 20, 2013 at 4:05 PM, Ross Singer  wrote:

> I'll add that the LARQ plugin for Fuseki (which adds Lucene indexes) is
> pretty awesome, as well.
>
> -Ross.
>
> On Feb 20, 2013, at 3:57 PM, John Fereira  wrote:
>
> > I forgot about that.  That issue was created quite a while ago and I
> hadn't check on it in a long time.  I've found that Jetty has worked fine
> in our production environment so far.  As I wrote earlier, I have it
> connecting to a jena SDB that is used for a semantic web application (VIVO)
> that was developed here.  Although we have the semantic web application
> running on a different server than the SDB database I found the performance
> was fairly significantly improved by having the Fuseki server running on
> the same machine as the SDB.
> >
> > -Original Message-
> > From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
> Ethan Gruber
> > Sent: Wednesday, February 20, 2013 2:52 PM
> > To: CODE4LIB@LISTSERV.ND.EDU
> > Subject: Re: [CODE4LIB] Fuseki and other SPARQL servers
> >
> > Hi Hugh,
> >
> > I have investigated the possibility of deploying Fuseki as a war in
> Tomcat (
> > https://issues.apache.org/jira/browse/JENA-201) because I wasn't sure
> how the default Jetty container would respond in production, but since you
> aren't having any problems with that deployment, I may go ahead and do that.
> >
> > Ethan
> >
> >
> > On Wed, Feb 20, 2013 at 2:39 PM, Hugh Cayless 
> wrote:
> >
> >> Hi Ethan!
> >>
> >> We've been using Jena/Fuseki in papyri.info for about a year now, iirc.
> >> We started with Mulgara, but switched. It's running in its own Jetty
> >> container in our system, but I've had no performance issues with it
> >> whatever.
> >>
> >> Best,
> >> Hugh
> >>
> >> On Feb 20, 2013, at 14:31 , Ethan Gruber  wrote:
> >>
> >>> Hi all,
> >>>
> >>> I have been playing around with Fuseki (
> >>> http://jena.apache.org/documentation/serving_data/index.html) for a
> >>> few months to get my feet wet with accessing and querying RDF.  I
> >>> quite like it. I find it well documented and easy to set up.  We
> >>> will soon deploy a SPARQL server in a production environment, and I
> >>> would like to know if others on the list have experience with Fuseki
> >>> in production, or have
> >> other
> >>> recommendations.  Mulgara is off the table as it inexplicably
> >>> conflicts with other apps installed in Tomcat.
> >>>
> >>> Thanks,
> >>> Ethan
> >>
>


Re: [CODE4LIB] Fuseki and other SPARQL servers

2013-02-20 Thread Ethan Gruber
TDB as per the startup instruction: "fuseki-server --loc=DB
/DatasetPathName"

Ethan


On Wed, Feb 20, 2013 at 3:02 PM, Ross Singer  wrote:

> On Feb 20, 2013, at 2:52 PM, Ethan Gruber  wrote:
>
> > Hi Hugh,
> >
> > I have investigated the possibility of deploying Fuseki as a war in
> Tomcat (
> > https://issues.apache.org/jira/browse/JENA-201) because I wasn't sure
> how
> > the default Jetty container would respond in production, but since you
> > aren't having any problems with that deployment, I may go ahead and do
> that.
>
> Fuseki/Jetty will have no problems scaling, it's what the Talis Platform
> used for large datasets.  I also ran a large dataset for quite a while with
> it.
>
> Which backend are you using?  TDB?  SDB?
>
> -Ross.
>
> >
> > Ethan
> >
> >
> > On Wed, Feb 20, 2013 at 2:39 PM, Hugh Cayless 
> wrote:
> >
> >> Hi Ethan!
> >>
> >> We've been using Jena/Fuseki in papyri.info for about a year now, iirc.
> >> We started with Mulgara, but switched. It's running in its own Jetty
> >> container in our system, but I've had no performance issues with it
> >> whatever.
> >>
> >> Best,
> >> Hugh
> >>
> >> On Feb 20, 2013, at 14:31 , Ethan Gruber  wrote:
> >>
> >>> Hi all,
> >>>
> >>> I have been playing around with Fuseki (
> >>> http://jena.apache.org/documentation/serving_data/index.html) for a
> few
> >>> months to get my feet wet with accessing and querying RDF.  I quite
> like
> >>> it. I find it well documented and easy to set up.  We will soon deploy
> a
> >>> SPARQL server in a production environment, and I would like to know if
> >>> others on the list have experience with Fuseki in production, or have
> >> other
> >>> recommendations.  Mulgara is off the table as it inexplicably conflicts
> >>> with other apps installed in Tomcat.
> >>>
> >>> Thanks,
> >>> Ethan
> >>
>


Re: [CODE4LIB] Fuseki and other SPARQL servers

2013-02-20 Thread Ethan Gruber
Hi Hugh,

I have investigated the possibility of deploying Fuseki as a war in Tomcat (
https://issues.apache.org/jira/browse/JENA-201) because I wasn't sure how
the default Jetty container would respond in production, but since you
aren't having any problems with that deployment, I may go ahead and do that.

Ethan


On Wed, Feb 20, 2013 at 2:39 PM, Hugh Cayless  wrote:

> Hi Ethan!
>
> We've been using Jena/Fuseki in papyri.info for about a year now, iirc.
> We started with Mulgara, but switched. It's running in its own Jetty
> container in our system, but I've had no performance issues with it
> whatever.
>
> Best,
> Hugh
>
> On Feb 20, 2013, at 14:31 , Ethan Gruber  wrote:
>
> > Hi all,
> >
> > I have been playing around with Fuseki (
> > http://jena.apache.org/documentation/serving_data/index.html) for a few
> > months to get my feet wet with accessing and querying RDF.  I quite like
> > it. I find it well documented and easy to set up.  We will soon deploy a
> > SPARQL server in a production environment, and I would like to know if
> > others on the list have experience with Fuseki in production, or have
> other
> > recommendations.  Mulgara is off the table as it inexplicably conflicts
> > with other apps installed in Tomcat.
> >
> > Thanks,
> > Ethan
>


[CODE4LIB] Fuseki and other SPARQL servers

2013-02-20 Thread Ethan Gruber
Hi all,

I have been playing around with Fuseki (
http://jena.apache.org/documentation/serving_data/index.html) for a few
months to get my feet wet with accessing and querying RDF.  I quite like
it. I find it well documented and easy to set up.  We will soon deploy a
SPARQL server in a production environment, and I would like to know if
others on the list have experience with Fuseki in production, or have other
recommendations.  Mulgara is off the table as it inexplicably conflicts
with other apps installed in Tomcat.

Thanks,
Ethan


Re: [CODE4LIB] GitHub Myths (was thanks and poetry)

2013-02-20 Thread Ethan Gruber
Wordpress?


On Wed, Feb 20, 2013 at 11:42 AM, Karen Coyle  wrote:

> Shaun, you cannot decide whether github is a barrier to entry FOR ME (or
> anyone else), any more than you can decide whether or not my foot hurts.
> I'm telling you github is NOT what I want to use. Period.
>
> I'm actually thinking that a blog format would be nice. It could be pretty
> (poetry and beauty go together). Poems tend to be short, so they'd make a
> nice blog post. They could appear in the Planet blog roll. They could be
> coded by author and topic. There could be comments! Even poems as comments!
> The only down-side is managing users. Anyone have ideas on that?
>
> kc
>
>
>
> On 2/20/13 8:20 AM, Shaun Ellis wrote:
>
>> > (As a general rule, for every programmer who prefers tool A, and says
>> > that everybody should use it, there’s a programmer who disparages tool
>> > A, and advocates tool B. So take what we say with a grain of salt!)
>>
>> It doesn't matter what tools you use, as long as you and your team are
>> able to participate easily, if you want to.  But if you want to attract
>>  contributions from a given development community, then choices should be
>> balanced between the preferences of that community and what best serve the
>> project.
>>
>> From what I've been hearing, I think there is a lot of confusion about
>> GitHub.  Heck, I am constantly learning about new GitHub features, APIs,
>> and best practices myself. But I find it to be an incredibly powerful
>> platform for moving open source, distributed software development forward.
>>  I am not telling anyone to use GitHub if they don't want to, but I want to
>> dispel a few myths I've heard recently:
>>
>> 
>>
>> * Myth #1 : GitHub creates a barrier to entry.
>> * "To contribute to a project on GitHub, you need to use the
>> command-line. It's not for non-coders."
>>
>> GitHub != git.  While GitHub was initially built for publishing and
>> sharing code via integration with git, all GitHub functionality can be
>> performed directly through the web gui.  In fact, GitHub can even be used
>> as your sole coding environment. There are other tools in the "eco-system"
>> that allow non-coders to contribute documentation, issue reporting, and
>> more to a project.
>>
>> 
>>
>> * Myth #2 : GitHub is for sharing/publishing code.
>> * "I would be fun to have a wiki for more durable poetry (github
>> unfortunately would be a barrier to many)."
>>
>> GitHub can be used to collaborate on and publish other types of content
>> as well.  For example, GitHub has a great wiki component* (as well as a
>> website component).  In a number of ways, it has less of a "barrier to entry"
>> than our Code4Lib wiki.
>>
>> While you do need a "repository" to attach a wiki to, public repos cost
>> nothing and can consist of a simple "README" file.
>>  The wiki can be locked down to a team, or it can be writable by anyone
>> with a github account.  You don't need to do anything via command-line,
>> don't need to understand "git-flow", and you don't even need to learn wiki
>> markup to write content. All you need is an account and something to say,
>> just like any wiki. Log in, go to the anti-harassment policy wiki, and see
>> for yourself:
>> https://github.com/code4lib/antiharassment-policy/wiki
>>
>> * The github wiki even has an API (via Gollum) that you can use to
>> retrieve raw or formatted wiki content, write new content, and collect
>> various meta data about the wiki as a whole:
>> https://github.com/code4lib/antiharassment-policy/wiki/_access
>>
>> 
>>
>> * Myth #3 : GitHub is person-centric.
>> > "(And as a further aside, there’s plenty to dislike about github as
>> > well, from its person-centric view of projects (rather than
>> > team-centric)..."
>>
>> Untrue. GitHub is very team centered when using organizational accounts,
>> which formalize authorization controls for projects, among other things:
>> https://github.com/blog/674-introducing-organizations
>>
>> 
>>
>> * Myth #4 : GitHub is monopolizing open source software development.
>> > "... to its unfortunate centralizing of so much free/open
>> > source software on one platform.)"
>>
>> Convergence is not always a bad thing. GitHub provides a great, free
>> service with lots of helpful collaboration tools beyond version control.
>>  It's natural that people would flock there, despite having lots of other
>> options.
>>
>> 
>>
>> -Shaun
>>
>>
>>
>>
>>
>>
>>
>> On 2/19/13 5:35 PM, Erik Hetzner wrote:
>>
>>> At Sat, 16 Feb 2013 06:42:04 -0800,
>>> Karen Coyle wrote:
>>>

 gitHub may have excellent startup documentation, but that startup
 documentation describes git in programming terms mainly using *nix
 commands. If you have never had to use a versi

Re: [CODE4LIB] Getting started with Ruby and library-ish data (was RE: [CODE4LIB] You *are* a coder. So what am I?)

2013-02-18 Thread Ethan Gruber
The language you choose is somewhat dependent on the data you're working
with.  I don't find that Ruby or PHP are particularly good at dealing with
XML. They're passable for data manipulation and migration, but I wouldn't
use them to render large collections of structured XML data, like EAD or
TEI collections, or whatever.


Ethan


On Mon, Feb 18, 2013 at 8:52 AM, Jason Stirnaman wrote:

> This is a terribly distorted view of Ruby: "If you want to make web pages,
> learn Ruby", and you don't need to learn Rails to get the benefit of Ruby's
> awesomeness. But, everyone will have their own opinions. There's no
> accounting for taste.
>
> For anyone interested in learning to program and hack around with library
> data or linked data, here are some places to start (heavily biased toward
> the elegance of Ruby):
>
> http://wiki.code4lib.org/index.php/Working_with_MaRC
> https://delicious.com/jstirnaman/ruby+books
> https://delicious.com/jstirnaman/ruby+tutorials
> http://rdf.rubyforge.org/
>
> Jason
>
> Jason Stirnaman
> Digital Projects Librarian
> A.R. Dykes Library
> University of Kansas Medical Center
> 913-588-7319
>
> 
> From: Code for Libraries [CODE4LIB@LISTSERV.ND.EDU] on behalf of Joe
> Hourcle [onei...@grace.nascom.nasa.gov]
> Sent: Sunday, February 17, 2013 12:52 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] You *are* a coder. So what am I?
>
> On Feb 17, 2013, at 11:43 AM, John Fereira wrote:
>
> > I have been writing software "professionally" since around 1980 and
> first encountered perl in the early 1990s or so and have *always* disliked
> it.   Last year I had to work on a project that was mostly developed in
> perl and it reminded me how much I disliked it.  As a utility language, and
> one that I think is good for beginning programmers (especially for those
> working in a library) I'd recommend PHP over perl every time.
>
> I'll agree that there are a few aspects of Perl that can be confusing, as
> some functions will change behavior depending on context, and there were a
> lot of bad code examples out there.*
>
> ... but I'd recommend almost any current mainstream language before
> recommending that someone learn PHP.
>
> If you're looking to make web pages, learn Ruby.
>
> If you're doing data cleanup, Perl if it's lots of text, Python if it's
> mostly numbers.
>
> I should also mention that in the early 1990s it would have been Perl 4 ...
> and unfortunately, most people who learned Perl never learned Perl 5.  It's
> changed a lot over the years.  (just like PHP isn't nearly as insecure as
> it used to be ... and actually supports placeholders so you don't end up
> with SQL injections)
>
> -Joe
>


Re: [CODE4LIB] one tool and/or resource that you recommend to newbie coders in a library?

2012-11-01 Thread Ethan Gruber
Google is more useful than any reference book to find answers to
programming problems.
On Nov 1, 2012 4:25 PM, "Bohyun Kim"  wrote:

> Hi all code4lib-bers,
>
> As coders and coding librarians, what is ONE tool and/or resource that you
> recommend to newbie coders in a library (and why)?  I promise I will create
> and circulate the list and make it into a Code4Lib wiki page for collective
> wisdom.  =)
>
> Thanks in advance!
> Bohyun
>
> ---
> Bohyun Kim, MA, MSLIS
> Digital Access Librarian
> bohyun@fiu.edu
> 305-348-1471
> Medical Library, College of Medicine
> Florida International University
> http://medlib.fiu.edu
> http://medlib.fiu.edu/m (Mobile)
>


[CODE4LIB] Using dbpedia to generate EAC-CPF collections

2012-10-03 Thread Ethan Gruber
Hi all,

In the last few weeks, I have undertaken a project to generate EAC-CPF stubs
using dbpedia and VIAF data for the Roman emperors and their relations.  There's
a lot of great information available through dbpedia, and since it's
available in RDF, I put together a PHP script that can start at one point
in dbpedia (e.g., http://dbpedia.org/resource/Augustus) and traverse
through its relations to create a network of stubs using links to parents,
children, spouses, influences, successors, and predecessors provided in the
RDF.  Left unchecked, the script would crawl forward through the Byzantine
period and spread laterally (chronologically speaking) to generate a network
of the ruling hierarchy of the West up to the modern period.  It also goes
backwards to the successors of Alexander the Great.  For all I know, it
goes back through all of the Egyptian dynasties to Narmer ca. 3000 BC, but
I haven't let the script go that far.
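
The crawl itself boils down to a queue of dbpedia URIs. A stripped-down sketch
of that pattern (this is not the actual dbpedia-to-eac.php script linked below;
the /data/<name>.rdf URL form and the dbo:successor property are assumptions
about dbpedia's service, and the EAC-CPF serialization step is omitted):

<?php
// Stripped-down sketch of the crawl described above (not the actual
// dbpedia-to-eac.php script). The /data/<name>.rdf URL pattern and the
// dbo:successor property are assumptions about dbpedia.
$queue   = array('http://dbpedia.org/resource/Augustus');
$visited = array();

while (!empty($queue) && count($visited) < 25) {   // hard stop so it cannot run back to Narmer
    $uri = array_shift($queue);
    if (isset($visited[$uri])) {
        continue;
    }
    $visited[$uri] = true;

    // dbpedia serves RDF/XML for a resource at /data/<name>.rdf
    $dom = new DOMDocument();
    if (!@$dom->load(str_replace('/resource/', '/data/', $uri) . '.rdf')) {
        continue;
    }

    $xpath = new DOMXPath($dom);
    $xpath->registerNamespace('rdf', 'http://www.w3.org/1999/02/22-rdf-syntax-ns#');
    $xpath->registerNamespace('dbo', 'http://dbpedia.org/ontology/');

    // Queue successors; parents, children, spouses, etc. would be handled the same way.
    foreach ($xpath->query('//dbo:successor/@rdf:resource') as $attr) {
        $queue[] = $attr->value;
    }

    echo "Fetched $uri\n";
    // ... the real script would write an EAC-CPF stub for $uri here ...
}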

The script is fairly generalizable, and can begin at any dbpedia resource.
It's available at
https://github.com/ewg118/xEAC/blob/master/misc/dbpedia-to-eac.php

I should also note that this is a work in progress.  To execute the script,
you'll need to place a "temp" folder in the same place you download/execute
it (for writing EAC records).

At a glance, here's what it does:

-Creates nameEntries for all of the names available in various languages in
dbpedia
-If a VIAF ID is available in the RDF, the script will pull some alternate
record IDs from VIAF, as well as birth and death dates
-Can pull in subjects, occupations, and related resources on the web
-Generate corporate/personal/family relations given the
parents/children/spouses/influences/successors/predecessors/dynasties
linked in dbpedia.  These relations are added to an array which is processed
continually until, presumably, it reaches the end of time.
-You can specify an "end" record to attempt to break this chain, but I
cannot guarantee that it'll work.  Anastasius (emperor of Rome ca. 500 AD)
does actually successfully terminate the Augustus chain.
-Import birth and death places (and associated birth and death dates, if
available)

I think that these stubs are a good starting point for handing off the
management of EAC content to subject specialists who can add chronological
and geographical context.  I wrote a bit more about this script and the
process applied to xEAC, an XForms-based engine for creating, editing,
managing, and publishing EAC-CPF collections at
http://eaditor.blogspot.com/2012/10/using-dbpedia-to-jumpstart-eac-cpf.html

There's a prototype collection of the Roman Empire; if anyone is interested
in taking a look at it, drop me a line off the list.

Ethan


Re: [CODE4LIB] Displaying TGN terms

2012-09-17 Thread Ethan Gruber
I use Geonames for this sort of thing a lot.  With cities and
administrative divisions being offered in a machine-readable format, it's
pretty easy to encode places in a format that adheres to AACR2 or other
cataloging rules.  There are of course problems disambiguating city names
when no country is given, but I get a pretty accurate response in general:
probably greater than 76% when I have both the city and country or city and
geographic region.
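
A minimal sketch of that kind of lookup against the GeoNames search web service
(the "demo" account is a placeholder, since GeoNames expects you to register
your own username, and the fields pulled out are just an example):

<?php
// Minimal sketch of a GeoNames lookup: given a city and a country code, ask
// the searchJSON service for the best match. The "demo" username is a
// placeholder; register your own GeoNames account for real use.
function lookupPlace($city, $countryCode)
{
    $params = http_build_query(array(
        'q'        => $city,
        'country'  => $countryCode,
        'maxRows'  => 1,
        'username' => 'demo'
    ));
    $data = json_decode(file_get_contents('http://api.geonames.org/searchJSON?' . $params), true);
    if (empty($data['geonames'])) {
        return null;
    }
    $place = $data['geonames'][0];
    // returns something along the lines of "Athens (Attica, Greece)",
    // enough to build a display or heading form
    return $place['name'] . ' (' . $place['adminName1'] . ', ' . $place['countryName'] . ')';
}

echo lookupPlace('Athens', 'GR') . "\n";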

Ethan

On Mon, Sep 17, 2012 at 3:16 PM, Eric Lease Morgan  wrote:

> On Sep 17, 2012, at 3:12 PM,  wrote:
>
> > But I'm having trouble coming up with an algorithm that can consistently
> spit these out in the form we'd want to display given the data available in
> TGN.
>
>
> A dense but rich, just-published article from D-Lib Magazine about
> geocoding -- Fulltext Geocoding Versus Spatial Metadata for Large Text
> Archives -- may give some guidance. From the conclusion:
>
>  Spatial information is playing an increasing role in the access
>  and mediation of information, driving interest in methods capable
>  of extracting spatial information from the textual contents of
>  large document archives. Automated approaches, even using fairly
>  basic algorithms, can achieve upwards of 76% accuracy when
>  recognizing, disambiguating, and converting to mappable
>  coordinates the references to individual cities and landmarks
>  buried deep within the text of a document. The workflow of a
>  typical geocoding system involves identifying potential
>  candidates from the text, checking those candidates for potential
>  matches in a gazetteer, and disambiguating and confirming those
>  candidates -- http://bit.ly/Ufl5k9
>
> --
> ELM
>


Re: [CODE4LIB] Timelines (was: visualize website)

2012-08-31 Thread Ethan Gruber
There's also timemap (SIMILE Timeline + mapping libraries like Google Maps
or OpenLayers) if you need to display geography in conjunction to
chronology.  http://code.google.com/p/timemap/

Ethan

On Fri, Aug 31, 2012 at 9:27 AM, Walter Lewis  wrote:

> On 2012-08-30, at 1:03 PM, miles stauffer wrote:
>
> > Is this what you are looking for?
> > http://selection.datavisualization.ch/
>
> The site points to TimelineJS at http://timeline.verite.co/ for timeline
> visualization.
> There is also the widget from the SIMILE project at MIT at
> http://www.simile-widgets.org/timeline/
>
> Are there other suggestions for tools for time line visualizations?
>
> Walter
>


Re: [CODE4LIB] Archival Software

2012-08-09 Thread Ethan Gruber
I find Omeka to be stronger in the area of collections publication and
exhibition than hardcore archival management due to the rather rudimentary
Dublin Core metadata foundation.  You can make other element sets, but it's
not a perfect solution.

Ethan

On Thu, Aug 9, 2012 at 2:57 PM, Kaile Zhu  wrote:

> How about Omeka?  Need to consider the library standards because
> eventually you will have to make your archival collection searchable.  -
> Kelly
>
> -Original Message-
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
> Lisa Gonzalez
> Sent: Thursday, August 09, 2012 1:38 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] Archival Software
>
> Related to the CLIR Report, the wiki version is a little easier to
> navigate:
>
> http://archivalsoftware.pbworks.com/w/page/13600254/FrontPage
>
>
> Lisa Gonzalez
> Electronic Resources Librarian
> Catholic Theological Union
> 5401 S. Cornell Ave.
> Chicago, IL 60615
> 773-371-5463
> lgonza...@ctu.edu
>
>
>
>
>
>
> -Original Message-
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
> Nathan Tallman
> Sent: Thursday, August 09, 2012 12:00 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] Archival Software
>
> As an archivist, this is still a very broad response.
>
> Are you looking to manage archival collections (accessioning, arrangement
> and description, producing finding aids, etc.)? If so, Archivists Toolkit
> or Archon may work for you. I'm not sure what you mean by university
> historical information, perhaps ready-reference type guides?
> There are a plethora of web options for this. Are you looking to manage
> digital assets? Then a digital repository, such as Fedora or Dspace is in
> order.
>
> Although it's a bit out of date at this point, you may want to look at
> Lisa Spiro's 2009 report, "Archival Management Software" <
> http://www.clir.org/pubs/reports/spiro/>. Also, check out Carol Bean's
> blog, BeanWorks. She has a post about comparing digital asset managers <
> http://beanworks.clbean.com/2010/05/creating-a-comparison-matrix/> (and
> also has useful related links).
>
> Best,
> Nathan
>
> On Thu, Aug 9, 2012 at 10:42 AM, Joselito Dela Cruz
> wrote:
>
> > We are looking to centralize the university historical information and
> > archives.
> >
> >
> >
> > -Original Message-
> > From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf
> > Of Matthew Sherman
> > Sent: Thursday, August 09, 2012 10:38 AM
> > To: CODE4LIB@LISTSERV.ND.EDU
> > Subject: Re: [CODE4LIB] Archival Software
> >
> > I think you need to provide a little more context as to what you are
> > trying to do.  The trouble is that the term archive is used in a
> > variety of different ways right now so we need to know what you mean
> > to be able to give you the best suggestions.
> >
> > On Thu, Aug 9, 2012 at 9:31 AM, Joselito Dela Cruz
> > wrote:
> >
> > > Any suggestions for inexpensive & easy to use archival software?
> > >
> > > Thanks,
> > >
> > > Jay Dela Cruz, MLIS
> > > Electronic Resources Librarian
> > > Hodges University | 2655 Northbrooke Drive, Naples, FL 34119-7932
> > > (239) 598-6211 | (800) 466-8017 x 6211 | f. (239) 598-6250
> > > jdelac...@hodges.edu | www.hodges.edu
> > >
> >
>
>
> **Bronze+Blue=Green** The University of Central Oklahoma is Bronze, Blue,
> and Green! Please print this e-mail only if absolutely necessary!
>
> **CONFIDENTIALITY** This e-mail (including any attachments) may contain
> confidential, proprietary and privileged information. Any unauthorized
> disclosure or use of this information is prohibited.
>


[CODE4LIB] Reminder: THATCamp for Computational Archaeology registration deadline is TODAY

2012-06-10 Thread Ethan Gruber
Today, June 10 is the final day to register for THATCamp CAA-NA, an
unconference for computer applications in archaeology.  The free event will
be held Friday, August 10 in the Harrison-Small Special Collections Library
of the University of Virginia, Charlottesville.  It is sponsored by the
Computer Applications and Quantitative Methods in Archaeology - North
America chapter, the University of Virginia Library *Year of Metadata* and
the Fiske Kimball Fine Arts Library.  This is a great opportunity to
interact with archaeologists, students, museum and library professionals,
and computer and information scientists operating within cultural heritage!

The general themes of the event are as follows:


   1. Simulating the Past
   2. Spatial Analysis
   3. Data Modelling & Sharing
   4. Data Analysis, Management, Integration & Visualisation
   5. Geospatial Technologies
   6. Field & Lab Recording
   7. Theoretical Approaches & Context of Archaeological Computing
   8. Human Computer Interaction, Multimedia, Museums

More info: http://caana2012.thatcamp.org/

Follow us on twitter at @THATCampCAANA or for email inquiries, use
thatcampca...@gmail.com

Ethan Gruber
American Numismatic Society


Re: [CODE4LIB] Best way to process large XML files

2012-06-08 Thread Ethan Gruber
Saxon is really, really efficient with large files.  I don't really have
any benchmark stats available, but I have gotten noticeably better
performance from Saxon/XSLT2 than PHP with DOMDocument or SimpleXML or
nokogiri and hpricot in Ruby.
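
That said, if a full XSLT pass is overkill, a record-at-a-time read keeps
memory flat without resorting to SAX callbacks or the line-break trick proposed
below. A minimal PHP sketch with XMLReader and SimpleXML (the wrapping element
name "record" and the title field are placeholders for whatever your files
actually use):

<?php
// Sketch of record-at-a-time processing with XMLReader: only the current
// record is ever expanded into memory, so the footprint stays flat no matter
// how large the file is. "record" and "title" are placeholder element names.
$reader = new XMLReader();
$reader->open('huge-file.xml');
$doc = new DOMDocument();

while ($reader->read()) {
    if ($reader->nodeType == XMLReader::ELEMENT && $reader->localName == 'record') {
        // Expand just this record into a DOM node, then hand it to SimpleXML
        // so the crosswalk code stays short and readable.
        $record = simplexml_import_dom($doc->importNode($reader->expand(), true));
        echo (string) $record->title . "\n";
        // ... crosswalk $record here ...
    }
}
$reader->close();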

Ethan

On Fri, Jun 8, 2012 at 2:36 PM, Kyle Banerjee wrote:

> I'm working on a script that needs to be able to crosswalk at least a
> couple hundred XML files regularly, some of which are quite large.
>
> I've thought of a number of ways to go about this, but I wanted to bounce
> this off the list since I'm sure people here deal with this problem all the
> time. My goal is to make something that's easy to read/maintain without
> pegging the CPU and consuming too much memory.
>
> The performance and load I'm seeing from running the files through LibXML
> and SimpleXML on the large files is completely unacceptable. SAX is not out
> of the question, but I'm trying to avoid it if possible to keep the code
> more compact and easier to read.
>
> I'm tempted to stream-edit out all line breaks since they occur in
> unpredictable places and put new ones at the end of each record into a temp
> file. Then I can read the temp file one line at a time and process using
> SimpleXML. That way, there's no need to load giant files into memory,
> create huge arrays, etc and the code would be easy enough for a 6th grader
> to follow. My proposed method doesn't sound very efficient to me, but it
> should consume predictable resources which don't increase with file size.
>
> How do you guys deal with large XML files? Thanks,
>
> kyle
>
> Why the heck does the XML spec require a root element,
> particularly since large files usually consist of a large number of
> records/documents? This makes it absolutely impossible to process a file of
> any size without resorting to SAX or string parsing -- which takes away
> many of the advantages you'd normally have with an XML structure. 
>
> --
> --
> Kyle Banerjee
> Digital Services Program Manager
> Orbis Cascade Alliance
> baner...@orbiscascade.org / 503.999.9787
>


Re: [CODE4LIB] Studying the email list (Charcuterie Spectrum)

2012-06-05 Thread Ethan Gruber
This begs the question: what is the official Roy Tennant position on baloney
vs. bologna?  May I suggest a viaf-like resource for food, in which I may
prefer the baloney label while allowing my data to be cross-searchable with
bologna records?  Is there an RDF ontology for this???

On Tue, Jun 5, 2012 at 4:02 PM, Kevin S. Clarke  wrote:

> On Tue, Jun 5, 2012 at 3:55 PM, BWS Johnson 
> wrote:
>
> >>   Bacon   == Seal of Approval
> >>   Bologna == Seal of Disapproval
> >>   Salami  == Seal of No Approval Needed
> >>
> >
> > This has some serious flaws. I'm concerned about the relationships
> between the desirability of the bespoke seals as they relate to the appeal
> of the meats themselves. While yea, bacon is nearly universal in its
> appeal, that one seems on the mark. Alas, bologna as the seal of
> disapproval might fall a bit short. While one might jump to proffer spam in
> its place, Hawai'ians quite like spam, leaving us all in a bit of a
> quandary. Olive loaf, perhaps? And while salame is a most excellent meat,
> perhaps foie gras more aptly conveys the aboutness of not giving a damn
> about one's approval or lack thereof.
> >
> >  What say you cataloguing mafia? Surely we must honour the aboutness
> of meat and approval lest we needs OCLC to intervene more often than is
> strictly necessary in our mortal affairs.
>
> I'm vegan now, but having eaten it as a child, may I suggest chicken
> livers for the Seal of Disapproval? Blech!  And, as a vegan, I'd
> stretch bounds of the Seal of No Approval Needed to tempeh.  That
> seems appropriate.
>
> Fwiw...
> Kevin
>


[CODE4LIB] THATCamp for Computational Archaeology registration extended to June 10

2012-06-04 Thread Ethan Gruber
Dear all,

The registration deadline for THATCamp for Computational Archaeology has
been extended to June 10.  Registration is free and first-come, first-served.
The THATCamp will be hosted August 10 at the University of Virginia
in Charlottesville.  It is co-sponsored by the Computer Applications and
Quantitative Methods in Archaeology - North America chapter and the U. Va.
Libraries Year of Metadata/Fiske Kimball Fine Arts Library.  For more
information, check out: http://caana2012.thatcamp.org/about-thatcampcaa-na/

This will be a great opportunity to meet new people and share ideas within
the realms of archaeology and technology.  You can follow us on twitter at
@THATCampCAANA.  Look forward to seeing you there!

Ethan Gruber
American Numismatic Society


Re: [CODE4LIB] triple stores ???

2012-05-29 Thread Ethan Gruber
For those using these big triplestores, how are you putting data in?  I'm
looking for a triplestore which supports SPARQL update.  Any comments
anyone can add on this interface will be useful.
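
For concreteness, this is roughly what pushing data in over SPARQL 1.1 Update
looks like from PHP: POST the update to the store's update endpoint as a form
parameter. The endpoint URL below is a Fuseki-style placeholder; the path, and
whether update is supported at all, varies by triplestore.

<?php
// Sketch of loading data with SPARQL 1.1 Update: POST an INSERT DATA request
// to the store's update endpoint as a form parameter. The URL is a
// Fuseki-style placeholder; other stores expose update at different paths.
$endpoint = 'http://localhost:3030/ds/update';

$update = '
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
INSERT DATA {
  <http://example.org/id/athens> a skos:Concept ;
    skos:prefLabel "Athens"@en .
}';

$context = stream_context_create(array(
    'http' => array(
        'method'  => 'POST',
        'header'  => "Content-Type: application/x-www-form-urlencoded\r\n",
        'content' => http_build_query(array('update' => $update))
    )
));
file_get_contents($endpoint, false, $context);
echo $http_response_header[0] . "\n";   // any 2xx status means the update was accepted
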

Ethan
On May 29, 2012 4:12 PM, "Ravi Shankar"  wrote:

> Thanks, Stefano. The Europeana report seems to be quite comprehensive. It
> is funny that I've searched earlier for triple store comparisons with more
> explicit parameters 'rdf triple store comparison', and the Europeana report
> appeared in the third page of the search results. The 'triple' in the
> search seems to be the culprit -- a clear need for more semantics in the
> search engine ;)
>
> Cheers,
> Ravi
>
> On May 29, 2012, at 1:01 AM, Stefano Bargioni wrote:
>
> > Maybe a G search can help to find comparisons:
> >
> http://www.google.com/search?sugexp=chrome,mod=4&sourceid=chrome&ie=UTF-8&q=4store+Virtuoso+Jena+SDB++Mulgara
> > The result includes your post... added 8 minutes ago.
> > Stefano
> >
> > On 29/mag/2012, at 09.12, Ravi Shankar wrote:
> >
> >> We (DLSS at Stanford Libraries) are planning to use a triple store for
> storing and retrieving annotations (in RDF) on digital objects. We are
> currently looking at open-source triple stores such as 4store, Virtuoso,
> Jena SDB and Mulgara. Are you currently using a triple store or
> contemplating on using one? How would you evaluate 'your' triple store
> along the lines of 1) ease of setup, 2) scalability, 3) query performance,
> 3) bulk load performance, 4) access api, 5) documentation and 6) community
> support?
> >>
> >> Highly appreciate your thoughts, ideas and suggestions.
> >>
> >> Thanks,
> >> Ravi Shankar
> >>
> >
> >
> > __
> > Il tuo 5x1000 al Patronato di San Girolamo della Carita' e' un gesto
> semplice ma di grande valore.
> > Una tua firma aiutera' i sacerdoti ad essere piu' vicini alle esigenze
> di tutti noi.
> > Aiutaci a formare sacerdoti e seminaristi provenienti dai 5 continenti
> indicando nella dichiarazione dei redditi il codice fiscale 97023980580.
>


Re: [CODE4LIB] Anyone using node.js?

2012-05-08 Thread Ethan Gruber
The 4-8 week deadline is more self-imposed than anything.  The plan is (or
was) to deploy the new version of this project by mid-late summer.  It is
already under way, with a working prototype, and I can probably mostly
finish it in 80-120 hours of solid work.  I want to deploy it as soon as we
can because other bigger, sexier projects depend on RDF delivered from this
project.  If it takes six months to completely rewrite this project for
node, or any non-java platform with which I have less experience, we've
thrown a monkey wrench into the development of our other projects.

As for triplestores:

Mulgara is on my list to check out, as is sesame.  Does mulgara support
SPARQL Update yet?  In theory, one should be able to post updates directly
from XForms into a triplestore which supports SPARQL Update.  Maybe this
warrants a separate thread.

On Tue, May 8, 2012 at 3:39 PM, Kevin Ford  wrote:

> > (and am
> > looking into a java triplestore to run in Tomcat)
> -- I don't know if the parenthetical was simply a statement or a
> solicitation - apologies if it was the former.
>
> Take a look at Mulgara.  Drops right into Tomcat.
>
> http://mulgara.org/
>
> --Kevin
>
>
>
>
> On 05/08/2012 02:01 PM, Ethan Gruber wrote:
>
>> For what it's worth, I have processed XML in PHP, Ruby, and Saxon/XSLT 2,
>> but I feel like I'm missing some sort of inside joke here.
>>
>> Thanks for the info.  To clarify, I don't develop in java, but deploy
>> well-established java-based apps in Tomcat, like Solr and eXist (and am
>> looking into a java triplestore to run in Tomcat) and write scripts to
>> make
>> these web services interact in whichever language seems to be the most
>> appropriate.  Node looks like it may be interesting to play around with,
>> but I'm wary of having to learn something completely new, jettisoning
>> every
>> application and language I am experienced with, to put a new project into
>> production in the next 4-8 weeks.
>>
>> Ethan
>>
>> On Tue, May 8, 2012 at 1:15 PM, Nate Vack  wrote:
>>
>>  On Tue, May 8, 2012 at 11:45 AM, Ross Singer
>>> wrote:
>>>
>>>> On May 8, 2012, at 10:17 AM, Ethan Gruber wrote:
>>>>
>>>>>
>>>>> in.  Our data is exclusively XML, so LAMP/Rails aren't really options.
>>>>>
>>>>
>>>> ^^ Really?  Nobody's going to take the bait with this one?
>>>>
>>>
>>> I can't see why they would; parsing XML in ruby is simply not possible.
>>>
>>> ;-)
>>>
>>> -n
>>>
>>>


Re: [CODE4LIB] Anyone using node.js?

2012-05-08 Thread Ethan Gruber
I once had benchmarks comparing XML processing with Saxon/XSLT2 vs hpricot
and nokogiri, and Saxon is the most efficient XML processor there is.  I
don't have that data any more though, but that's why I'm not a proponent of
using PHP/Ruby for delivering and manipulating XML content.  Each platform
has its pros and cons.  I didn't mean to ruffle any feathers with that
statement.

On Tue, May 8, 2012 at 2:18 PM, Ross Singer  wrote:

> On May 8, 2012, at 2:01 PM, Ethan Gruber wrote:
>
> > For what it's worth, I have processed XML in PHP, Ruby, and Saxon/XSLT 2,
>
> So then explain why LAMP/Rails aren't really options.
>
> It's hard to see how anybody can recommend node.js (or any other stack)
> based on this statement because without knowing _why_ these are inadequate.
>  My guess is that node's XML libraries are also libXML based, just like
> pretty much any other C-based language.
>
> > but I feel like I'm missing some sort of inside joke here.
> >
> > Thanks for the info.  To clarify, I don't develop in java, but deploy
> > well-established java-based apps in Tomcat, like Solr and eXist (and am
> > looking into a java triplestore to run in Tomcat) and write scripts to
> make
> > these web services interact in whichever language seems to be the most
> > appropriate.  Node looks like it may be interesting to play around with,
> > but I'm wary of having to learn something completely new, jettisoning
> every
> > application and language I am experienced with, to put a new project into
> > production in the next 4-8 weeks.
>
> Eh, if your window is 4-8 weeks, then I wouldn't be considering node for
> this project.  It does, however, sound like you could really use a new
> project manager, because the one you have sounds terrible.
>
> -Ross.
>
> >
> > Ethan
> >
> > On Tue, May 8, 2012 at 1:15 PM, Nate Vack  wrote:
> >
> >> On Tue, May 8, 2012 at 11:45 AM, Ross Singer 
> >> wrote:
> >>> On May 8, 2012, at 10:17 AM, Ethan Gruber wrote:
> >>>>
> >>>> in.  Our data is exclusively XML, so LAMP/Rails aren't really options.
> >>>
> >>> ^^ Really?  Nobody's going to take the bait with this one?
> >>
> >> I can't see why they would; parsing XML in ruby is simply not possible.
> >>
> >> ;-)
> >>
> >> -n
> >>
>


Re: [CODE4LIB] Anyone using node.js?

2012-05-08 Thread Ethan Gruber
For what it's worth, I have processed XML in PHP, Ruby, and Saxon/XSLT 2,
but I feel like I'm missing some sort of inside joke here.

Thanks for the info.  To clarify, I don't develop in java, but deploy
well-established java-based apps in Tomcat, like Solr and eXist (and am
looking into a java triplestore to run in Tomcat) and write scripts to make
these web services interact in whichever language seems to be the most
appropriate.  Node looks like it may be interesting to play around with,
but I'm wary of having to learn something completely new, jettisoning every
application and language I am experienced with, to put a new project into
production in the next 4-8 weeks.

Ethan

On Tue, May 8, 2012 at 1:15 PM, Nate Vack  wrote:

> On Tue, May 8, 2012 at 11:45 AM, Ross Singer 
> wrote:
> > On May 8, 2012, at 10:17 AM, Ethan Gruber wrote:
> >>
> >> in.  Our data is exclusively XML, so LAMP/Rails aren't really options.
> >
> > ^^ Really?  Nobody's going to take the bait with this one?
>
> I can't see why they would; parsing XML in ruby is simply not possible.
>
> ;-)
>
> -n
>


Re: [CODE4LIB] Anyone using node.js?

2012-05-08 Thread Ethan Gruber
Thanks, it really helps to get a list of projects using it so I can get a
better sense of what's possible.

On Tue, May 8, 2012 at 10:23 AM, Cary Gordon  wrote:

> I have done some work with node building apps in the areas of mapping
> and communication (chat, etc.).
>
> Looking at the list at
>
> https://github.com/joyent/node/wiki/Projects,-Applications,-and-Companies-Using-Node
> ,
> the emphasis on real-time stands out.
>
> Node is fast and lightweight, and is well suited to applications that
> need speed and can take advantage of multiple channels.
>
> Thanks,
>
> Cary
>
> On Mon, May 7, 2012 at 8:17 PM, Ethan Gruber  wrote:
> > Hi all,
> >
> > It was recently suggested to me that a project I am working on may adopt
> > node.js for its architecture (well, be completely re-written for
> node.js).
> > I don't know anything about node.js, and have only heard of it in some
> > passing discussions on the list.  I'd like to know if anyone on code4lib
> > has experience developing in this platform, and what their thoughts are
> on
> > it, positive or negative.
> >
> > Thanks,
> > Ethan
>
>
>
> --
> Cary Gordon
> The Cherry Hill Company
> http://chillco.com
>


Re: [CODE4LIB] Anyone using node.js?

2012-05-08 Thread Ethan Gruber
Thanks.  I have been working on a system that allows editing of RDF in web
forms, creating linked data connections in the background, publishing to
eXist and Solr for dissemination, and will eventually integrate operation
with an RDF triplestore/SPARQL, all with Tomcat apps.  I'm not sure it is
possible to create, manage, and deliver our content with node.js, but I was
told by the project manager that Apache, Java, and Tomcat were "showing
signs of age."  I'm not so sure about this considering the prevalence of
Tomcat apps both in libraries and industry.  I happen to be very fond of
Solr, and it seems very risky to start over in node.js, especially since I
can't be certain the end product will succeed.  I prefer to err on the side
of stability.

If anyone has other thoughts about the future of Tomcat applications in the
library, or more broadly cultural heritage informatics, feel free to jump
in.  Our data is exclusively XML, so LAMP/Rails aren't really options.

Ethan

On Tue, May 8, 2012 at 10:03 AM, Nate Vack  wrote:

> On Mon, May 7, 2012 at 10:17 PM, Ethan Gruber  wrote:
>
> > It was recently suggested to me that a project I am working on may adopt
> > node.js for its architecture (well, be completely re-written for
> node.js).
> > I don't know anything about node.js, and have only heard of it in some
> > passing discussions on the list.  I'd like to know if anyone on code4lib
> > has experience developing in this platform, and what their thoughts are
> on
> > it, positive or negative.
>
> I've only played a little bit, but my take is: you'll have more parts
> to build than with other systems. If you need persistent connections,
> it's gonna be neat; if you don't, it's probably not worth the bother.
>
> The Peepcode screencasts on Node:
>
> https://peepcode.com/screencasts/node
>
> are probably worth your time and money.
>
> -n
>


[CODE4LIB] Anyone using node.js?

2012-05-07 Thread Ethan Gruber
Hi all,

It was recently suggested to me that a project I am working on may adopt
node.js for its architecture (well, be completely re-written for node.js).
I don't know anything about node.js, and have only heard of it in some
passing discussions on the list.  I'd like to know if anyone on code4lib
has experience developing in this platform, and what their thoughts are on
it, positive or negative.

Thanks,
Ethan


Re: [CODE4LIB] Omeka and CoSign

2012-04-20 Thread Ethan Gruber
Hi Ken,

You may get a response here, but the Omeka Google Group community offers
really great support.  I'd ask there as well.

Ethan

On Fri, Apr 20, 2012 at 12:30 PM, Varnum, Ken  wrote:

> We're hoping to use our campus CoSign authentication system with Omeka,
> allowing campus users to log in with our campus single sign-on and (where
> appropriate permissions have been granted to that user ID in Omeka) getting
> the user to the admin pages, bypassing the Omeka login screen. Has anyone
> done this? If so, could you lend us some advice (or code)?
>
> Ken
>
>
> --
> Ken Varnum
> Web Systems Manager   E: var...@umich.edu
> University of Michigan LibraryT: 734-615-3287
> 300C Hatcher Graduate Library F: 734-647-6897
> Ann Arbor, MI 48109-1190
> http://www.lib.umich.edu/users/varnum
>


[CODE4LIB] Representing geographic hiearchy in linked data

2012-04-18 Thread Ethan Gruber
<<< No Message Collected >>>


Re: [CODE4LIB] Author authority records to create publication feed?

2012-04-13 Thread Ethan Gruber
It appears that academia.edu still does not have an Atom/RSS feed for
member activity and listed publications, but I think such a feature would
be very useful.  If there was a concerted effort to demand such a service,
academia.edu might consider implementing it.
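
In the meantime, rolling a minimal Atom feed from citations that faculty upload
themselves is not much work. A sketch in PHP (the citation array, names, URLs,
and identifiers are all placeholders):

<?php
// Minimal sketch: build an Atom feed for one faculty member from an array of
// citations. Names, URLs, and identifiers below are all placeholders.
$atom = 'http://www.w3.org/2005/Atom';
$citations = array(
    array('title' => 'An example article', 'url' => 'http://example.org/article1', 'date' => '2012-03-01')
);

$doc = new DOMDocument('1.0', 'UTF-8');
$doc->formatOutput = true;
$feed = $doc->appendChild($doc->createElementNS($atom, 'feed'));
$feed->appendChild($doc->createElementNS($atom, 'title', 'Publications: Jane Faculty'));
$feed->appendChild($doc->createElementNS($atom, 'id', 'http://example.edu/feeds/jfaculty'));
$feed->appendChild($doc->createElementNS($atom, 'updated', date(DATE_ATOM)));
$author = $feed->appendChild($doc->createElementNS($atom, 'author'));
$author->appendChild($doc->createElementNS($atom, 'name', 'Jane Faculty'));

foreach ($citations as $citation) {
    $entry = $feed->appendChild($doc->createElementNS($atom, 'entry'));
    $entry->appendChild($doc->createElementNS($atom, 'title', $citation['title']));
    $link = $entry->appendChild($doc->createElementNS($atom, 'link'));
    $link->setAttribute('href', $citation['url']);
    $entry->appendChild($doc->createElementNS($atom, 'id', $citation['url']));
    $entry->appendChild($doc->createElementNS($atom, 'updated', date(DATE_ATOM, strtotime($citation['date']))));
}

header('Content-Type: application/atom+xml');
echo $doc->saveXML();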

Ethan

On Fri, Apr 13, 2012 at 9:25 AM, Paul Butler (pbutler3) wrote:

> Howdy All,
>
> Some folks from across campus just came to my door with this question.  I
> am still trying to work through the possibilities and problems, but thought
> others might have encountered something similar.
>
> They are looking for a way to create a feed (RSS, or anything else that
> might work) for each faculty member on campus to collect and link to their
> publications, which can then be embedded into their faculty profile webpage
> (in WordPress).
>
> I realize the vendors (JSTOR, EBSCO, etc.) allow author RSS feeds, but
> that really does not allow for disambiguation between folks with the same
> name and variants in name citation.  It appears Web of Science has author
> authority records and a set of apis, but we currently do not subscribe to
> WoS and am waiting for a trial to test.  What we need is something similar
> to this: http://arxiv.org/help/author_identifiers
>
> We can ask faculty members to upload their own citations and then just
> auto link out to something like Serials Solutions' Journal Finder,  but
> that is likely not sustainable.
>
> So, any suggestions - particularly free or low cost solutions.  Thanks!
>
> Cheers, Paul
> +-+-+-+-+-+-+-+-+-+-+-+-+
> Paul R Butler
> Assistant Systems Librarian
> Simpson Library
> University of Mary Washington
> 1801 College Avenue
> Fredericksburg, VA 22401
> 540.654.1756
> libraries.umw.edu
>
> Sent from the mighty Dell Vostro 230.
>


Re: [CODE4LIB] Representing geographic hiearchy in linked data

2012-04-11 Thread Ethan Gruber
Thanks to everyone for the suggestions.

Ethan

On Tue, Apr 10, 2012 at 7:43 PM, Simon Spero  wrote:

> On Mon, Apr 9, 2012 at 7:13 PM, Ethan Gruber  wrote:
>
> > Ancient geographic entities.  Athens is in Attica.  Sardis is in Lydia
> (in
> > Anatolia, for example).  If these were modern geopolitical entities, I
> > would use geonames.  We're linking cities to Pleiades, but Pleiades does
> > not maintain parent::child geographic relationships.
>
>
> geoPoliticalSubdivision may work for you. You could assert this as a
> subPropertyOf ObjectInverseOf(partOf), since BIG is a
> holonym (http://en.wikipedia.org/wiki/Holonymy) of SMALL. Also, it is
> probably a bad idea to use partOf if there is a more
> specific sub-property that you can use that will better capture the
> intended  meaning - for example, components of a kit, or series in a fonds.
>
> http://sw.opencyc.org/concept/Mx4rvfGaTZwpEbGdrcN5Y29ycA
>
> "(geopoliticalSubdivision BIG SMALL) means that
> the GeopoliticalEntity SMALL is a part of the
> larger GeopoliticalEntity BIG. The territory (see the constant TerritoryFn)
> of SMALL is a geographical sub-region (see the
> predicate geographicalSubRegions) of the territory of BIG. The government
> (see the constant GovernmentFn) of BIG usually has some sovereignty over
> the government of SMALL."
>
> Simon
>


Re: [CODE4LIB] Representing geographic hiearchy in linked data

2012-04-09 Thread Ethan Gruber
Ancient geographic entities.  Athens is in Attica.  Sardis is in Lydia (in
Anatolia, for example).  If these were modern geopolitical entities, I
would use geonames.  We're linking cities to Pleiades, but Pleiades does
not maintain parent::child geographic relationships.

Ethan
On Apr 9, 2012 5:53 PM, "Simon Spero"  wrote:

> Are you talking about geographical entities, or geopolitical ones? For
> example,  is there an answer to the question "what country is
> constantinople located in?"
>
> Simon
> On Apr 8, 2012 8:02 PM, "Ethan Gruber"  wrote:
>
> > CIDOC-CRM may be the answer here. I will look over the documentation in
> > greater detail tomorrow.
> >
> > Thanks,
> > Ethan
> > On Apr 8, 2012 7:56 PM, "Ethan Gruber"  wrote:
> >
> > > The data is modeled, but I want to use an ontology for geographic
> > concepts
> > > that already exists, if possible.  If anything, my issue highlights the
> > > point that linked data can be *too* flexible.
> > > On Apr 8, 2012 3:54 PM, "Michael Hopwood"  wrote:
> > >
> > >> I think this highlights the point that, at some point, you have to
> model
> > >> the data.
> > >>
> > >> -Original Message-
> > >> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf
> Of
> > >> Ethan Gruber
> > >> Sent: 08 April 2012 15:44
> > >> To: CODE4LIB@LISTSERV.ND.EDU
> > >> Subject: Re: [CODE4LIB] Representing geographic hiearchy in linked
> data
> > >>
> > >> Hi,
> > >>
> > >> Thanks for the info, but it's not quite what I'm looking for.  We've
> > >> established authority control for ancient places, but I'm looking for
> an
> > >> ontology I can use to describe the child:parent relationship between
> > city
> > >> and region or region and larger region (in any way that isn't
> > >> dcterms:partOf).  Geonames has defined their own vocabulary that can't
> > >> really be reused in other geographic contexts, e.g. with
> gn:countryCode,
> > >> gn:parentCountry.
> > >>
> > >> Thanks,
> > >> Ethan
> > >>
> > >> On Fri, Apr 6, 2012 at 11:40 AM, Karen Coyle 
> wrote:
> > >>
> > >> > Also, there is Geonames (http://www.geonames.org), which is the
> > >> > primary geographic data set on the Semantic Web. Here is the link to
> > >> Athens:
> > >> >
> > >> > http://www.geonames.org/search.html?q=athens&country=GR
> > >> >
> > >> > kc
> > >> >
> > >> >
> > >> > On 4/6/12 4:54 PM, Karen Miller wrote:
> > >> >
> > >> >> Ethan, have you considered Getty's Thesaurus of Geographic Names?
>  It
> > >> >> does provide a geographic hierarchy, although the data for Athens
> > >> >> they provide isn't quite the one you've described:
> > >> >>
> > >> >> http://www.getty.edu/vow/TGNHierarchy?find=athens&place=&nation=&prev_page=1&english=Y&subjectid=7001393
> > >> >>
> > >> >> This vocabulary is available in XML here:
> > >> >>
> > >> >>
> > >> >> http://www.getty.edu/research/tools/vocabularies/obtain/index.html
> > >> >>
> > >> >> I have looked at it but not used it; it's a big tangled mess of
> XML.
> > >> >>
> > >> >> MODS mimics a hierarchy (the subject/hierarchicalGeographic element
> > >> >> has these children: continent, country, province, region, state,
> > >> >> territory, county, city, island, area, extraterrestrialArea,
> > >> >> citySection). The VRA Core location element provides a similar
> > mapping.
> > >> >>
> > >> >> I try to stay away from Dublin Core, but I did venture onto the DC
> > >> >> Terms page just now and saw TGN listed in the vo

Re: [CODE4LIB] Representing geographic hiearchy in linked data

2012-04-08 Thread Ethan Gruber
CIDOC-CRM may be the answer here. I will look over the documentation in
greater detail tomorrow.

Thanks,
Ethan
On Apr 8, 2012 7:56 PM, "Ethan Gruber"  wrote:

> The data is modeled, but I want to use an ontology for geographic concepts
> that already exists, if possible.  If anything, my issue highlights the
> point that linked data can be *too* flexible.
> On Apr 8, 2012 3:54 PM, "Michael Hopwood"  wrote:
>
>> I think this highlights the point that, at some point, you have to model
>> the data.
>>
>> -Original Message-
>> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
>> Ethan Gruber
>> Sent: 08 April 2012 15:44
>> To: CODE4LIB@LISTSERV.ND.EDU
>> Subject: Re: [CODE4LIB] Representing geographic hiearchy in linked data
>>
>> Hi,
>>
>> Thanks for the info, but it's not quite what I'm looking for.  We've
>> established authority control for ancient places, but I'm looking for an
>> ontology I can use to describe the child:parent relationship between city
>> and region or region and larger region (in any way that isn't
>> dcterms:partOf).  Geonames has defined their own vocabulary that can't
>> really be reused in other geographic contexts, e.g. with gn:countryCode,
>> gn:parentCountry.
>>
>> Thanks,
>> Ethan
>>
>> On Fri, Apr 6, 2012 at 11:40 AM, Karen Coyle  wrote:
>>
>> > Also, there is Geonames (http://www.geonames.org), which is the
>> > primary geographic data set on the Semantic Web. Here is the link to
>> Athens:
>> >
>> > http://www.geonames.org/search.html?q=athens&country=GR
>> >
>> > kc
>> >
>> >
>> > On 4/6/12 4:54 PM, Karen Miller wrote:
>> >
>> >> Ethan, have you considered Getty's Thesaurus of Geographic Names?  It
>> >> does provide a geographic hierarchy, although the data for Athens
>> >> they provide isn't quite the one you've described:
>> >>
>> >> http://www.getty.edu/vow/TGNHierarchy?find=athens&place=&nation=&prev_page=1&english=Y&subjectid=7001393
>> >>
>> >> This vocabulary is available in XML here:
>> >>
>> >> http://www.getty.edu/research/tools/vocabularies/obtain/index.html
>> >>
>> >> I have looked at it but not used it; it's a big tangled mess of XML.
>> >>
>> >> MODS mimics a hierarchy (the subject/hierarchicalGeographic element
>> >> has these children: continent, country, province, region, state,
>> >> territory, county, city, island, area, extraterrestrialArea,
>> >> citySection). The VRA Core location element provides a similar mapping.
>> >>
>> >> I try to stay away from Dublin Core, but I did venture onto the DC
>> >> Terms page just now and saw TGN listed in the vocabulary encoding
>> >> schemes there, so probably someone has implemented it.
>> >>
>> >> Karen
>> >>
>> >>
>> >> Karen D. Miller
>> >> Monographic/Digital Projects Cataloger Bibliographic Services Dept.
>> >> Northwestern University Library
>> >> Evanston, IL
>> >> k-mill...@northwestern.edu
>> >> 847-467-3462
>> >>
>> >>
>> >>
>> >>
>> >> -Original Message-
>> >> From: Code for Libraries [mailto:code4...@listserv.nd.EDU]
>> >> On Behalf Of Ethan Gruber
>> >> Sent: Thursday, April 05, 2012 12:49 PM
>> >> To: CODE4LIB@LISTSERV.ND.EDU
>> >> Subject: [CODE4LIB] Representing geographic hiearchy in linked data
>> >>
>> >> Hi all,
>> >>
>> >> I have a dilemma that needs to be sorted out.  I'm looking for an
>> >> ontology that can describe geographic hierarchy, and hopefully someone
>> on
>> >> the list has experience with this.  For example, if I have an RDF
>> record
>> >> that describes Athens, I want to point Athens to Attica, and Attica to
>> >> Greece, and so on.  The current proposal is to use dcterms:partOf, but
>> the
>> >> problem with this is that our records will also use dcterms:partOf to
>> >> describe a completely different type of relational concept, and it
>> will be
>> >> almost impossible for scripts to recognize the difference between
>> these two
>> >> uses of the same DC term.
>> >>
>> >> Thanks,
>> >> Ethan
>> >>
>> >
>> > --
>> > Karen Coyle
>> > kco...@kcoyle.net http://kcoyle.net
>> > ph: 1-510-540-7596
>> > m: 1-510-435-8234
>> > skype: kcoylenet
>> >
>>
>


Re: [CODE4LIB] Representing geographic hiearchy in linked data

2012-04-08 Thread Ethan Gruber
The data is modeled, but I want to use an ontology for geographic concepts
that already exists, if possible.  If anything, my issue highlights the
point that linked data can be *too* flexible.
On Apr 8, 2012 3:54 PM, "Michael Hopwood"  wrote:

> I think this highlights the point that, at some point, you have to model
> the data.
>
> -Original Message-
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
> Ethan Gruber
> Sent: 08 April 2012 15:44
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] Representing geographic hiearchy in linked data
>
> Hi,
>
> Thanks for the info, but it's not quite what I'm looking for.  We've
> established authority control for ancient places, but I'm looking for an
> ontology I can use to describe the child:parent relationship between city
> and region or region and larger region (in any way that isn't
> dcterms:partOf).  Geonames has defined their own vocabulary that can't
> really be reused in other geographic contexts, e.g. with gn:countryCode,
> gn:parentCountry.
>
> Thanks,
> Ethan
>
> On Fri, Apr 6, 2012 at 11:40 AM, Karen Coyle  wrote:
>
> > Also, there is Geonames (http://www.geonames.org), which is the
> > primary geographic data set on the Semantic Web. Here is the link to
> Athens:
> >
> > http://www.geonames.org/search.html?q=athens&country=GR
> >
> > kc
> >
> >
> > On 4/6/12 4:54 PM, Karen Miller wrote:
> >
> >> Ethan, have you considered Getty's Thesaurus of Geographic Names?  It
> >> does provide a geographic hierarchy, although the data for Athens
> >> they provide isn't quite the one you've described:
> >>
> >> http://www.getty.edu/vow/TGNHierarchy?find=athens&place=&nation=&prev_page=1&english=Y&subjectid=7001393
> >>
> >> This vocabulary is available in XML here:
> >>
> >> http://www.getty.edu/research/tools/vocabularies/obtain/index.html
> >>
> >> I have looked at it but not used it; it's a big tangled mess of XML.
> >>
> >> MODS mimics a hierarchy (the subject/hierarchicalGeographic element
> >> has these children: continent, country, province, region, state,
> >> territory, county, city, island, area, extraterrestrialArea,
> >> citySection). The VRA Core location element provides a similar mapping.
> >>
> >> I try to stay away from Dublin Core, but I did venture onto the DC
> >> Terms page just now and saw TGN listed in the vocabulary encoding
> >> schemes there, so probably someone has implemented it.
> >>
> >> Karen
> >>
> >>
> >> Karen D. Miller
> >> Monographic/Digital Projects Cataloger Bibliographic Services Dept.
> >> Northwestern University Library
> >> Evanston, IL
> >> k-mill...@northwestern.edu
> >> 847-467-3462
> >>
> >>
> >>
> >>
> >> -Original Message-
> >> From: Code for Libraries [mailto:code4...@listserv.nd.EDU]
> >> On Behalf Of Ethan Gruber
> >> Sent: Thursday, April 05, 2012 12:49 PM
> >> To: CODE4LIB@LISTSERV.ND.EDU
> >> Subject: [CODE4LIB] Representing geographic hiearchy in linked data
> >>
> >> Hi all,
> >>
> >> I have a dilemma that needs to be sorted out.  I'm looking for an
> >> ontology that can describe geographic hierarchy, and hopefully someone
> on
> >> the list has experience with this.  For example, if I have an RDF record
> >> that describes Athens, I want to point Athens to Attica, and Attica to
> >> Greece, and so on.  The current proposal is to use dcterms:partOf, but
> the
> >> problem with this is that our records will also use dcterms:partOf to
> >> describe a completely different type of relational concept, and it will
> be
> >> almost impossible for scripts to recognize the difference between these
> two
> >> uses of the same DC term.
> >>
> >> Thanks,
> >> Ethan
> >>
> >
> > --
> > Karen Coyle
> > kco...@kcoyle.net http://kcoyle.net
> > ph: 1-510-540-7596
> > m: 1-510-435-8234
> > skype: kcoylenet
> >
>


Re: [CODE4LIB] Representing geographic hiearchy in linked data

2012-04-08 Thread Ethan Gruber
Hi,

Thanks for the info, but it's not quite what I'm looking for.  We've
established authority control for ancient places, but I'm looking for an
ontology I can use to describe the child:parent relationship between city
and region or region and larger region (in any way that isn't
dcterms:partOf).  Geonames has defined their own vocabulary that can't
really be reused in other geographic contexts, e.g. with gn:countryCode,
gn:parentCountry.

Thanks,
Ethan

On Fri, Apr 6, 2012 at 11:40 AM, Karen Coyle  wrote:

> Also, there is Geonames (http://www.geonames.org), which is the primary
> geographic data set on the Semantic Web. Here is the link to Athens:
>
> > http://www.geonames.org/search.html?q=athens&country=GR
>
> kc
>
>
> On 4/6/12 4:54 PM, Karen Miller wrote:
>
>> Ethan, have you considered Getty's Thesaurus of Geographic Names?  It
>> does provide a geographic hierarchy, although the data for Athens they
>> provide isn't quite the one you've described:
>>
>> http://www.getty.edu/vow/TGNHierarchy?find=athens&place=&nation=&prev_page=1&english=Y&subjectid=7001393
>>
>> This vocabulary is available in XML here:
>>
>> http://www.getty.edu/research/tools/vocabularies/obtain/index.html
>>
>> I have looked at it but not used it; it's a big tangled mess of XML.
>>
>> MODS mimics a hierarchy (the subject/hierarchicalGeographic element has
>> these children: continent, country, province, region, state, territory,
>> county, city, island, area, extraterrestrialArea, citySection). The VRA
>> Core location element provides a similar mapping.
>>
>> I try to stay away from Dublin Core, but I did venture onto the DC Terms
>> page just now and saw TGN listed in the vocabulary encoding schemes there,
>> so probably someone has implemented it.
>>
>> Karen
>>
>>
>> Karen D. Miller
>> Monographic/Digital Projects Cataloger
>> Bibliographic Services Dept.
>> Northwestern University Library
>> Evanston, IL
>> k-mill...@northwestern.edu
>> 847-467-3462
>>
>>
>>
>>
>> -Original Message-
>> From: Code for Libraries 
>> [mailto:code4...@listserv.nd.EDU]
>> On Behalf Of Ethan Gruber
>> Sent: Thursday, April 05, 2012 12:49 PM
>> To: CODE4LIB@LISTSERV.ND.EDU
>> Subject: [CODE4LIB] Representing geographic hiearchy in linked data
>>
>> Hi all,
>>
>> I have a dilemma that needs to be sorted out.  I'm looking for an
>> ontology that can describe geographic hierarchy, and hopefully someone on
>> the list has experience with this.  For example, if I have an RDF record
>> that describes Athens, I want to point Athens to Attica, and Attica to
>> Greece, and so on.  The current proposal is to use dcterms:partOf, but the
>> problem with this is that our records will also use dcterms:partOf to
>> describe a completely different type of relational concept, and it will be
>> almost impossible for scripts to recognize the difference between these two
>> uses of the same DC term.
>>
>> Thanks,
>> Ethan
>>
>
> --
> Karen Coyle
> kco...@kcoyle.net http://kcoyle.net
> ph: 1-510-540-7596
> m: 1-510-435-8234
> skype: kcoylenet
>


Re: [CODE4LIB] RDF advice

2012-02-14 Thread Ethan Gruber
Hi Karen,

Thanks.  Would it be odd to use foaf:primaryTopic when FOAF isn't used to
describe other attributes of a concept?

Ethan

On Mon, Feb 13, 2012 at 5:59 PM, Karen Coyle  wrote:

> On 2/13/12 1:43 PM, Ethan Gruber wrote:
>
>> Hi Patrick,
>>
>> Thanks.  That does make sense.  Hopefully others will weigh in with
>> agreement (or disagreement).  Sometimes these semantic languages are so
>> flexible that it's unsettling.  There are a million ways to do something
>> with only de facto standards rather than restricted schemas.  For what
>> it's
>> worth, the metadata files describe coin-types, an intellectual concept in
>> numismatics succinctly described at
>> http://coins.about.com/od/coinsglossary/g/coin_type.htm,
>> not physical
>> objects in a collection.
>>
>
> I believe this is similar to what FOAF does with "primary topic":
> http://xmlns.com/foaf/spec/#term_primaryTopic
>
> In FOAF that usually points to a web page ABOUT the subject of the FOAF
> data, so a wikipedia web page about Stephen King would get this "primary
> topic" property. Presuming that your XML is http:// accessible, it might
> fit into this model.
>
> kc
>
>
>> Ethan
>>
>> On Mon, Feb 13, 2012 at 4:28 PM, Patrick Murray-John<
>> patrickmjc...@gmail.com>  wrote:
>>
>>  Ethan,
>>>
>>> The semantics do seem odd there. It doesn't seem like a skos:Concept
>>> would
>>> typically link to a metadata record about -- if I'm following you right
>>> --
>>> a specific coin. Is this sort of a FRBRish approach, where your
>>> skos:Concept is similar to the abstraction of a frbr:Work (that is, the
>>> idea of a particular coin), where your metadata records are really
>>> describing the common features of a particular coin?
>>>
>>> If that's close, it seems like the richer metadata is really a sort of
>>> definition of the skos:Concept, so maybe skos:definition would do the
>>> trick? Something like this:
>>>
>>> ex:wheatPenny a skos:Concept ;
>>>skos:prefLabel "Wheat Penny" ;
>>>skos:definition "Your richer, non RDF metadata document describing the
>>> front and back, years minted, etc."
>>>
>>> In XML that might be like:
>>>
>>> <skos:Concept rdf:about="http://example.org/wheatPenny">
>>>   <skos:prefLabel>Wheat Penny</skos:prefLabel>
>>>   <skos:definition>
>>> Your richer, non RDF metadata document describing the front and back,
>>> years minted, etc.
>>>   </skos:definition>
>>> </skos:Concept>
>>>
>>>
>>> It might raise an eyebrow to have, instead of a literal value for
>>> skos:definition, another set of structured, non RDF metadata. Better in
>>> that case to go with a document reference, and make your richer metadata
>>> a
>>> standalone document with its own URI:
>>>
>>> ex:wheatPenny skos:definition ex:wheatPennyDefinition.xml
>>>
>>> <skos:Concept rdf:about="http://example.org/wheatPenny">
>>>   <skos:definition rdf:resource="http://example.org/wheatPenny.xml" />
>>> </skos:Concept>
>>>
>>> I'm looking at the Documentation as a Document Reference section in SKOS
>>> Primer : 
>>> http://www.w3.org/TR/2009/NOTE-skos-primer-20090818/
>>>
>>>
>>> Again, if I'm following, that might be the closest approach.
>>>
>>> Hope that helps,
>>> Patrick
>>>
>>>
>>>
>>> On 02/11/2012 09:53 PM, Ethan Gruber wrote:
>>>
>>>  Hi Patrick,
>>>>
>>>> The richer metadata model is an ontology for describing coins.  It is
>>>> more
>>>> complex than, say, VRA Core or MODS, but not as hierarchically
>>>> complicated
>>>> as an EAD finding aid.  I'd like to link a sko

Re: [CODE4LIB] RDF advice

2012-02-13 Thread Ethan Gruber
Hi Patrick,

Thanks.  That does make sense.  Hopefully others will weigh in with
agreement (or disagreement).  Sometimes these semantic languages are so
flexible that it's unsettling.  There are a million ways to do something
with only de facto standards rather than restricted schemas.  For what it's
worth, the metadata files describe coin-types, an intellectual concept in
numismatics succinctly described at
http://coins.about.com/od/coinsglossary/g/coin_type.htm, not physical
objects in a collection.
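
To illustrate the distinction, here's a rough sketch (all the ex: URIs
and the ex:hasType property are invented, purely for illustration): the
skos:Concept stands for the coin-type, and any number of physical
specimens in collections could point to it:

@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex:   <http://example.org/> .

# the coin-type: an intellectual concept, not an object in a drawer
ex:wheatPenny a skos:Concept ;
    skos:prefLabel "Wheat Penny" .

# a physical specimen held by some collection, linked to its type
ex:specimen123 ex:hasType ex:wheatPenny .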

Ethan

On Mon, Feb 13, 2012 at 4:28 PM, Patrick Murray-John <
patrickmjc...@gmail.com> wrote:

> Ethan,
>
> The semantics do seem odd there. It doesn't seem like a skos:Concept would
> typically link to a metadata record about -- if I'm following you right --
> a specific coin. Is this sort of a FRBRish approach, where your
> skos:Concept is similar to the abstraction of a frbr:Work (that is, the
> idea of a particular coin), where your metadata records are really
> describing the common features of a particular coin?
>
> If that's close, it seems like the richer metadata is really a sort of
> definition of the skos:Concept, so maybe skos:definition would do the
> trick? Something like this:
>
> ex:wheatPenny a skos:Concept ;
>skos:prefLabel "Wheat Penny" ;
>skos:definition "Your richer, non RDF metadata document describing the
> front and back, years minted, etc."
>
> In XML that might be like:
>
>  about="http://example.org/**wheatPenny<http://example.org/wheatPenny>
> ">
>  Wheat Penny
>  
> Your richer, non RDF metadata document describing the front and back,
> years minted, etc.
>  
>  
>
>
> It might raise an eyebrow to have, instead of a literal value for
> skos:definition, another set of structured, non RDF metadata. Better in
> that case to go with a document reference, and make your richer metadata a
> standalone document with its own URI:
>
> ex:wheatPenny skos:definition ex:wheatPennyDefinition.xml
>
> <skos:Concept rdf:about="http://example.org/wheatPenny">
>   <skos:definition rdf:resource="http://example.org/wheatPenny.xml" />
> </skos:Concept>
>
> I'm looking at the Documentation as a Document Reference section in SKOS
> Primer : 
> http://www.w3.org/TR/2009/NOTE-skos-primer-20090818/
>
> Again, if I'm following, that might be the closest approach.
>
> Hope that helps,
> Patrick
>
>
>
> On 02/11/2012 09:53 PM, Ethan Gruber wrote:
>
>> Hi Patrick,
>>
>> The richer metadata model is an ontology for describing coins.  It is more
>> complex than, say, VRA Core or MODS, but not as hierarchically complicated
>> as an EAD finding aid.  I'd like to link a skos:Concept to one of these
>> related metadata records.  It doesn't matter if I use  skos, owl, etc. to
>> describe this relationship, so long as it is a semantically appropriate
>> choice.
>>
>> Ethan
>>
>> On Sat, Feb 11, 2012 at 2:32 PM, Patrick Murray-John<
>> patrickmjc...@gmail.com>  wrote:
>>
>>  Ethan,
>>>
>>> Maybe I'm being daft in missing it, but could I ask about more details in
>>> the richer metadata model? My hunch is that, depending on the details of
>>> the information you want to bring in, there might be more precise
>>> alternatives to what's in SKOS. Are you aiming to have a link between a
>>> skos:Concept and texts/documents related to that concept?
>>>
>>> Patrick
>>>
>>>
>>> On 02/11/2012 03:14 PM, Ethan Gruber wrote:
>>>
>>>  Hi Ross,
>>>>
>>>> Thanks for the input.  My main objective is to make the richer metadata
>>>> available one way or another to people using our web services.  Do you
>>>> think it makes more sense to link to a URI of the richer metadata
>>>> document
>>>> as skos:related (or similar)?  I've seen two uses for skos:related--one
>>>> to
>>>> point to related skos:concepts, the other to point to web resources
>>>> associated with that concept, e.g., a wikipedia article.  I have a
>>>> feeling
>>>> the latter is incorrect, at least according to the documentation I've
>>>> read
>>>> on the w3c.  For what it's worth, VIAF uses owl:sameAs/@rdf:resource to
>>>> point to dbpedia and other web resources.
>>>>
>>>> Thanks,
>>>> Ethan
>>>>
>>>> On Sat, Feb 11, 2012 at 1

Re: [CODE4LIB] RDF advice

2012-02-11 Thread Ethan Gruber
Hi Patrick,

The richer metadata model is an ontology for describing coins.  It is more
complex than, say, VRA Core or MODS, but not as hierarchically complicated
as an EAD finding aid.  I'd like to link a skos:Concept to one of these
related metadata records.  It doesn't matter if I use  skos, owl, etc. to
describe this relationship, so long as it is a semantically appropriate
choice.

Ethan

On Sat, Feb 11, 2012 at 2:32 PM, Patrick Murray-John <
patrickmjc...@gmail.com> wrote:

> Ethan,
>
> Maybe I'm being daft in missing it, but could I ask about more details in
> the richer metadata model? My hunch is that, depending on the details of
> the information you want to bring in, there might be more precise
> alternatives to what's in SKOS. Are you aiming to have a link between a
> skos:Concept and texts/documents related to that concept?
>
> Patrick
>
>
> On 02/11/2012 03:14 PM, Ethan Gruber wrote:
>
>> Hi Ross,
>>
>> Thanks for the input.  My main objective is to make the richer metadata
>> available one way or another to people using our web services.  Do you
>> think it makes more sense to link to a URI of the richer metadata document
>> as skos:related (or similar)?  I've seen two uses for skos:related--one to
>> point to related skos:concepts, the other to point to web resources
>> associated with that concept, e.g., a wikipedia article.  I have a feeling
>> the latter is incorrect, at least according to the documentation I've read
>> on the w3c.  For what it's worth, VIAF uses owl:sameAs/@rdf:resource to
>> point to dbpedia and other web resources.
>>
>> Thanks,
>> Ethan
>>
>> On Sat, Feb 11, 2012 at 12:21 PM, Ross Singer
>>  wrote:
>>
>>  On Fri, Feb 10, 2012 at 11:51 PM, Ethan Gruber
>>>  wrote:
>>>
>>>> Hi Ross,
>>>>
>>>> No, the richer ontology is not an RDF vocabulary, but it adheres to
>>>>
>>> linked
>>>
>>>> data concepts.
>>>>
>>> Hmm, ok.  That doesn't necessarily mean it will work in RDF.
>>>
>>>> I'm looking to do something like this example of embedding mods in rdf:
>>>>
>>>> http://www.daisy.org/zw/ZedAI_Meta_Data_-_MODS_Recommendation#RDF.2FXML_2
>>>
>>> Yeah, I'll be honest, that looks terrible to me.  This looks, to me,
>>> like kind of a misunderstanding of RDF and RDF/XML.
>>>
>>> Regardless, this would make useless RDF (see below).  One of the hard
>>> things to understand about RDF, especially when you're coming at it
>>> from XML (and, by association, RDF/XML) is that RDF isn't
>>> hierarchical, it's a graph.  This is one of the reasons that the XML
>>> serialization is so awkward: it looks like something familiar to XML people,
>>> but it doesn't work well with their tools (XPath, for example) despite
>>> the fact that it, you know, should.  It's equally frustrating for RDF
>>> people because it's really verbose and its syntax can come in a
>>> million variations (more on that later in the email) making it
>>> excruciatingly hard to parse.
>>>
>>>  These semantic ontologies are so flexible, it seems like I *can* do
>>>> anything, so I'm left wondering what I *should* do--what makes the most
>>>> sense, semantically.  Is it possible to nest rdf:Description into the
>>>> skos:Concept of my previous example, and then place ...more
>>>> sophisticated model... into rdf:Description (or
>>>>
>>> alternatively,
>>>
>>>> set rdf:Description/@rdf:resource to the URI of the web-accessible XML
>>>>
>>> file?
>>>
>>>> Most RDF examples I've looked at online either have skos:Concept or
>>>> rdf:Description, not both, either at the same context in rdf:RDF or one
>>>> nested inside the other.
>>>>
>>>>  So, this is a little tough to explain via email, I think.  This is
>>> what I was referring to earlier about the myriad ways to render RDF in
>>> XML.
>>>
>>> In short, using:
>>> <skos:Concept rdf:about="http://example.org/foo">
>>>   <skos:prefLabel>Something</skos:prefLabel>
>>>   ...
>>> </skos:Concept>
>>>
>>> is shorthand for:
>>>
>>> <rdf:Description rdf:about="http://example.org/foo">
>>>   <rdf:type rdf:resource="http://www.w3.org/2004/02/skos/core#Concept" />

Re: [CODE4LIB] RDF advice

2012-02-11 Thread Ethan Gruber
Hi Ross,

Thanks for the input.  My main objective is to make the richer metadata
available one way or another to people using our web services.  Do you
think it makes more sense to link to a URI of the richer metadata document
as skos:related (or similar)?  I've seen two uses for skos:related--one to
point to related skos:concepts, the other to point to web resources
associated with that concept, e.g., a wikipedia article.  I have a feeling
the latter is incorrect, at least according to the documentation I've read
on the w3c.  For what it's worth, VIAF uses owl:sameAs/@rdf:resource to
point to dbpedia and other web resources.
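
To make that concrete, here's a rough sketch of the patterns I'm weighing
(all the URIs below are placeholders, not real resources):

@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix ex:   <http://example.org/> .

# 1. skos:related between two concepts -- clearly what the spec intends
ex:coinTypeA skos:related ex:coinTypeB .

# 2. a generic link from the concept to a document about it
ex:coinTypeA rdfs:seeAlso <http://example.org/coinTypeA.xml> .

# 3. owl:sameAs, as VIAF does -- presumably only appropriate when the two
#    URIs denote the same thing, not just a page about it
ex:coinTypeA owl:sameAs <http://dbpedia.org/resource/SomeCoinType> .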

Thanks,
Ethan

On Sat, Feb 11, 2012 at 12:21 PM, Ross Singer  wrote:

> On Fri, Feb 10, 2012 at 11:51 PM, Ethan Gruber  wrote:
> > Hi Ross,
> >
> > No, the richer ontology is not an RDF vocabulary, but it adheres to
> linked
> > data concepts.
>
> Hmm, ok.  That doesn't necessarily mean it will work in RDF.
> >
> > I'm looking to do something like this example of embedding mods in rdf:
> >
> http://www.daisy.org/zw/ZedAI_Meta_Data_-_MODS_Recommendation#RDF.2FXML_2
> >
> Yeah, I'll be honest, that looks terrible to me.  This looks, to me,
> like kind of a misunderstanding of RDF and RDF/XML.
>
> Regardless, this would make useless RDF (see below).  One of the hard
> things to understand about RDF, especially when you're coming at it
> from XML (and, by association, RDF/XML) is that RDF isn't
> hierarchical, it's a graph.  This is one of the reasons that the XML
> serialization is so awkward: it looks like something familiar to XML people,
> but it doesn't work well with their tools (XPath, for example) despite
> the fact that it, you know, should.  It's equally frustrating for RDF
> people because it's really verbose and its syntax can come in a
> million variations (more on that later in the email) making it
> excruciatingly hard to parse.
>
> > These semantic ontologies are so flexible, it seems like I *can* do
> > anything, so I'm left wondering what I *should* do--what makes the most
> > sense, semantically.  Is it possible to nest rdf:Description into the
> > skos:Concept of my previous example, and then place ...more
> > sophisticated model... into rdf:Description (or
> alternatively,
> > set rdf:Description/@rdf:resource to the URI of the web-accessible XML
> file?
> >
> > Most RDF examples I've looked at online either have skos:Concept or
> > rdf:Description, not both, either at the same context in rdf:RDF or one
> > nested inside the other.
> >
> So, this is a little tough to explain via email, I think.  This is
> what I was referring to earlier about the myriad ways to render RDF in
> XML.
>
> In short, using:
> <skos:Concept rdf:about="http://example.org/foo">
>   <skos:prefLabel>Something</skos:prefLabel>
>   ...
> </skos:Concept>
>
> is shorthand for:
>
> <rdf:Description rdf:about="http://example.org/foo">
>   <rdf:type rdf:resource="http://www.w3.org/2004/02/skos/core#Concept" />
>   <skos:prefLabel>Something</skos:prefLabel>
> </rdf:Description>
>
> So, yeah, you use one or the other.
>
> That said, I'm not sure your ontology is really going to work well,
> you'll just have to try it.  One thing that would probably be useful
> would be to serialize out a document with your nuds vocabulary as
> rdf/xml and then use something like rapper (comes with the redland
> libraries) to convert it to something more RDF-friendly, like turtle,
> and see if it makes any sense.
>
> For example, your daisy example above:
>
> xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#";
>xml:mods="http://www.daisy.org/RDF/MODS";>
>
>
>
>
>World Cultures and
> Geography
>
>
>
>Sarah Witham
> Bednarz
>
> mods:type="text">author
>
>
>
>
>Inés M.
> Miyares
>
> mods:type="text">author
>
>
>
>
>Mark C. Schug
>
> mods:type="text">author
>
>
>
>
>Charles S.
&g

Re: [CODE4LIB] RDF advice

2012-02-10 Thread Ethan Gruber
Hi Ross,

No, the richer ontology is not an RDF vocabulary, but it adheres to linked
data concepts.

I'm looking to do something like this example of embedding mods in rdf:
http://www.daisy.org/zw/ZedAI_Meta_Data_-_MODS_Recommendation#RDF.2FXML_2

These semantic ontologies are so flexible, it seems like I *can* do
anything, so I'm left wondering what I *should* do--what makes the most
sense, semantically.  Is it possible to nest rdf:Description into the
skos:Concept of my previous example, and then place ...more
sophisticated model... into rdf:Description (or alternatively,
set rdf:Description/@rdf:resource to the URI of the web-accessible XML file)?

Most RDF examples I've looked at online either have skos:Concept or
rdf:Description, not both, either at the same context in rdf:RDF or one
nested inside the other.
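
In Turtle terms, the two options I'm weighing look roughly like this
(the ex: names are made up, and ex:obverse/ex:reverse are just stand-ins
for properties from the richer model):

@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://example.org/> .

# option 1: embed the richer description directly on the concept
ex:someCoinType a skos:Concept ;
    skos:prefLabel "Label" ;
    ex:obverse "..." ;
    ex:reverse "..." .

# option 2: keep the richer record separate and just point to it --
# which property to use for the pointer is exactly the open question;
# rdfs:seeAlso here is only a stand-in
ex:someCoinType a skos:Concept ;
    skos:prefLabel "Label" ;
    rdfs:seeAlso <http://example.org/someCoinType.xml> .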

Thanks,
Ethan

On Fri, Feb 10, 2012 at 9:44 PM, Ross Singer  wrote:

> The whole advantage of RDF is that you can pull properties from different
> vocabularies (as long as they're not logically disjoint). So, assuming your
> richer ontology is some kind of RDF vocabulary, this is exactly *what* you
> should be doing.
>
> -Ross.
>
> On Feb 10, 2012, at 4:31 PM, Ethan Gruber  wrote:
>
> > Hi all,
> >
> > I'm working on an RDF model for describing concepts.  I have skos:Concept
> > nested inside rdf:RDF.  Most documents will have little more than labels
> > and related links inside of skos:Concept.  However, for a certain type of
> > concept, we have XML documents with a more sophisticated ontology and
> > structure for describing the concept.  I could embed this metadata into
> the
> > RDF or reference it as an rdf:resource.  It doesn't matter much to me
> > either way, but I'm unsure of the semantically correct way to create this
> > model.
> >
> > Suppose I have:
> >
> > <rdf:RDF>
> >   <skos:Concept>
> >     <skos:prefLabel>Label</skos:prefLabel>
> >     ...more sophisticated model...
> >   </skos:Concept>
> > </rdf:RDF>
> >
> > Is it okay to have the more sophisticated metadata model embedded in
> > skos:Concept alongside labels and related links?  Suppose I want to store
> > the more sophisticated metadata separately and reference it?  I'm not
> sure
> > what property adequately addresses this relation, semantically.
> >
> > Recommendations?
> >
> > Thanks,
> > Ethan
>


[CODE4LIB] RDF advice

2012-02-10 Thread Ethan Gruber
Hi all,

I'm working on an RDF model for describing concepts.  I have skos:Concept
nested inside rdf:RDF.  Most documents will have little more than labels
and related links inside of skos:Concept.  However, for a certain type of
concept, we have XML documents with a more sophisticated ontology and
structure for describing the concept.  I could embed this metadata into the
RDF or reference it as an rdf:resource.  It doesn't matter much to me
either way, but I'm unsure of the semantically correct way to create this
model.

Suppose I have:

<rdf:RDF>
  <skos:Concept>
    <skos:prefLabel>Label</skos:prefLabel>
    ...more sophisticated model...
  </skos:Concept>
</rdf:RDF>

Is it okay to have the more sophisticated metadata model embedded in
skos:Concept alongside labels and related links?  Suppose I want to store
the more sophisticated metadata separately and reference it?  I'm not sure
what property adequately addresses this relation, semantically.

Recommendations?

Thanks,
Ethan


Re: [CODE4LIB] Metadata

2012-02-10 Thread Ethan Gruber
An interface is only as useful as the metadata allows it to be, and the
metadata is only as useful as the interface built to take advantage of it.

Ethan

On Fri, Feb 10, 2012 at 4:10 PM, David Faler  wrote:

> I think the answer is make sure you are able to add new elements to the
> store later, and keep around your source data and plan to be able to
> reprocess it.  Something like what XC is doing.  That way, you get to be
> agile at the beginning and just deal with what you *know* is absolutely
> needed, and add more when you can make a business case for it.  Especially
> if you are looking to deal with MARC or ONIX data.
>
> On Fri, Feb 10, 2012 at 3:57 PM, Patrick Berry  wrote:
>
> > So, one question I forgot to toss out at the Ask Anything session is:
> >
> > When do you know you have enough metadata?
> >
> > "You'll know it when you have it," isn't the response I'm looking for.
>  So,
> > I'm sure you're wondering what the context for this question is, and
> > honestly there is none.  This is geared towards contentDM or DSpace or
> > Omeka or Millennium.  I've seen groups not plan enough for collecting
> data
> > and I've seen groups that are have been planning so long they forgot what
> > they were supposed to be collecting in the first place.
> >
> > So, I'll just throw that vague question out there and see who wants to
> take
> > a swing.
> >
> > Thanks,
> > Pat/@pberry
> >
>

