Re: [CODE4LIB] Looking for two coders to help with discoverability of videos

2013-12-01 Thread Kelley McGrath
I wanted to follow up on my previous post with a couple points.

1. This is probably too late for anybody thinking about applying, but I thought 
there might be some general interest. I have put up some more detailed 
specifications about what I am hoping to do at 
http://pages.uoregon.edu/kelleym/miw/. Data extraction overview.doc is the 
general overview and the other files contain supporting documents.

2. I replied some time ago to Heather's offer below about her website that will 
connect researchers with volunteer software developers. I have to admit that 
looking for volunteer software developers had not really occurred to me. 
However, I do have additional things that I would like to do for which I 
currently have no funding, so if you would be interested in volunteering in the 
future, let me know.

Kelley
kell...@uoregon.edu


On Tue, Nov 12, 2013 at 6:33 PM, Heather Claxton 
<claxt...@gmail.com> wrote:
Hi Kelley,

I might be able to help in your search.   I'm in the process of starting a
website that connects academic researchers with volunteer software
developers.  I'm looking for people to post programming projects on the
website once it's launched in late January.   I realize that may be a
little late for you, but perhaps the project you mentioned in your PS
("clustering based on title, name, date ect.") would be perfect?  The
one caveat is that the website is targeting software developers who wish to
volunteer.   Anyway, if you're interested in posting, please send me an
e-mail at sciencesolved2...@gmail.com.
I would greatly appreciate it.
Oh and of course it would be free to post  :)  Best of luck in your
hiring process,

Heather Claxton-Douglas


On Mon, Nov 11, 2013 at 9:58 PM, Kelley McGrath 
<kell...@uoregon.edu> wrote:

> I have a small amount of money to work with and am looking for two people
> to help with extracting data from MARC records as described below. This is
> part of a larger project to develop a FRBR-based data store and discovery
> interface for moving images. Our previous work includes a consideration of
> the feasibility of the project from a cataloging perspective (
> http://www.olacinc.org/drupal/?q=node/27), a prototype end-user interface
> (https://blazing-sunset-24.heroku.com/,
> https://blazing-sunset-24.heroku.com/page/about) and a web form to
> crowdsource the parsing of movie credits (
> http://olac-annotator.org/#/about).
> Planned work period: six months beginning around the second week of
> December (I can be somewhat flexible on the dates if you want to wait and
> start after the New Year)
> Payment: flat sum of $2500 upon completion of the work
>
> Required skills and knowledge:
>
>   *   Familiarity with the MARC 21 bibliographic format
>   *   Familiarity with Natural Language Processing concepts (or
> willingness to learn)
>   *   Experience with Java, Python, and/or Ruby programming languages
>
> Description of work: Use language and text processing tools and provided
> strategies to write code to extract and normalize data in existing MARC
> bibliographic records for moving images. Refine code based on feedback from
> analysis of results obtained with a sample dataset.
>
> Data to be extracted:
> Tasks for Position 1:
> Titles (including the main title of the video, uniform titles, variant
> titles, series titles, television program titles and titles of contents)
> Authors and titles of related works on which an adaptation is based
> Duration
> Color
> Sound vs. silent
> Tasks for Position 2:
> Format (DVD, VHS, film, online, etc.)
> Original language
> Country of production
> Aspect ratio
> Flag for whether a record represents multiple works or not
> We have already done some work with dates, names and roles and have a
> framework to work in. I have the basic logic for the data extraction
> processes, but expect to need some iteration to refine these strategies.
>
> To apply, please send me an email at kelleym@uoregon.edu explaining why you
> are interested in this project, what relevant experience you would bring
> and any other reasons why I should hire you. If you have a preference for
> position 1 or 2, let me know (it's not necessary to have a preference). The
> deadline for applications is Monday, December 2, 2013. Let me know if you
> have any questions.
>
> Thank you for your consideration.
>
> Kelley
>
> PS In the near future, I will also be looking for someone to help with
> work clustering based on title, name, date and identifier data from MARC
> records. This will not involve any direct interaction with MARC.
>
>
> Kelley McGrath
> Metadata Management Librarian
> University of Oregon Libraries
> 541-346-8232
> kell...@uoregon.edu
>
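
To give a concrete flavour of the kind of extraction described above, here is a
minimal, hypothetical sketch using pymarc. The file name, field choices and
regular expressions are illustrative only and are not taken from the project's
actual specifications:

import re
from pymarc import MARCReader   # assumes the pymarc library is installed

def extract(record):
    """Pull a few of the fields described above out of one MARC record."""
    out = {}

    f245 = record['245']                       # 245 = title statement
    if f245 is not None:
        out['title'] = ' '.join(f245.get_subfields('a', 'b', 'n', 'p'))

    out['variant_titles'] = [' '.join(f.get_subfields('a', 'b'))
                             for f in record.get_fields('246')]

    f300 = record['300']                       # 300 = physical description
    if f300 is not None:
        phys = ' '.join(f300.get_subfields('a', 'b'))
        m = re.search(r'(\d+)\s*min', phys)    # e.g. "1 videodisc (120 min.)"
        if m:
            out['duration_minutes'] = int(m.group(1))
        out['color'] = 'col.' in phys.lower()
        out['sound'] = 'sd.' in phys.lower()
    return out

with open('movies.mrc', 'rb') as fh:           # hypothetical sample file
    for record in MARCReader(fh):
        print(extract(record))

Real records are messier than this, of course; the project documents linked
above describe the intended strategies in much more detail.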


Re: [CODE4LIB] The lie of the API

2013-12-01 Thread Joe Hourcle
On Dec 1, 2013, at 11:12 PM, Simon Spero wrote:

> On Dec 1, 2013 6:42 PM, "Joe Hourcle"  wrote:
> 
>> So that you don't screw up web proxies, you have to specify the 'Vary'
> header to tell which parameters you consider significant so that it knows
> what is or isn't cacheable.
> 
> I believe that if a Vary isn't specified, and the content is not marked as
> non-cacheable, a cache must assume Vary: *, but I might be misremembering

That would be horrible: caching proxies assuming that nothing's
cacheable unless it says it is.  (as typically only the really big
websites, or those that have seen some obvious problems, bother with
setting cache-control headers.)

I haven't done any exhaustive tests in many years, but I was noticing
that proxies were starting to cache GET requests with query strings,
which bothered me -- it used to be that anything that was an obvious
CGI wasn't cached.  (I guess that with enough sites using them, a proxy
has to assume that the sites aren't stateful, and that the parameters
in the URL are enough information for hashing)
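
As a rough illustration of what "enough information for hashing" means, here is
a toy sketch (not any real proxy's code) of how a Vary-aware cache might build
its lookup key from a stored response's Vary header:

def cache_key(url, request_headers, vary_header):
    """Build a cache key from the URL plus the request headers named in Vary."""
    if vary_header is None:
        return (url,)                      # response treated as the same for everyone
    if vary_header.strip() == '*':
        return None                        # effectively uncacheable
    names = sorted(h.strip().lower() for h in vary_header.split(','))
    return (url,) + tuple(request_headers.get(n, '') for n in names)

# Two requests differing only in Accept get different keys when the response
# said "Vary: Accept", and the same key when no Vary header was present.
req_json = {'accept': 'application/json'}
req_html = {'accept': 'text/html'}
print(cache_key('/record/1', req_json, 'Accept') ==
      cache_key('/record/1', req_html, 'Accept'))   # False
print(cache_key('/record/1', req_json, None) ==
      cache_key('/record/1', req_html, None))       # True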



>> (who has been managing web servers since HTTP/0.9, and gets annoyed when
> I have to explain to our security folks each year  why I don't reject
> pre-HTTP/1.1 requests or follow the rest of  the CIS benchmark
> recommendations that cause our web services to fail horribly)
> 
> Old school represent (0.9 could outperform 1.0 if the request headers were
> more than 1 MTU or the first line was sent in a separate packet with Nagle
> enabled). [Accept was a major cause of header bloat].

Don't even get me started on header bloat ... 

My main complaint about HTTP/1.1 is that it requires clients to support
chunked encoding, and I've got to support a client that's got a buggy
implementation.  (and then my CGIs that serve 2GB tarballs start
failing, and they call a program that's not smart enough to look
for SIGPIPE, so I end up with a dozen of 'em going all stupid and
sucking down CPU on one of my servers)

Most people don't have to support a community-written HTTP client,
though.  (and the one alternative HTTP client in IDL doesn't let me
interact with the HTTP headers directly, so I can't put a wrapper
around it to extract the tarball's filename from the Content-Disposition
header)

-Joe

ps.  yep, still having writer's block on posters.


Re: [CODE4LIB] The lie of the API

2013-12-01 Thread Simon Spero
On Dec 1, 2013 6:42 PM, "Joe Hourcle"  wrote:

> So that you don't screw up web proxies, you have to specify the 'Vary'
header to tell which parameters you consider significant so that it knows
what is or isn't cacheable.

I believe that if a Vary isn't specified, and the content is not marked as
non-cacheable, a cache must assume Vary: *, but I might be misremembering.

> (who has been managing web servers since HTTP/0.9, and gets annoyed when
I have to explain to our security folks each year  why I don't reject
pre-HTTP/1.1 requests or follow the rest of  the CIS benchmark
recommendations that cause our web services to fail horribly)

Old school represent (0.9 could outperform 1.0 if the request headers were
more than 1 MTU or the first line was sent in a separate packet with Nagle
enabled). [Accept was a major cause of header bloat].


Re: [CODE4LIB] The lie of the API

2013-12-01 Thread Joe Hourcle
On Dec 1, 2013, at 9:36 PM, Barnes, Hugh wrote:

> -Original Message-
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Joe 
> Hourcle
> 
>>> (They are on Wikipedia so they must be real.)
> 
>> Wikipedia was the first place you looked?  Not IETF or W3C?
>> No wonder people say libraries are doomed, if even people who work in 
>> libraries go straight to Wikipedia.
> 
> It was a humorous aside, regrettably lacking a smiley.

Yes, a smiley would have helped.

It also doesn't help that there used to be a website out there
named 'ScoopThis'.  They started as a wrestling parody site, but
my favorite part was their advice column from 'Dusty the Fat,
Bitter Cat'.

I bring this up because their slogan was "cuz if it’s on the net,
it’s got to be true" ... so I twitch a little whenever someone
says something similar to that phrase.

(unfortunately, the site's gone, and archive.org didn't cache
them, so you can't see the photoshopped pictures of Dusty
at Woodstock '99 or the Rock's cooking show.  They started up
a separate website for Dusty, but when they closed that one
down, they put up a parody of a porn site, so you probably
don't want to go looking for it)


> I think that comment would be better saved to pitch at folks who cite and 
> link to w3schools as if authoritative. Some of them are even in libraries.

Although I wish that w3schools would stop showing up so highly
in searches for javascript methods & css attributes, they
did have a time when they were some of the best tutorials out
there on web-related topics.  I don't know if I can claim that
to be true today, though.


> Your other comments were informative, though. Thank you :)

I try ... especially when I'm procrastinating on doing posters
that I need to have printed by Friday.

(but if anyone has any complaints about data.gov or other
federal data dissemination efforts, I'll be happy to work
them in)

-Joe


Re: [CODE4LIB] The lie of the API

2013-12-01 Thread Barnes, Hugh
-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Joe 
Hourcle

>> (They are on Wikipedia so they must be real.)

> Wikipedia was the first place you looked?  Not IETF or W3C?
> No wonder people say libraries are doomed, if even people who work in 
> libraries go straight to Wikipedia.

It was a humorous aside, regrettably lacking a smiley.

I think that comment would be better saved to pitch at folks who cite and link 
to w3schools as if authoritative. Some of them are even in libraries.

Your other comments were informative, though. Thank you :)

Cheers
Hugh





Re: [CODE4LIB] The lie of the API

2013-12-01 Thread Joe Hourcle
On Dec 1, 2013, at 7:57 PM, Barnes, Hugh wrote:

> +1 to all of Richard's points here. Making something easier for you to 
> develop is no justification for making it harder to consume or deviating from 
> well supported standards.
> 
> [Robert]
>> You can't 
>> just put a file in the file system, unlike with separate URIs for 
>> distinct representations where it just works, instead you need server 
>> side processing.
> 
> If we introduce languages into the negotiation, this won't scale.

It depends on what you qualify as 'scaling'.  You can configure
Apache and some other servers so that you pre-generate files such
as :

index.en.html
index.de.html
index.es.html
index.fr.html

... It's even the default for some distributions.

Then, depending on which Accept-Language header is sent,
the server returns the appropriate response.  The only issue
is that the server assumes that the 'quality' of all of the
translations is equivalent.

You know that 'q=0.9' stuff?  There's actually a scale in
RFC 2295 that equates the different qualities to how much
content is lost in that particular version:

  Servers should use the following table as a guide when assigning source
  quality values:

 1.000  perfect representation
 0.900  threshold of noticeable loss of quality
 0.800  noticeable, but acceptable quality reduction
 0.500  barely acceptable quality
 0.300  severely degraded quality
 0.000  completely degraded quality
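
As a rough sketch of what that server-side selection amounts to (hand-rolled
and simplified; real Apache language negotiation also handles '*', encodings
and its own quality settings, and the helper names here are invented), picking
one of the pre-generated files above from an Accept-Language header might look
like:

def parse_accept_language(header):
    """Parse an Accept-Language value into (language, q) pairs, highest q first."""
    prefs = []
    for item in header.split(','):
        parts = item.strip().split(';')
        lang = parts[0].strip().lower()
        q = 1.0
        for p in parts[1:]:
            p = p.strip()
            if p.startswith('q='):
                try:
                    q = float(p[2:])
                except ValueError:
                    q = 0.0
        if lang:
            prefs.append((lang, q))
    return sorted(prefs, key=lambda lq: lq[1], reverse=True)

AVAILABLE = {'en': 'index.en.html', 'de': 'index.de.html',
             'es': 'index.es.html', 'fr': 'index.fr.html'}

def pick_variant(accept_language, default='index.en.html'):
    for lang, q in parse_accept_language(accept_language):
        if q <= 0:
            continue
        base = lang.split('-')[0]          # 'de-AT' falls back to 'de'
        if lang in AVAILABLE:
            return AVAILABLE[lang]
        if base in AVAILABLE:
            return AVAILABLE[base]
    return default

print(pick_variant('de-AT, de;q=0.9, en;q=0.5'))   # index.de.html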





> [Robert]
>> This also makes it much harder to cache the 
>> responses, as the cache needs to determine whether or not the 
>> representation has changed -- the cache also needs to parse the 
>> headers rather than just comparing URI and content.  
> 
> Don't know caches intimately, but I don't see why that's algorithmically 
> difficult. Just look at the Content-type of the response. Is it harder for 
> caches to examine headers than content or URI? (That's an earnest, perhaps 
> naïve, question.)

See my earlier response.  The problem is that without a 'Vary' header or
other cache-control headers, caches may assume that a URL is a fixed
resource.

If a cache were to assume the URL was static, then it wouldn't matter what
was sent for the Accept, Accept-Encoding or Accept-Language ... and
so the first request proxied gets cached, and then subsequent
requests get the cached copy, even if that's not what the server
would have sent.


> If we are talking about caching on the client here (not caching proxies), I 
> would think in most cases requests are issued with the same Accept-* headers, 
> so caching will work as expected anyway.

I assume he's talking about caching proxies, where it's a real
problem.


> [Robert]
>> Link headers 
>> can be added with a simple apache configuration rule, and as they're 
>> static are easy to cache. So the server side is easy, and the client side is 
>> trivial.
> 
> Hadn't heard of these. (They are on Wikipedia so they must be real.) What do 
> they offer over HTML <link> elements populated from the Dublin Core Element 
> Set?

Wikipedia was the first place you looked?  Not IETF or W3C?
No wonder people say libraries are doomed, if even people who work
in libraries go straight to Wikipedia.


...


oh, and I should follow up to my posting from earlier tonight --
upon re-reading the HTTP/1.1 spec, it seems that there *is* a way to
specify the authoritative URL returned without an HTTP round-trip,
Content-Location :

http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.14

Of course, it doesn't look like my web browser does anything with
it:

http://www.w3.org/Protocols/rfc2616/rfc2616
http://www.w3.org/Protocols/rfc2616/rfc2616.html
http://www.w3.org/Protocols/rfc2616/rfc2616.txt

... so you'd still have to use Location: if you wanted it to 
show up to the general public.

-Joe


Re: [CODE4LIB] The lie of the API

2013-12-01 Thread LeVan,Ralph
Returning a Content-Location header does not require a redirect.  You can 
return the negotiated content with the first response and still tell the 
client how it could have asked for that same content without negotiation.  
That's what the Content-Location header means in the absence of a redirect 
status code.
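
For illustration only, a minimal sketch of that pattern (Flask is used here
purely as an example framework, and the /record/1 URLs are invented): the
negotiated response carries Content-Location pointing at the non-negotiated
form of the same content, plus Vary for the benefit of caches.

from flask import Flask, Response, jsonify, request

app = Flask(__name__)
RECORD = {'id': 1, 'title': 'Example record'}

@app.route('/record/1')
def negotiated():
    # Pick a representation based on the Accept header.
    best = request.accept_mimetypes.best_match(['application/json', 'text/html'])
    if best == 'application/json':
        resp = jsonify(RECORD)
        resp.headers['Content-Location'] = '/record/1.json'
    else:
        resp = Response('<h1>%s</h1>' % RECORD['title'], mimetype='text/html')
        resp.headers['Content-Location'] = '/record/1.html'
    resp.headers['Vary'] = 'Accept'        # tell caches what the response depends on
    return resp

# The non-negotiated forms of the same resource, as described above.
@app.route('/record/1.json')
def as_json():
    return jsonify(RECORD)

@app.route('/record/1.html')
def as_html():
    return Response('<h1>%s</h1>' % RECORD['title'], mimetype='text/html')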

Ralph

From: Code for Libraries  on behalf of Joe Hourcle 

Sent: Sunday, December 01, 2013 6:39 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: The lie of the API

On Dec 1, 2013, at 3:51 PM, LeVan,Ralph wrote:

> I'm confused about the supposed distinction between content negotiation and 
> explicit content request in a URL.  The reason I'm confused is that the 
> response to content negotiation is supposed to be a content location header 
> with a URL that is guaranteed to return the negotiated content.  In other 
> words, there *must* be a form of the URL that bypasses content negotiation.  
> If you can do content negotiation, then you should have a URL form that 
> doesn't require content negotiation.

There are three types of content negotiation discussed in HTTP/1.1.  The
one that most gets used is 'transparent negotiation' which results in
there being different content served under a single URL.

Transparent negotiation schemes do *not* redirect to a new URL to allow
the cache or browser to identify the specific content returned.  (this
would require an extra round trip, as you'd have to send a Location:
header to redirect, then have the browser request the new page)

So that you don't screw up web proxies, you have to specify the 'Vary'
header to tell which parameters you consider significant so that it
knows what is or isn't cacheable.  So if you might serve different
content based on the Accept and Accept-Encoding headers, you would return:

Vary: Accept, Accept-Encoding

(Including 'User-Agent' is problematic because of some browsers
that pack in every module + the version in there, creating so
many permutations that many proxies will refuse to cache it)

-Joe

(who has been managing web servers since HTTP/0.9, and gets
annoyed when I have to explain to our security folks each year
why I don't reject pre-HTTP/1.1 requests or follow the rest of
the CIS benchmark recommendations that cause our web services to
fail horribly)


Re: [CODE4LIB] The lie of the API

2013-12-01 Thread Barnes, Hugh
+1 to all of Richard's points here. Making something easier for you to develop 
is no justification for making it harder to consume or deviating from well 
supported standards.

[Robert]
>  You can't 
> just put a file in the file system, unlike with separate URIs for 
> distinct representations where it just works, instead you need server 
> side processing.

If we introduce languages into the negotiation, this won't scale.

[Robert]
> This also makes it much harder to cache the 
> responses, as the cache needs to determine whether or not the 
> representation has changed -- the cache also needs to parse the 
> headers rather than just comparing URI and content.  

Don't know caches intimately, but I don't see why that's algorithmically 
difficult. Just look at the Content-type of the response. Is it harder for 
caches to examine headers than content or URI? (That's an earnest, perhaps 
naïve, question.)

If we are talking about caching on the client here (not caching proxies), I 
would think in most cases requests are issued with the same Accept-* headers, 
so caching will work as expected anyway.

[Robert]
> Link headers 
> can be added with a simple apache configuration rule, and as they're 
> static are easy to cache. So the server side is easy, and the client side is 
> trivial.

Hadn't heard of these. (They are on Wikipedia so they must be real.) What do 
they offer over HTML <link> elements populated from the Dublin Core Element Set?

---

My ideal setup would be to maintain a canonical URL that always serves the 
clients' flavour of representation (format/language), which could vary, but 
points to other representations (and versions for that matter) at separate URLs 
through a mechanism like HTML link elements.

My whatever-it's-worth ... great topic, though, thanks Robert :)

Cheers

Hugh Barnes
Digital Access Coordinator
Library, Teaching and Learning
Lincoln University
Christchurch
New Zealand
p +64 3 423 0357

-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Richard 
Wallis
Sent: Monday, 2 December 2013 12:26 p.m.
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] The lie of the API

"It's harder to implement Content Negotiation than your own API, because you 
get to define your own API whereas you have to follow someone else's rules"

Don't wish your implementation problems on the consumers of your data.
There are [you would hope] far more of them than of you ;-)

Content-negotiation is an already established mechanism - why invent a new, and 
different, one just for *your* data?

Put yourself in the place of your consumer having to get their head around yet 
another site-specific API pattern.

As to discovering and then using the (currently implemented) URI returned from a 
content-negotiated call - the standard HTTP libraries take care of that, like 
any other HTTP redirect (301, 303, etc.), plus you are protected from any future 
backend server implementation changes.


~Richard


On 1 December 2013 20:51, LeVan,Ralph  wrote:

> I'm confused about the supposed distinction between content 
> negotiation and explicit content request in a URL.  The reason I'm 
> confused is that the response to content negotiation is supposed to be 
> a content location header with a URL that is guaranteed to return the 
> negotiated content.  In other words, there *must* be a form of the URL that 
> bypasses content negotiation.
>  If you can do content negotiation, then you should have a URL form 
> that doesn't require content negotiation.
>
> Ralph
> 
> From: Code for Libraries  on behalf of 
> Robert Sanderson 
> Sent: Friday, November 29, 2013 2:44 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: The lie of the API
>
> (posted in the comments on the blog and reposted here for further 
> discussion, if there's interest)
>
>
> While I couldn't agree more with the post's starting point -- URIs 
> identify
> (concepts) and use HTTP as your API -- I couldn't disagree more with 
> the "use content negotiation" conclusion.
>
> I'm with Dan Cohen in his comment regarding using different URIs for 
> different representations for several reasons below.
>
> It's harder to implement Content Negotiation than your own API, 
> because you get to define your own API whereas you have to follow 
> someone else's rules when you implement conneg.  You can't get your 
> own API wrong.  I agree with Ruben that HTTP is better than rolling 
> your own proprietary API, we disagree that conneg is the correct 
> solution.  The choice is between conneg or regular HTTP, not conneg or a 
> proprietary API.
>
> Secondly, you need to look at the HTTP headers and parse quite a 
> complex structure to determine what is being requested.  You can't 
> just put a file in the file system, unlike with separate URIs for 
> distinct representations where it just works, instead you need server 
> side processing.  This also makes it much harder to cache the 
> responses, a

Re: [CODE4LIB] The lie of the API

2013-12-01 Thread Joe Hourcle
On Dec 1, 2013, at 3:51 PM, LeVan,Ralph wrote:

> I'm confused about the supposed distinction between content negotiation and 
> explicit content request in a URL.  The reason I'm confused is that the 
> response to content negotiation is supposed to be a content location header 
> with a URL that is guaranteed to return the negotiated content.  In other 
> words, there *must* be a form of the URL that bypasses content negotiation.  
> If you can do content negotiation, then you should have a URL form that 
> doesn't require content negotiation.

There are three types of content negotiation discussed in HTTP/1.1.  The
one that most gets used is 'transparent negotiation' which results in
there being different content served under a single URL.

Transparent negotiation schemes do *not* redirect to a new URL to allow
the cache or browser to identify the specific content returned.  (this
would require an extra round trip, as you'd have to send a Location:
header to redirect, then have the browser request the new page)

So that you don't screw up web proxies, you have to specify the 'Vary'
header to tell which parameters you consider significant so that it
knows what is or isn't cacheable.  So if you might serve different
content based on the Accept and Accept-Encoding headers, you would return:

Vary: Accept, Accept-Encoding

(Including 'User-Agent' is problematic because of some browsers
that pack in every module + the version in there, creating so
many permutations that many proxies will refuse to cache it)

-Joe

(who has been managing web servers since HTTP/0.9, and gets 
annoyed when I have to explain to our security folks each year
why I don't reject pre-HTTP/1.1 requests or follow the rest of
the CIS benchmark recommendations that cause our web services to
fail horribly)


Re: [CODE4LIB] The lie of the API

2013-12-01 Thread Richard Wallis
"It's harder to implement Content Negotiation than your own API, because you
get to define your own API whereas you have to follow someone else's rules"

Don't wish your implementation problems on the consumers of your data.
There are [you would hope] far more of them than of you ;-)

Content-negotiation is an already established mechanism - why invent a new,
and different, one just for *your* data?

Put yourself in the place of your consumer having to get their head around
yet another site-specific API pattern.

As to discovering and then using the (currently implemented) URI returned from
a content-negotiated call - the standard HTTP libraries take care of that,
like any other HTTP redirect (301, 303, etc.), plus you are protected from
any future backend server implementation changes.


~Richard


On 1 December 2013 20:51, LeVan,Ralph  wrote:

> I'm confused about the supposed distinction between content negotiation
> and explicit content request in a URL.  The reason I'm confused is that the
> response to content negotiation is supposed to be a content location header
> with a URL that is guaranteed to return the negotiated content.  In other
> words, there *must* be a form of the URL that bypasses content negotiation.
>  If you can do content negotiation, then you should have a URL form that
> doesn't require content negotiation.
>
> Ralph
> 
> From: Code for Libraries  on behalf of Robert
> Sanderson 
> Sent: Friday, November 29, 2013 2:44 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: The lie of the API
>
> (posted in the comments on the blog and reposted here for further
> discussion, if there's interest)
>
>
> While I couldn't agree more with the post's starting point -- URIs identify
> (concepts) and use HTTP as your API -- I couldn't disagree more with the
> "use content negotiation" conclusion.
>
> I'm with Dan Cohen in his comment regarding using different URIs for
> different representations for several reasons below.
>
> It's harder to implement Content Negotiation than your own API, because you
> get to define your own API whereas you have to follow someone else's rules
> when you implement conneg.  You can't get your own API wrong.  I agree with
> Ruben that HTTP is better than rolling your own proprietary API, we
> disagree that conneg is the correct solution.  The choice is between conneg
> or regular HTTP, not conneg or a proprietary API.
>
> Secondly, you need to look at the HTTP headers and parse quite a complex
> structure to determine what is being requested.  You can't just put a file
> in the file system, unlike with separate URIs for distinct representations
> where it just works, instead you need server side processing.  This also
> makes it much harder to cache the responses, as the cache needs to
> determine whether or not the representation has changed -- the cache also
> needs to parse the headers rather than just comparing URI and content.  For
> large scale systems like DPLA and Europeana, caching is essential for
> quality of service.
>
> How do you find out which formats are supported by conneg? By reading the
> documentation. Which could just say "add .json on the end". The Vary header
> tells you that negotiation in the format dimension is possible, just not
> what to do to actually get anything back. There isn't a way to find this
> out from HTTP automatically, so now you need to read both the site's docs
> AND the HTTP docs.  APIs can, on the other hand, do this.  Consider
> OAI-PMH's ListMetadataFormats and SRU's Explain response.
>
> Instead you can have a separate URI for each representation and link them
> with Link headers, or just a simple rule like add '.json' on the end. No
> need for complicated content negotiation at all.  Link headers can be added
> with a simple apache configuration rule, and as they're static are easy to
> cache. So the server side is easy, and the client side is trivial.
>  Compared to being difficult at both ends with content negotiation.
>
> It can be useful to make statements about the different representations,
> and especially if you need to annotate the structure or content.  Or share
> it -- you can't email someone a link that includes the right Accept headers
> to send -- as in the post, you need to send them a command line like curl
> with -H.
>
> An experiment for fans of content negotiation: Have both .json and 302
> style conneg from your original URI to that .json file. Advertise both. See
> how many people do the conneg. If it's non-zero, I'll be extremely
> surprised.
>
> And a challenge: Even with libraries there's still complexity to figuring
> out how and what to serve. Find me sites that correctly implement * based
> fallbacks. Or even process q values. I'll bet I can find 10 that do content
> negotiation wrong, for every 1 that does it correctly.  I'll start:
> dx.doi.org touts its content negotiation for metadata, yet doesn't
> implement q values or *s. You have to go to the documentation

[CODE4LIB] User Registration and Authentication for a Tomcat webapp?

2013-12-01 Thread LeVan,Ralph
OCLC Research is building an ILL Cost Calculator.  We'll be asking institutions 
to enter information about their ILL practices and costs and then supporting 
mechanisms for generating reports based on that data.  (Dennis Massie is 
leading this work in Research and he really should have a page about this 
project that I could point you at.)

I'm writing the part that collects the information.  I need a light-weight 
framework that will let users register themselves and then subsequently 
authenticate themselves while engaged in an iterative process of entering the 
necessary data for the calculator.

I'm looking for suggestions for that framework.  I'm hoping for something in 
Java that can be integrated into a Tomcat webapp environment, but it wouldn't 
hurt me to stretch a little if there's something else out there you think I 
should be trying.

Thanks!

Ralph

Ralph LeVan
Sr. Research Scientist
OCLC Research


Re: [CODE4LIB] The lie of the API

2013-12-01 Thread LeVan,Ralph
I'm confused about the supposed distinction between content negotiation and 
explicit content request in a URL.  The reason I'm confused is that the 
response to content negotiation is supposed to be a content location header 
with a URL that is guaranteed to return the negotiated content.  In other 
words, there *must* be a form of the URL that bypasses content negotiation.  If 
you can do content negotiation, then you should have a URL form that doesn't 
require content negotiation.

Ralph

From: Code for Libraries  on behalf of Robert 
Sanderson 
Sent: Friday, November 29, 2013 2:44 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: The lie of the API

(posted in the comments on the blog and reposted here for further
discussion, if there's interest)


While I couldn't agree more with the post's starting point -- URIs identify
(concepts) and use HTTP as your API -- I couldn't disagree more with the
"use content negotiation" conclusion.

I'm with Dan Cohen in his comment regarding using different URIs for
different representations for several reasons below.

It's harder to implement Content Negotiation than your own API, because you
get to define your own API whereas you have to follow someone else's rules
when you implement conneg.  You can't get your own API wrong.  I agree with
Ruben that HTTP is better than rolling your own proprietary API, we
disagree that conneg is the correct solution.  The choice is between conneg
or regular HTTP, not conneg or a proprietary API.

Secondly, you need to look at the HTTP headers and parse quite a complex
structure to determine what is being requested.  You can't just put a file
in the file system, unlike with separate URIs for distinct representations
where it just works, instead you need server side processing.  This also
makes it much harder to cache the responses, as the cache needs to
determine whether or not the representation has changed -- the cache also
needs to parse the headers rather than just comparing URI and content.  For
large scale systems like DPLA and Europeana, caching is essential for
quality of service.

How do you find out which formats are supported by conneg? By reading the
documentation. Which could just say "add .json on the end". The Vary header
tells you that negotiation in the format dimension is possible, just not
what to do to actually get anything back. There isn't a way to find this
out from HTTP automatically, so now you need to read both the site's docs
AND the HTTP docs.  APIs can, on the other hand, do this.  Consider
OAI-PMH's ListMetadataFormats and SRU's Explain response.

Instead you can have a separate URI for each representation and link them
with Link headers, or just a simple rule like add '.json' on the end. No
need for complicated content negotiation at all.  Link headers can be added
with a simple apache configuration rule, and as they're static are easy to
cache. So the server side is easy, and the client side is trivial.
 Compared to being difficult at both ends with content negotiation.

It can be useful to make statements about the different representations,
and especially if you need to annotate the structure or content.  Or share
it -- you can't email someone a link that includes the right Accept headers
to send -- as in the post, you need to send them a command line like curl
with -H.

An experiment for fans of content negotiation: Have both .json and 302
style conneg from your original URI to that .json file. Advertise both. See
how many people do the conneg. If it's non-zero, I'll be extremely
surprised.
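
For concreteness, the two client behaviours in that experiment might look
something like this (a sketch using the Python requests library and an
invented URL, not a real endpoint):

import requests

BASE = 'http://example.org/record/1'      # invented resource URI

# 1. Separate URI per representation: just ask for the .json form directly.
r1 = requests.get(BASE + '.json')

# 2. Content negotiation: ask the original URI for JSON; if the server answers
#    with a 302 to the .json file, requests follows the redirect automatically.
r2 = requests.get(BASE, headers={'Accept': 'application/json'})

print(r1.status_code, r1.headers.get('Content-Type'))
print(r2.status_code, r2.headers.get('Content-Type'), r2.url)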

And a challenge: Even with libraries there's still complexity to figuring
out how and what to serve. Find me sites that correctly implement * based
fallbacks. Or even process q values. I'll bet I can find 10 that do content
negotiation wrong, for every 1 that does it correctly.  I'll start:
dx.doi.org touts its content negotiation for metadata, yet doesn't
implement q values or *s. You have to go to the documentation to figure out
what Accept headers it will do string equality tests against.
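
For comparison, here is roughly what handling q values and wildcards properly
looks like when you lean on an existing parser instead of string-equality
tests (a sketch using Werkzeug purely as an example; the offered types are
made up):

from werkzeug.datastructures import MIMEAccept
from werkzeug.http import parse_accept_header

OFFERED = ['application/rdf+xml', 'application/json', 'text/html']

def choose(accept_value):
    # Parse the Accept header into (media type, q) pairs and pick the best offer.
    accept = parse_accept_header(accept_value, MIMEAccept)
    return accept.best_match(OFFERED)

print(choose('application/json;q=0.5, application/rdf+xml'))  # application/rdf+xml
print(choose('text/*'))                                        # text/html
print(choose('*/*'))                                           # first offered type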

Rob



On Fri, Nov 29, 2013 at 6:24 AM, Seth van Hooland 
wrote:
>
> Dear all,
>
> I guess some of you will be interested in the blog post by my colleague
and co-author Ruben regarding the misunderstandings on the use and abuse of
APIs in a digital libraries context, including a description of both good
and bad practices from Europeana, DPLA and the Cooper Hewitt museum:
>
> http://ruben.verborgh.org/blog/2013/11/29/the-lie-of-the-api/
>
> Kind regards,
>
> Seth van Hooland
> Director of the Master's programme in Information and Communication Science
and Technology (MaSTIC)
> Université Libre de Bruxelles
> Av. F.D. Roosevelt, 50 CP 123  | 1050 Bruxelles
> http://homepages.ulb.ac.be/~svhoolan/
> http://twitter.com/#!/sethvanhooland
> http://mastic.ulb.ac.be
> 0032 2 650 4765
> Office: DC11.102